Error analysis of stochastic gradient descent ranking.
Chen, Hong; Tang, Yi; Li, Luoqing; Yuan, Yuan; Li, Xuelong; Tang, Yuanyan
2013-06-01
Ranking is an important task in machine learning and information retrieval, with applications in collaborative filtering, recommender systems, and drug discovery. A kernel-based stochastic gradient descent algorithm with the least squares loss is proposed for ranking in this paper. The implementation of this algorithm is simple, and an expression of the solution is derived via a sampling operator and an integral operator. An explicit convergence rate for learning a ranking function is given in terms of suitable choices of the step size and the regularization parameter. The analysis technique used here is capacity independent and is novel in the error analysis of ranking learning. Experimental results on real-world data show the effectiveness of the proposed algorithm in ranking tasks, which supports the theoretical analysis of the ranking error.
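The kind of update described in this abstract can be illustrated with a minimal sketch. It is not the authors' exact algorithm: it assumes a Gaussian kernel, a pairwise least squares loss on one randomly drawn pair per step, and a step size decaying as eta / sqrt(t). The names gaussian_kernel and kernel_sgd_rank and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_sgd_rank(X, y, steps=2000, eta=0.2, lam=1e-3, seed=0):
    """Sketch of kernel SGD for pairwise least squares ranking.

    The scoring function is f(x) = sum_k alpha[k] * K(X[k], x). Each step
    draws one pair (i, j) and descends the gradient of
    0.5 * ((f(x_i) - f(x_j)) - (y_i - y_j))**2 + 0.5 * lam * ||f||^2.
    """
    rng = random.Random(seed)
    n = len(X)
    alpha = [0.0] * n
    # Precompute the kernel matrix once; fine for small illustrative data.
    K = [[gaussian_kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    for t in range(1, steps + 1):
        i, j = rng.sample(range(n), 2)
        f_i = sum(alpha[k] * K[k][i] for k in range(n))
        f_j = sum(alpha[k] * K[k][j] for k in range(n))
        residual = (f_i - f_j) - (y[i] - y[j])
        step = eta / math.sqrt(t)  # assumed step-size schedule
        # Regularization shrinks every coefficient ...
        alpha = [(1.0 - step * lam) * a for a in alpha]
        # ... and the loss gradient residual * (K_{x_i} - K_{x_j}) touches only i and j.
        alpha[i] -= step * residual
        alpha[j] += step * residual

    def score(x):
        return sum(a * gaussian_kernel(xk, x) for a, xk in zip(alpha, X))

    return score
```

Sorting items by the returned score gives the predicted ranking; only score differences are fitted, so the absolute level of the scores carries no meaning.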
Nexus Between Protein–Ligand Affinity Rank-Ordering, Biophysical Approaches, and Drug Discovery
2013-01-01
The confluence of computational and biophysical methods to accurately rank-order the binding affinities of small molecules and determine structures of macromolecular complexes is a potentially transformative advance in the work flow of drug discovery. This viewpoint explores the impact that advanced computational methods may have on the efficacy of small molecule drug discovery and optimization, particularly with respect to emerging fragment-based methods. PMID:24900579
Castillo, Jonathan; Stueve, Theresa R.; Marconett, Crystal N.
2017-01-01
Previously thought of as junk transcripts and pseudogene remnants, long non-coding RNAs (lncRNAs) have come into their own over the last decade as an essential component of cellular activity, regulating a plethora of functions within multicellular organisms. lncRNAs are now known to participate in development, cellular homeostasis, immunological processes, and the development of disease. With the advent of next generation sequencing technology, hundreds of thousands of lncRNAs have been identified. However, movement beyond mere discovery to the understanding of molecular processes has been stymied by the complicated genomic structure, tissue-restricted expression, and diverse regulatory roles lncRNAs play. In this review, we will focus on lncRNAs involved in lung cancer, the most common cause of cancer-related death in the United States and worldwide. We will summarize their various methods of discovery, provide consensus rankings of deregulated lncRNAs in lung cancer, and describe in detail the limited functional analysis that has been undertaken so far. PMID:29113413
Liu, Jiangang; Jolly, Robert A.; Smith, Aaron T.; Searfoss, George H.; Goldstein, Keith M.; Uversky, Vladimir N.; Dunker, Keith; Li, Shuyu; Thomas, Craig E.; Wei, Tao
2011-01-01
Toxicogenomics promises to aid in predicting adverse effects, understanding the mechanisms of drug action or toxicity, and uncovering unexpected or secondary pharmacology. However, modeling adverse effects using high dimensional and high noise genomic data is prone to over-fitting. Models constructed from such data sets often consist of a large number of genes with no obvious functional relevance to the biological effect the model intends to predict, which can make the modeling results challenging to interpret. To address these issues, we developed a novel algorithm, Predictive Power Estimation Algorithm (PPEA), which estimates the predictive power of each individual transcript through an iterative two-way bootstrapping procedure. By repeatedly enforcing that the sample number is larger than the transcript number in each iteration of modeling and testing, PPEA reduces the potential risk of overfitting. We show with three different case studies that: (1) PPEA can quickly derive a reliable rank order of predictive power of individual transcripts in a relatively small number of iterations, (2) the top ranked transcripts tend to be functionally related to the phenotype they are intended to predict, (3) using only the most predictive top ranked transcripts greatly facilitates development of multiplex assays such as qRT-PCR as biomarkers, and (4) more importantly, we were able to demonstrate that a small number of genes identified from the top-ranked transcripts are highly predictive of phenotype, as their expression changes distinguished adverse from nonadverse effects of compounds in completely independent tests. Thus, we believe that the PPEA model effectively addresses the over-fitting problem and can be used to facilitate genomic biomarker discovery for predictive toxicology and drug responses. PMID:21935387
When drug discovery meets web search: Learning to Rank for ligand-based virtual screening.
Zhang, Wei; Ji, Lijuan; Chen, Yanan; Tang, Kailin; Wang, Haiping; Zhu, Ruixin; Jia, Wei; Cao, Zhiwei; Liu, Qi
2015-01-01
The rapid increase in the emergence of novel chemical substances presents a substantial demand for more sophisticated computational methodologies for drug discovery. In this study, the idea of Learning to Rank from web search was applied to drug virtual screening, where it offers two distinctive capabilities: (1) identifying compounds for novel targets when not enough training data are available for those targets, and (2) integrating heterogeneous data when compound affinities are measured on different platforms. A standard pipeline was designed to carry out Learning to Rank in virtual screening. Six Learning to Rank algorithms were investigated based on two public datasets collected from Binding Database and the newly published Community Structure-Activity Resource benchmark dataset. The results demonstrate that Learning to Rank is an efficient computational strategy for drug virtual screening, particularly due to its novel use in cross-target virtual screening and heterogeneous data integration. To the best of our knowledge, we have introduced here the first application of Learning to Rank in virtual screening. The experiment workflow and algorithm assessment designed in this study will provide a standard protocol for other similar studies. All the datasets as well as the implementations of Learning to Rank algorithms are available at http://www.tongji.edu.cn/~qiliu/lor_vs.html. Graphical Abstract: The analogy between web search and ligand-based drug discovery.
Cruz-Monteagudo, Maykel; Borges, Fernanda; Cordeiro, M Natália D S; Cagide Fajin, J Luis; Morell, Carlos; Ruiz, Reinaldo Molina; Cañizares-Carmenate, Yudith; Dominguez, Elena Rosa
2008-01-01
Up to now, very few applications of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies have been reported in the literature. However, none of them report the optimization of objectives related directly to the final pharmaceutical profile of a drug. In this paper, a MOOP method based on Derringer's desirability function that allows conducting global QSAR studies, simultaneously considering the potency, bioavailability, and safety of a set of drug candidates, is introduced. The results of the desirability-based MOOP (the levels of the predictor variables concurrently producing the best possible compromise between the properties determining an optimal drug candidate) are used for the implementation of a ranking method that is also based on the application of desirability functions. This method allows ranking drug candidates with unknown pharmaceutical properties from combinatorial libraries according to the degree of similarity with the previously determined optimal candidate. Application of this method will make it possible to filter the most promising drug candidates of a library (the best-ranked candidates), which should have the best pharmaceutical profile (the best compromise between potency, safety and bioavailability). In addition, a validation method of the ranking process, as well as a quantitative measure of the quality of a ranking, the ranking quality index (Psi), is proposed. The usefulness of the desirability-based methods of MOOP and ranking is demonstrated by its application to a library of 95 fluoroquinolones, reporting their gram-negative antibacterial activity and mammalian cell cytotoxicity. Finally, the combined use of the desirability-based methods of MOOP and ranking proposed here seems to be a valuable tool for rational drug discovery and development.
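A desirability-based ranking of this kind can be sketched as follows. The sketch uses the standard one-sided Derringer-Suich desirability forms and a geometric mean for the overall desirability; the property names, bounds, and candidate values are hypothetical and are not taken from the paper, and the ranking here is by overall desirability rather than by similarity to an optimal candidate.

```python
def desirability_maximize(y, low, target, s=1.0):
    """One-sided desirability for a property that should be as large as possible."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** s


def desirability_minimize(y, target, high, s=1.0):
    """One-sided desirability for a property that should be as small as possible."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** s


def overall_desirability(ds):
    """Geometric mean of individual desirabilities; any zero gives zero overall."""
    product = 1.0
    for d in ds:
        product *= d
    return product ** (1.0 / len(ds))


def rank_candidates(candidates):
    """Return candidate names sorted from most to least desirable.

    Each candidate is a dict with hypothetical keys 'potency' and
    'bioavailability' (to maximize) and 'cytotoxicity' (to minimize),
    all on an assumed 0-10 scale.
    """
    scored = []
    for name, props in candidates.items():
        d = overall_desirability([
            desirability_maximize(props["potency"], low=0.0, target=10.0),
            desirability_maximize(props["bioavailability"], low=0.0, target=10.0),
            desirability_minimize(props["cytotoxicity"], target=0.0, high=10.0),
        ])
        scored.append((d, name))
    scored.sort(reverse=True)
    return [name for _, name in scored]
```

Because the overall score is a geometric mean, a candidate that completely fails one criterion is ranked last regardless of how well it does on the others, which is the usual reason for preferring it over an arithmetic mean.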
Chen, Qianting; Dai, Congling; Zhang, Qianjun; Du, Juan; Li, Wen
2016-10-01
To evaluate the prediction performance of five bioinformatics software tools (SIFT, PolyPhen2, MutationTaster, Provean, MutationAssessor). From our own database of genetic mutations collected over the past five years, the Chinese literature database, the Human Gene Mutation Database, and dbSNP, 121 missense mutations confirmed by functional studies and 121 missense mutations suspected to be pathogenic by pedigree analysis were used as the positive gold standard, while 242 missense mutations with minor allele frequency (MAF)>5% in dominant hereditary diseases were used as the negative gold standard. The selected mutations were scored with each of the five tools. Based on the results, the tools were evaluated for their sensitivity, specificity, positive predictive value, false positive rate, negative predictive value, false negative rate, false discovery rate, accuracy, and receiver operating characteristic (ROC) curve. In terms of sensitivity, negative predictive value and false negative rate, the rank was MutationTaster, PolyPhen2, Provean, SIFT, and MutationAssessor. For specificity and false positive rate, the rank was MutationTaster, Provean, MutationAssessor, SIFT, and PolyPhen2. For positive predictive value and false discovery rate, the rank was MutationTaster, Provean, MutationAssessor, PolyPhen2, and SIFT. For area under the ROC curve (AUC) and accuracy, the rank was MutationTaster, Provean, PolyPhen2, MutationAssessor, and SIFT. The prediction performance of the software may differ when different parameters are used. Among the five tools, MutationTaster has the best prediction performance.
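The evaluation measures listed in this abstract all follow from the four cells of a confusion matrix. The sketch below shows the standard definitions; the function name diagnostic_metrics is a hypothetical helper, and any counts passed to it are illustrative rather than results from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fn count pathogenic variants predicted correctly/incorrectly;
    tn/fp count benign variants predicted correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictive_value": ppv,
        "negative_predictive_value": npv,
        "false_positive_rate": 1.0 - specificity,
        "false_negative_rate": 1.0 - sensitivity,
        "false_discovery_rate": 1.0 - ppv,
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

Several of the measures are complements of one another (for example, false negative rate and sensitivity), which is why the abstract reports identical tool rankings for those groups of measures.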
Song, J; Doucette, C; Hanniford, D; Hunady, K; Wang, N; Sherf, B; Harrington, J J; Brunden, K R; Stricker-Krongrad, A
2005-06-01
Target-based high-throughput screening (HTS) plays an integral role in drug discovery. The implementation of HTS assays generally requires high expression levels of the target protein, and this is typically accomplished using recombinant cDNA methodologies. However, the isolated gene sequences for many drug targets have intellectual property claims that restrict the ability to implement drug discovery programs. The present study describes the pharmacological characterization of the human histamine H3 receptor that was expressed using random activation of gene expression (RAGE), a technology that over-expresses proteins by up-regulating endogenous genes rather than introducing cDNA expression vectors into the cell. Saturation binding analysis using [125I]iodoproxyfan and RAGE-H3 membranes revealed a single class of binding sites with a K(D) value of 0.77 nM and a B(max) equal to 756 fmol/mg of protein. Competition binding studies showed that the rank order of potency for H3 agonists was N(alpha)-methylhistamine ≈ (R)-alpha-methylhistamine > histamine and that the rank order of potency for H3 antagonists was clobenpropit > iodophenpropit > thioperamide. The same rank order of potency for H3 agonists and antagonists was observed in the functional assays as in the binding assays. The Fluorometric Imaging Plate Reader assays in RAGE-H3 cells gave high Z' values for agonist and antagonist screening, respectively. These results reveal that the human H3 receptor expressed with the RAGE technology is pharmacologically comparable to that expressed through recombinant methods. Moreover, the level of expression of the H3 receptor in the RAGE-H3 cells is suitable for HTS and secondary assays.
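The Z' value mentioned in this abstract is a common measure of HTS assay quality, usually defined as Z' = 1 - 3 (sd_pos + sd_neg) / |mean_pos - mean_neg| over positive and negative control wells. The sketch below computes it with sample standard deviations; the function name z_prime is hypothetical, and the abstract does not report the control readings, so any inputs are illustrative.

```python
import statistics


def z_prime(positive_controls, negative_controls):
    """Z' factor of an assay from positive and negative control readings.

    Values close to 1 indicate well-separated controls with little spread.
    """
    mean_pos = statistics.mean(positive_controls)
    mean_neg = statistics.mean(negative_controls)
    sd_pos = statistics.stdev(positive_controls)  # sample standard deviation
    sd_neg = statistics.stdev(negative_controls)
    return 1.0 - 3.0 * (sd_pos + sd_neg) / abs(mean_pos - mean_neg)
```

A frequently quoted rule of thumb treats Z' above roughly 0.5 as suitable for screening, although acceptable thresholds vary between laboratories.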
NASA Technical Reports Server (NTRS)
McGreevy, Michael W.; Connors, Mary M. (Technical Monitor)
2001-01-01
To support Search Requests and Quick Responses at the Aviation Safety Reporting System (ASRS), four new QUORUM methods have been developed: keyword search, phrase search, phrase generation, and phrase discovery. These methods build upon the core QUORUM methods of text analysis, modeling, and relevance-ranking. QUORUM keyword search retrieves ASRS incident narratives that contain one or more user-specified keywords in typical or selected contexts, and ranks the narratives on their relevance to the keywords in context. QUORUM phrase search retrieves narratives that contain one or more user-specified phrases, and ranks the narratives on their relevance to the phrases. QUORUM phrase generation produces a list of phrases from the ASRS database that contain a user-specified word or phrase. QUORUM phrase discovery finds phrases that are related to topics of interest. Phrase generation and phrase discovery are particularly useful for finding query phrases for input to QUORUM phrase search. The presentation of the new QUORUM methods includes: a brief review of the underlying core QUORUM methods; an overview of the new methods; numerous, concrete examples of ASRS database searches using the new methods; discussion of related methods; and, in the appendices, detailed descriptions of the new methods.
Context-sensitive network-based disease genetics prediction and its implications in drug discovery
Chen, Yang; Xu, Rong
2017-01-01
Motivation: Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities, and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. Results: We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBN) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly increased the average rank from top 12.6 to top 8.8% for all tested genes compared with the SBN-based approach (p
Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access
NASA Astrophysics Data System (ADS)
Yang, C.; Huang, T.; Armstrong, E. M.; Moroni, D. F.; Liu, K.; Gui, Z.
2013-12-01
Abstract: We present the Semantically optiMize the dAta seRvice operaTion (SMART) system for better data discovery and access across the NASA data systems, the Global Earth Observation System of Systems (GEOSS) Clearinghouse, and Data.gov, helping scientists select Earth observation data that better fit their needs in four aspects: 1. Integrating and interfacing the SMART system to include the functionality of a) semantic reasoning based on Jena, an open source semantic reasoning engine, b) semantic similarity calculation, c) recommendation based on spatiotemporal, semantic, and user workflow patterns, and d) ranking results based on similarity between search terms and data ontology. 2. Collaborating with data user communities to a) capture science data ontology and record relevant ontology triple stores, b) analyze and mine user search and download patterns, c) integrate SMART into a metadata-centric discovery system for community-wide usage and feedback, and d) customize the data discovery, search and access user interface to include the ranked results, recommendation components, and semantic-based navigation. 3. Laying the groundwork to interface the SMART system with other data search and discovery systems as an open source data search and discovery solution. The SMART system leverages NASA, GEO, and FGDC data discovery, search and access for the Earth science community by enabling scientists to readily discover and access data appropriate to their endeavors, increasing the efficiency of data exploration and decreasing the time that scientists must spend on searching, downloading, and processing the datasets most applicable to their research. By incorporating the SMART system, we aim to substantially reduce the time devoted to discovering the most applicable dataset, thereby reducing the number of user inquiries and likewise the time and resources expended by a data center in addressing them.
Keywords: EarthCube; ECHO; DAACs; GeoPlatform; Geospatial Cyberinfrastructure
USDA-ARS's Scientific Manuscript database
The key components of biocontrol product development—discovery, fermentation, and formulation—are interactively linked to each other and ultimately to product performance. To identify biocontrol agents suited for commercial development, our discovery programs utilize a cumulative ranking system tha...
Generalization Performance of Regularized Ranking With Multiscale Kernels.
Zhou, Yicong; Chen, Hong; Lan, Rushi; Pan, Zhibin
2016-05-01
The regularized kernel method for the ranking problem has attracted increasing attention in machine learning. Previous regularized ranking algorithms are usually based on reproducing kernel Hilbert spaces with a single kernel. In this paper, we go beyond this framework by investigating the generalization performance of regularized ranking with multiscale kernels. A novel ranking algorithm with multiscale kernels is proposed and its representer theorem is proved. We establish an upper bound on the generalization error in terms of the complexity of the hypothesis spaces. It shows that the multiscale ranking algorithm can achieve satisfactory learning rates under mild conditions. Experiments demonstrate the effectiveness of the proposed method for drug discovery and recommendation tasks.
Rank and independence in contingency table
NASA Astrophysics Data System (ADS)
Tsumoto, Shusaku
2004-04-01
A contingency table summarizes the conditional frequencies of two attributes and shows how these two attributes depend on each other. Thus, this table is a fundamental tool for pattern discovery with conditional probabilities, such as rule discovery. In this paper, a contingency table is interpreted from the viewpoint of statistical independence and granular computing. The first important observation is that a contingency table compares two attributes with respect to the number of equivalence classes. For example, an n x n table compares two attributes with the same granularity, while an m x n (m >= n) table compares two attributes with different granularities. The second important observation is that matrix algebra is key to the analysis of this table. In particular, the rank, as a degree of independence, plays a very important role in evaluating the degree of statistical independence. Relations between rank and the degree of dependence are also investigated.
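The link between rank and independence can be checked directly: a nonzero contingency table whose cell counts equal (row total x column total) / grand total is an outer product of its marginals and therefore has rank 1. The following sketch, with hypothetical helper names matrix_rank and is_independent, computes the rank exactly with rational arithmetic and tests independence from the marginals.

```python
from fractions import Fraction


def matrix_rank(table):
    """Exact rank of an integer table via Gaussian elimination on fractions."""
    rows = [[Fraction(v) for v in row] for row in table]
    rank = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def is_independent(table):
    """True if every cell satisfies n_ij * N == row_total_i * col_total_j."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return all(
        table[i][j] * total == row_totals[i] * col_totals[j]
        for i in range(len(table))
        for j in range(len(table[0]))
    )
```

For a 2 x 2 table the rank is either 1 (independent) or 2 (dependent); for larger tables, intermediate ranks correspond to partial dependence, which is the situation the paper investigates.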
Knowledge extraction from evolving spiking neural networks with rank order population coding.
Soltic, Snjezana; Kasabov, Nikola
2010-12-01
This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionately small amount of research is centered on the issue of knowledge extraction from spiking neural networks, which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
Dewhurst, Henry M.; Choudhury, Shilpa; Torres, Matthew P.
2015-01-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)—a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits—conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit–N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. PMID:26070665
Quantitative estimation of pesticide-likeness for agrochemical discovery.
Avram, Sorin; Funar-Timofei, Simona; Borota, Ana; Chennamaneni, Sridhar Rao; Manchala, Anil Kumar; Muresan, Sorel
2014-12-01
The design of chemical libraries, an early step in agrochemical discovery programs, is frequently addressed by means of qualitative physicochemical and/or topological rule-based methods. The aim of this study is to develop quantitative estimates of herbicide- (QEH), insecticide- (QEI), fungicide- (QEF), and, finally, pesticide-likeness (QEP). In the assessment of these definitions, we relied on the concept of desirability functions. We found a simple function, shared by the three classes of pesticides and specifically parameterized for six easy-to-compute, independent, and interpretable molecular properties: molecular weight, logP, number of hydrogen bond acceptors, number of hydrogen bond donors, number of rotatable bonds and number of aromatic rings. Subsequently, we describe the scoring of each pesticide class by the corresponding quantitative estimate. In a comparative study, we assessed the performance of the scoring functions using extensive datasets of patented pesticides. The quantitative assessment established here can rank compounds whether or not they fail well-established pesticide-likeness rules, and offers an efficient way to prioritize (class-specific) pesticides. These findings are valuable for the efficient estimation of pesticide-likeness of vast chemical libraries in the field of agrochemical discovery. Graphical Abstract: Quantitative models for pesticide-likeness were derived using the concept of desirability functions parameterized for six easy-to-compute, independent, and interpretable molecular properties: molecular weight, logP, number of hydrogen bond acceptors, number of hydrogen bond donors, number of rotatable bonds and number of aromatic rings.
Phenome-driven disease genetics prediction toward drug discovery.
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-06-15
Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e(-4)) and 81.3% (P < e(-12)) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn's disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. nlp. edu/public/data/DMN © The Author 2015. Published by Oxford University Press.
Context-sensitive network-based disease genetics prediction and its implications in drug discovery.
Chen, Yang; Xu, Rong
2017-04-01
Disease phenotype networks play an important role in computational approaches to identifying new disease-gene associations. Current disease phenotype networks often model disease relationships based on pairwise similarities, and therefore ignore the specific context in which two diseases are connected. In this study, we propose a new strategy to model disease associations using context-sensitive networks (CSNs). We developed a CSN-based phenome-driven approach for disease genetics prediction, and investigated the translational potential of the predicted genes in drug discovery. We constructed CSNs by directly connecting diseases with associated phenotypes. Here, we constructed two CSNs using different data sources; the two networks contain 26 790 and 13 822 nodes respectively. We integrated the CSNs with a genetic functional relationship network and predicted disease genes using a network-based ranking algorithm. For comparison, we built Similarity-Based disease Networks (SBN) using the same disease phenotype data. In a de novo cross validation for 3324 diseases, the CSN-based approach significantly increased the average rank from top 12.6 to top 8.8% for all tested genes compared with the SBN-based approach ( p
An ion channel library for drug discovery and safety screening on automated platforms.
Wible, Barbara A; Kuryshev, Yuri A; Smith, Stephen S; Liu, Zhiqi; Brown, Arthur M
2008-12-01
Ion channels represent the third largest class of targets in drug discovery after G-protein coupled receptors and kinases. In spite of this ranking, ion channels continue to be underexploited as drug targets compared with the other two groups for several reasons. First, with 400 ion channel genes and an even greater number of functional channels due to mixing and matching of individual subunits, a systematic collection of ion channel-expressing cell lines for drug discovery and safety screening has not been available. Second, the lack of high-throughput functional assays for ion channels has limited their use as drug targets. Now that automated electrophysiology has come of age and provided the technology to assay ion channels at medium to high throughput, we have addressed the need for a library of ion channel cell lines by constructing the Ion Channel Panel (ChanTest Corp., Cleveland, OH). From 400 ion channel genes, a collection of 82 of the most relevant human ion channels for drug discovery, safety, and human disease has been assembled. Each channel has been stably overexpressed in human embryonic kidney 293 or Chinese hamster ovary cells. Cell lines have been selected and validated on automated electrophysiology systems to facilitate cost-effective screening for safe and selective compounds at earlier stages in the drug development process. The screening and validation processes as well as the relative advantages of different screening platforms are discussed.
Li, Guo-Bo; Yu, Zhu-Jun; Liu, Sha; Huang, Lu-Yi; Yang, Ling-Ling; Lohans, Christopher T; Yang, Sheng-Yong
2017-07-24
Small-molecule target identification is an important and challenging task for chemical biology and drug discovery. Structure-based virtual target identification has been widely used, which infers and prioritizes potential protein targets for the molecule of interest (MOI) principally via a scoring function. However, current "universal" scoring functions may not always accurately identify targets to which the MOI binds from the retrieved target database, in part due to a lack of consideration of the important binding features for an individual target. Here, we present IFPTarget, a customized virtual target identification method, which uses an interaction fingerprinting (IFP) method for target-specific interaction analyses and a comprehensive index (Cvalue) for target ranking. Evaluation results indicate that the IFP method enables substantially improved binding pose prediction, and Cvalue has an excellent performance in target ranking for the test set. When applied to screen against our established target library that contains 11,863 protein structures covering 2842 unique targets, IFPTarget could retrieve known targets within the top-ranked list and identified new potential targets for chemically diverse drugs. IFPTarget prediction led to the identification of the metallo-β-lactamase VIM-2 as a target for quercetin as validated by enzymatic inhibition assays. This study provides a new in silico target identification tool and will aid future efforts to develop new target-customized methods for target identification.
DenguePredict: An Integrated Drug Repositioning Approach towards Drug Discovery for Dengue.
Wang, QuanQiu; Xu, Rong
2015-01-01
Dengue is a viral disease of expanding global incidence without cures. Here we present a drug repositioning system (DenguePredict) leveraging upon a unique drug treatment database and vast amounts of disease- and drug-related data. We first constructed a large-scale genetic disease network with enriched dengue genetics data curated from biomedical literature. We applied a network-based ranking algorithm to find dengue-related diseases from the disease network. We then developed a novel algorithm to prioritize FDA-approved drugs from dengue-related diseases to treat dengue. When tested in a de-novo validation setting, DenguePredict found the only two drugs tested in clinical trials for treating dengue and ranked them highly: chloroquine ranked at top 0.96% and ivermectin at top 22.75%. We showed that drugs targeting immune systems and arachidonic acid metabolism-related apoptotic pathways might represent innovative drugs to treat dengue. In summary, DenguePredict, by combining comprehensive disease- and drug-related data and novel algorithms, may greatly facilitate drug discovery for dengue.
Samudrala, Ram
2015-01-01
We have examined the effect of eight different protein classes (channels, GPCRs, kinases, ligases, nuclear receptors, proteases, phosphatases, transporters) on the benchmarking performance of the CANDO drug discovery and repurposing platform (http://protinfo.org/cando). The first version of the CANDO platform utilizes a matrix of predicted interactions between 48278 proteins and 3733 human ingestible compounds (including FDA approved drugs and supplements) that map to 2030 indications/diseases using a hierarchical chem and bio-informatic fragment based docking with dynamics protocol (> one billion predicted interactions considered). The platform uses similarity of compound-proteome interaction signatures as indicative of similar functional behavior and benchmarking accuracy is calculated across 1439 indications/diseases with more than one approved drug. The CANDO platform yields a significant correlation (0.99, p-value < 0.0001) between the number of proteins considered and benchmarking accuracy obtained indicating the importance of multitargeting for drug discovery. Average benchmarking accuracies range from 6.2 % to 7.6 % for the eight classes when the top 10 ranked compounds are considered, in contrast to a range of 5.5 % to 11.7 % obtained for the comparison/control sets consisting of 10, 100, 1000, and 10000 single best performing proteins. These results are generally two orders of magnitude better than the average accuracy of 0.2% obtained when randomly generated (fully scrambled) matrices are used. Different indications perform well when different classes are used but the best accuracies (up to 11.7% for the top 10 ranked compounds) are achieved when a combination of classes are used containing the broadest distribution of protein folds. Our results illustrate the utility of the CANDO approach and the consideration of different protein classes for devising indication specific protocols for drug repurposing as well as drug discovery. PMID:25694071
An introduction to web scale discovery systems.
Hoy, Matthew B
2012-01-01
This article explores the basic principles of web-scale discovery systems and how they are being implemented in libraries. "Web scale discovery" refers to a class of products that index a vast number of resources in a wide variety of formats and allow users to search for content in the physical collection, print and electronic journal collections, and other resources from a single search box. Search results are displayed in a manner similar to Internet searches, in a relevance ranked list with links to online content. The advantages and disadvantages of these systems are discussed, and a list of popular discovery products is provided. A list of library websites with discovery systems currently implemented is also provided.
Morrison, John S; Nophsker, Michelle J; Haskell, Roy J
2014-10-01
A unique opportunity exists at the drug discovery stage to overcome inherently poor solubility by selecting drug candidates with superior supersaturation propensity. Existing supersaturation assays compare either precipitation-resistant or precipitation-inhibiting excipients, or higher-energy polymorphic forms, but not multiple compounds or multiple concentrations. Furthermore, these assays lack the throughput and compound conservation necessary for implementation in the discovery environment. A microplate-based combination turbidity and supernatant concentration assay was therefore developed to determine the extent to which different compounds remain in solution as a function of applied concentration in biorelevant media over a specific period of time. Dimethyl sulfoxide stock solutions at multiple concentrations of four poorly soluble, weak base compounds (Dipyridamole, Ketoconazole, Albendazole, and Cinnarizine) were diluted with pH 6.5 buffer as well as FaSSIF. All samples were monitored for precipitation by turbidity at 600 nm over 1 h and the final supernatant concentrations were measured. The maximum supersaturation ratio was calculated from the supersaturation limit and the equilibrium solubility in each medium. Compounds were rank-ordered by supersaturation ratio: Ketoconazole > Dipyridamole > Cinnarizine ∼ Albendazole. These in vitro results correlated well with oral AUC ratios from published in vivo pH effect studies, thereby confirming the validity of this approach. © 2014 Wiley Periodicals, Inc. and the American Pharmacists Association.
NASA Astrophysics Data System (ADS)
Costanzi, Stefano; Tikhonova, Irina G.; Harden, T. Kendall; Jacobson, Kenneth A.
2009-11-01
Accurate in silico models for the quantitative prediction of the activity of G protein-coupled receptor (GPCR) ligands would greatly facilitate the process of drug discovery and development. Several methodologies have been developed based on the properties of the ligands, the direct study of the receptor-ligand interactions, or a combination of both approaches. Ligand-based three-dimensional quantitative structure-activity relationships (3D-QSAR) techniques, not requiring knowledge of the receptor structure, have been historically the first to be applied to the prediction of the activity of GPCR ligands. They are generally endowed with robustness and good ranking ability; however they are highly dependent on training sets. Structure-based techniques generally do not provide the level of accuracy necessary to yield meaningful rankings when applied to GPCR homology models. However, they are essentially independent from training sets and have a sufficient level of accuracy to allow an effective discrimination between binders and nonbinders, thus qualifying as viable lead discovery tools. The combination of ligand and structure-based methodologies in the form of receptor-based 3D-QSAR and ligand and structure-based consensus models results in robust and accurate quantitative predictions. The contribution of the structure-based component to these combined approaches is expected to become more substantial and effective in the future, as more sophisticated scoring functions are developed and more detailed structural information on GPCRs is gathered.
Phenome-driven disease genetics prediction toward drug discovery
Chen, Yang; Li, Li; Zhang, Guo-Qiang; Xu, Rong
2015-01-01
Motivation: Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the prediction power of disease gene discovery. However, most current studies used only one data source of human disease phenotype. We present an innovative and generic strategy for combining multiple different data sources of human disease phenotype and predicting disease-associated genes from integrated phenotypic and genomic data. Results: To demonstrate our approach, we explored a new phenotype database from biomedical ontologies and constructed Disease Manifestation Network (DMN). We combined DMN with mimMiner, which was a widely used phenotype database in disease gene prediction studies. Our approach achieved significantly improved performance over a baseline method, which used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analysis, our approach achieved the area under the curves of 90.7% and 90.3%, which are significantly higher than 84.2% (P < e−4) and 81.3% (P < e−12) for the baseline approach. We further demonstrated that our predicted genes have the translational potential in drug discovery. We used Crohn’s disease as an example and ranked the candidate drugs based on the rank of drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn’s disease pathogenesis, and our rank of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn’s disease. We also found literature evidence to support a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with system approaches can lead to rapid drug discovery. 
Availability and implementation: nlp.case.edu/public/data/DMN Contact: rxx@case.edu PMID:26072493
Wang, Bing; Westerhoff, Lance M.; Merz, Kenneth M.
2008-01-01
We have generated docking poses for the FKBP-GPI complex using eight docking programs, and compared their scoring functions with scoring based on NMR chemical shift perturbations (NMRScore). Because the chemical shift perturbation (CSP) is exquisitely sensitive to the orientation of the ligand inside the binding pocket, NMRScore offers an accurate and straightforward approach to score different poses. All scoring functions were assessed on their ability to rank the native-like structures highly and to separate them from decoy poses generated for a protein-ligand complex. The overall performance of NMRScore is much better than that of energy-based scoring functions associated with docking programs in both aspects. In summary, we find that the combination of docking programs with NMRScore results in an approach that can robustly determine the binding site structure for a protein-ligand complex, thereby providing a new tool facilitating the structure-based drug discovery process. PMID:17867664
Dewhurst, Henry M; Choudhury, Shilpa; Torres, Matthew P
2015-08-01
Predicting the biological function potential of post-translational modifications (PTMs) is becoming increasingly important in light of the exponential increase in available PTM data from high-throughput proteomics. We developed structural analysis of PTM hotspots (SAPH-ire)--a quantitative PTM ranking method that integrates experimental PTM observations, sequence conservation, protein structure, and interaction data to allow rank order comparisons within or between protein families. Here, we applied SAPH-ire to the study of PTMs in diverse G protein families, a conserved and ubiquitous class of proteins essential for maintenance of intracellular structure (tubulins) and signal transduction (large and small Ras-like G proteins). A total of 1728 experimentally verified PTMs from eight unique G protein families were clustered into 451 unique hotspots, 51 of which have a known and cited biological function or response. Using customized software, the hotspots were analyzed in the context of 598 unique protein structures. By comparing distributions of hotspots with known versus unknown function, we show that SAPH-ire analysis is predictive for PTM biological function. Notably, SAPH-ire revealed high-ranking hotspots for which a functional impact has not yet been determined, including phosphorylation hotspots in the N-terminal tails of G protein gamma subunits--conserved protein structures never before reported as regulators of G protein coupled receptor signaling. To validate this prediction we used the yeast model system for G protein coupled receptor signaling, revealing that gamma subunit-N-terminal tail phosphorylation is activated in response to G protein coupled receptor stimulation and regulates protein stability in vivo. These results demonstrate the utility of integrating protein structural and sequence features into PTM prioritization schemes that can improve the analysis and functional power of modification-specific proteomics data. 
© 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
Quantum probability ranking principle for ligand-based virtual screening.
Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal
2017-04-01
Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. The tools of virtual screening are widely exploited to enhance the cost effectiveness of lead drug discovery programs by ranking chemical compound databases in decreasing order of probability of biological activity, based upon the probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called the quantum probability ranking principle (QPRP). The QPRP ranking criteria attempt to draw an analogy between the physical experiment and the molecular structure ranking process for 2D fingerprints in ligand-based virtual screening (LBVS). The development of the QPRP criteria in LBVS employs quantum concepts at three levels. First, at the representation level, the model develops a new framework of molecular representation by connecting the molecular compounds with a mathematical quantum space. Second, it estimates the similarity between chemical libraries and references using a quantum-based similarity searching method. Finally, it ranks the molecules using the QPRP approach. Simulated virtual screening experiments with MDL drug data report (MDDR) data sets showed that QPRP outperformed the classical ranking principle (PRP) for molecular chemical compounds.
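The classical baseline described above, ranking a library in decreasing order of estimated activity, is often approximated by sorting compounds by their fingerprint similarity to a known active reference. The following is a minimal sketch of that baseline only, not of QPRP. It assumes fingerprints are given as sets of on-bit indices and uses the Tanimoto coefficient as the similarity measure (a common choice for 2D fingerprints, which the abstract does not specify). The function names and the toy data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank_by_similarity(reference, library):
    """Return (name, score) pairs sorted by decreasing similarity to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy fingerprints, for illustration only.
reference = {1, 2, 3, 4}
library = {
    "cmpd_a": {1, 2, 3, 4},
    "cmpd_b": {1, 2, 7},
    "cmpd_c": {8, 9},
}
ranking = rank_by_similarity(reference, library)
```

In this sketch each compound is scored independently of the others, which is the defining assumption of the classical principle; the abstract presents QPRP as an alternative to that baseline.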
Zhang, Changsheng; Tang, Bo; Wang, Qian; Lai, Luhua
2014-10-01
Target structure-based virtual screening, which employs protein-small molecule docking to identify potential ligands, has been widely used in small-molecule drug discovery. In the present study, we used a protein-protein docking program to identify proteins that bind to a specific target protein. In the testing phase, an all-to-all protein-protein docking run on a large dataset was performed. The three-dimensional rigid docking program SDOCK was used to examine protein-protein docking on all protein pairs in the dataset. Both the binding affinity and features of the binding energy landscape were considered in the scoring function in order to distinguish positive binding pairs from negative binding pairs. Thus, the lowest docking score, the average Z-score, and convergency of the low-score solutions were incorporated in the analysis. The hybrid scoring function was optimized in the all-to-all docking test. The docking method and the hybrid scoring function were then used to screen for proteins that bind to tumor necrosis factor-α (TNFα), which is a well-known therapeutic target for rheumatoid arthritis and other autoimmune diseases. A protein library containing 677 proteins was used for the screen. Proteins with scores among the top 20% were further examined. Sixteen proteins from the top-ranking 67 proteins were selected for experimental study. Two of these proteins showed significant binding to TNFα in an in vitro binding study. The results of the present study demonstrate the power and potential application of protein-protein docking for the discovery of novel binding proteins for specific protein targets. © 2014 Wiley Periodicals, Inc.
POWER-ENHANCED MULTIPLE DECISION FUNCTIONS CONTROLLING FAMILY-WISE ERROR AND FALSE DISCOVERY RATES.
Peña, Edsel A; Habiger, Joshua D; Wu, Wensong
2011-02-01
Improved procedures, in terms of smaller missed discovery rates (MDR), for performing multiple hypotheses testing with weak and strong control of the family-wise error rate (FWER) or the false discovery rate (FDR) are developed and studied. The improvement over existing procedures such as the Šidák procedure for FWER control and the Benjamini-Hochberg (BH) procedure for FDR control is achieved by exploiting possible differences in the powers of the individual tests. Results signal the need to take into account the powers of the individual tests and to have multiple hypotheses decision functions which are not limited to simply using the individual p-values, as is the case, for example, with the Šidák, Bonferroni, or BH procedures. They also enhance understanding of the role of the powers of individual tests, or more precisely the receiver operating characteristic (ROC) functions of decision processes, in the search for better multiple hypotheses testing procedures. A decision-theoretic framework is utilized, and through auxiliary randomizers the procedures could be used with discrete or mixed-type data or with rank-based nonparametric tests. This is in contrast to existing p-value based procedures whose theoretical validity is contingent on each of these p-value statistics being stochastically equal to or greater than a standard uniform variable under the null hypothesis. Proposed procedures are relevant in the analysis of high-dimensional "large M, small n" data sets arising in the natural, physical, medical, economic and social sciences, whose generation and creation is accelerated by advances in high-throughput technology, notably, but not limited to, microarray technology.
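For orientation, the two existing p-value-based procedures named above can be sketched in a few lines. This is a minimal illustration of the standard single-step Šidák rule (FWER control) and the Benjamini-Hochberg step-up rule (FDR control), not of the power-enhanced procedures the abstract proposes. The function names and the example p-values are hypothetical.

```python
def sidak_reject(p_values, alpha=0.05):
    """Single-step Sidak rule: reject H_i when p_i <= 1 - (1 - alpha)**(1/m)."""
    m = len(p_values)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """BH step-up rule: find the largest k with p_(k) <= k * alpha / m
    and reject the hypotheses with the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Hypothetical p-values, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
sidak = sidak_reject(p_values)
bh = benjamini_hochberg_reject(p_values)
```

Both rules look only at the individual p-values; neither uses any information about the power of each test, which is the limitation the abstract targets.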
Dominance-based ranking functions for interval-valued intuitionistic fuzzy sets.
Chen, Liang-Hsuan; Tu, Chien-Cheng
2014-08-01
The ranking of interval-valued intuitionistic fuzzy sets (IvIFSs) is difficult since they include the interval values of membership and nonmembership. This paper proposes ranking functions for IvIFSs based on the dominance concept. The proposed ranking functions consider the degree to which an IvIFS dominates and is not dominated by other IvIFSs. Based on the bivariate framework and the dominance concept, the functions incorporate not only the boundary values of membership and nonmembership, but also the relative relations among IvIFSs in comparisons. The dominance-based ranking functions include bipolar evaluations with a parameter that allows the decision-maker to reflect his actual attitude in allocating the various kinds of dominance. The relationship for two IvIFSs that satisfy the dual couple is defined based on four proposed ranking functions. Importantly, the proposed ranking functions can achieve a full ranking for all IvIFSs. Two examples are used to demonstrate the applicability and distinctiveness of the proposed ranking functions.
Zhao, Zhongming; Guo, An-Yuan; van den Oord, Edwin J C G; Aliev, Fazil; Jia, Peilin; Edenberg, Howard J; Riley, Brien P; Dick, Danielle M; Bettinger, Jill C; Davies, Andrew G; Grotewiel, Michael S; Schuckit, Marc A; Agrawal, Arpana; Kramer, John; Nurnberger, John I; Kendler, Kenneth S; Webb, Bradley T; Miles, Michael F
2012-01-01
A variety of species and experimental designs have been used to study genetic influences on alcohol dependence, ethanol response, and related traits. Integration of these heterogeneous data can be used to produce a ranked target gene list for additional investigation. In this study, we performed a unique multi-species evidence-based data integration using three microarray experiments in mice or humans that generated an initial alcohol dependence (AD) related genes list, human linkage and association results, and gene sets implicated in C. elegans and Drosophila. We then used permutation and false discovery rate (FDR) analyses on the genome-wide association studies (GWAS) dataset from the Collaborative Study on the Genetics of Alcoholism (COGA) to evaluate the ranking results and weighting matrices. We found one weighting score matrix could increase FDR based q-values for a list of 47 genes with a score greater than 2. Our follow up functional enrichment tests revealed these genes were primarily involved in brain responses to ethanol and neural adaptations occurring with alcoholism. These results, along with our experimental validation of specific genes in mice, C. elegans and Drosophila, suggest that a cross-species evidence-based approach is useful to identify candidate genes contributing to alcoholism.
Computer-aided discovery of a metal-organic framework with superior oxygen uptake.
Moghadam, Peyman Z; Islamoglu, Timur; Goswami, Subhadip; Exley, Jason; Fantham, Marcus; Kaminski, Clemens F; Snurr, Randall Q; Farha, Omar K; Fairen-Jimenez, David
2018-04-11
Current advances in materials science have resulted in the rapid emergence of thousands of functional adsorbent materials in recent years. This clearly creates multiple opportunities for their potential application, but it also creates the following challenge: how does one identify the most promising structures, among the thousands of possibilities, for a particular application? Here, we present a case of computer-aided material discovery, in which we complete the full cycle from computational screening of metal-organic framework materials for oxygen storage, to identification, synthesis and measurement of oxygen adsorption in the top-ranked structure. We introduce an interactive visualization concept to analyze over 1000 unique structure-property plots in five dimensions and delimit the relationships between structural properties and oxygen adsorption performance at different pressures for 2932 already-synthesized structures. We also report a world-record holding material for oxygen storage, UMCM-152, which delivers 22.5% more oxygen than the best known material to date, to the best of our knowledge.
AlloPred: prediction of allosteric pockets on proteins using normal mode perturbation analysis.
Greener, Joe G; Sternberg, Michael J E
2015-10-23
Despite being hugely important in biological processes, allostery is poorly understood and no universal mechanism has been discovered. Allosteric drugs are a largely unexplored prospect with many potential advantages over orthosteric drugs. Computational methods to predict allosteric sites on proteins are needed to aid the discovery of allosteric drugs, as well as to advance our fundamental understanding of allostery. AlloPred, a novel method to predict allosteric pockets on proteins, was developed. AlloPred uses perturbation of normal modes alongside pocket descriptors in a machine learning approach that ranks the pockets on a protein. AlloPred ranked an allosteric pocket top for 23 out of 40 known allosteric proteins, showing comparable and complementary performance to two existing methods. In 28 of 40 cases an allosteric pocket was ranked first or second. The AlloPred web server, freely available at http://www.sbg.bio.ic.ac.uk/allopred/home, allows visualisation and analysis of predictions. The source code and dataset information are also available from this site. Perturbation of normal modes can enhance our ability to predict allosteric sites on proteins. Computational methods such as AlloPred assist drug discovery efforts by suggesting sites on proteins for further experimental study.
Nilsson, Ingemar; Polla, Magnus O
2012-10-01
Drug design is a multi-parameter task present in the analysis of experimental data for synthesized compounds and in the prediction of new compounds with desired properties. This article describes the implementation of a binned scoring and composite ranking scheme for 11 experimental parameters that were identified as key drivers in the MC4R project. The composite ranking scheme was implemented in an AstraZeneca tool for analysis of project data, thereby providing an immediate re-ranking as new experimental data was added. The automated ranking also highlighted compounds overlooked by the project team. The successful implementation of a composite ranking on experimental data led to the development of an equivalent virtual score, which was based on Free-Wilson models of the parameters from the experimental ranking. The individual Free-Wilson models showed good to high predictive power with a correlation coefficient between 0.45 and 0.97 based on the external test set. The virtual ranking adds value to the selection of compounds for synthesis but error propagation must be controlled. The experimental ranking approach adds significant value, is parameter independent and can be tuned and applied to any drug discovery project.
AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs.
Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael
2017-06-15
Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. 
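The diffusion step underlying PageRank-style methods can be illustrated with a small power iteration. The sketch below is a generic personalized PageRank with a single fixed damping value on one small graph. It is not the BirgRank or AptRank code, it does not build the two-layer protein/function graph, and the value alpha=0.85, the function name, and the toy graph are assumptions made for illustration.

```python
def personalized_pagerank(adjacency, seed, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for x = alpha * P x + (1 - alpha) * seed.

    adjacency[i][j] is the weight of the edge from node j to node i;
    P is adjacency with each column normalized to sum to 1.
    Assumes every node has at least one outgoing edge.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    x = list(seed)
    for _ in range(max_iter):
        new_x = []
        for i in range(n):
            flow = sum(adjacency[i][j] / col_sums[j] * x[j]
                       for j in range(n) if col_sums[j] > 0)
            new_x.append(alpha * flow + (1 - alpha) * seed[i])
        if sum(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical undirected path graph 0 - 1 - 2 - 3, seeded at node 0.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seed = [1.0, 0.0, 0.0, 0.0]
scores = personalized_pagerank(adjacency, seed)
```

In a function-prediction setting the seed would mark known annotations and the resulting scores would be used to rank candidates; how the abstract's methods choose or adapt the decay parameters is not shown here.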
Accurate phylogenetic classification of DNA fragments based on sequence composition
DOE Office of Scientific and Technical Information (OSTI.GOV)
McHardy, Alice C.; Garcia Martin, Hector; Tsirigos, Aristotelis
2006-05-01
Metagenome studies have retrieved vast amounts of sequence out of a variety of environments, leading to novel discoveries and great insights into the uncultured microbial world. Except for very simple communities, diversity makes sequence assembly and analysis a very challenging problem. To understand the structure and function of microbial communities, a taxonomic characterization of the obtained sequence fragments is highly desirable, yet currently limited mostly to those sequences that contain phylogenetic marker genes. We show that for clades at the rank of domain down to genus, sequence composition allows the very accurate phylogenetic characterization of genomic sequence. We developed a composition-based classifier, PhyloPythia, for de novo phylogenetic sequence characterization and have trained it on a data set of 340 genomes. By extensive evaluation experiments we show that the method is accurate across all taxonomic ranks considered, even for sequences that originate from novel organisms and are as short as 1 kb. Application to two metagenome datasets obtained from samples of phosphorus-removing sludge showed that the method allows the accurate classification at genus level of most sequence fragments from the dominant populations, while at the same time correctly characterizing even larger parts of the samples at higher taxonomic levels.
Solution NMR Spectroscopy in Target-Based Drug Discovery.
Li, Yan; Kang, Congbao
2017-08-23
Solution NMR spectroscopy is a powerful tool to study protein structures and dynamics under physiological conditions. This technique is particularly useful in target-based drug discovery projects as it provides protein-ligand binding information in solution. Accumulated studies have shown that NMR will play increasingly important roles in multiple steps of the drug discovery process. In a fragment-based drug discovery process, ligand-observed and protein-observed NMR spectroscopy can be applied to screen fragments with low binding affinities. The screened fragments can be further optimized into drug-like molecules. In combination with other biophysical techniques, NMR will guide structure-based drug discovery. In this review, we describe the possible roles of NMR spectroscopy in drug discovery. We also illustrate the challenges encountered in the drug discovery process. We include several examples demonstrating the roles of NMR in target-based drug discovery, such as hit identification, ranking ligand binding affinities, and mapping the ligand binding site. We also speculate on the possible roles of NMR in target engagement based on recent progress in in-cell NMR spectroscopy.
Beyond Zipf's Law: The Lavalette Rank Function and Its Properties.
Fontanelli, Oscar; Miramontes, Pedro; Yang, Yaning; Cocho, Germinal; Li, Wentian
Although Zipf's law is widespread in natural and social data, one often encounters situations where one or both ends of the ranked data deviate from the power-law function. Previously we proposed the Beta rank function to improve the fitting of data which does not follow a perfect Zipf's law. Here we show that when the two parameters in the Beta rank function have the same value, the Lavalette rank function, the probability density function can be derived analytically. We also show both computationally and analytically that Lavalette distribution is approximately equal, though not identical, to the lognormal distribution. We illustrate the utility of Lavalette rank function in several datasets. We also address three analysis issues on the statistical testing of Lavalette fitting function, comparison between Zipf's law and lognormal distribution through Lavalette function, and comparison between lognormal distribution and Lavalette distribution.
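The relationship among the three rank functions mentioned above can be made concrete with a short sketch. It assumes the two-parameter form of the Beta rank function commonly given in the literature, f(r) = A (N + 1 - r)^b / r^a for ranks r = 1..N; the abstract itself does not state the formula, so the parameterization and the function names here are assumptions.

```python
def beta_rank(r, n, a, b, scale=1.0):
    """Assumed Beta rank function: scale * (n + 1 - r)**b / r**a for r = 1..n."""
    return scale * (n + 1 - r) ** b / r ** a


def lavalette_rank(r, n, b, scale=1.0):
    """Lavalette rank function: the Beta rank function with both exponents equal."""
    return beta_rank(r, n, a=b, b=b, scale=scale)


def zipf_rank(r, a, scale=1.0):
    """Zipf's law as a pure power law in rank: scale / r**a."""
    return scale / r ** a


# Hypothetical parameters, for illustration only.
n = 100
curve = [lavalette_rank(r, n, b=0.5) for r in range(1, n + 1)]
```

Under this form, setting b = 0 recovers the Zipf power law, while the second factor bends the curve downward at the low-ranked end, which is the kind of deviation from Zipf's law the abstract describes.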
Target Fishing for Chemical Compounds using Target-Ligand Activity data and Ranking based Methods
Wale, Nikil; Karypis, George
2009-01-01
In recent years the development of computational techniques that identify all the likely targets for a given chemical compound, also termed the problem of Target Fishing, has been an active area of research. Identification of likely targets of a chemical compound helps to understand problems such as toxicity, lack of efficacy in humans, and poor physical properties associated with that compound in the early stages of drug discovery. In this paper we present a set of techniques whose goal is to rank or prioritize targets in the context of a given chemical compound such that most targets that this compound may show activity against appear higher in the ranked list. These methods are based on our extensions to the SVM and Ranking Perceptron algorithms for this problem. Our extensive experimental study shows that the methods developed in this work outperform previous approaches by 2% to 60% under different evaluation criteria. PMID:19764745
Interval-Valued Rank in Finite Ordered Sets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff; Pogel, Alex; Purvine, Emilie
We consider the concept of rank as a measure of the vertical levels and positions of elements of partially ordered sets (posets). We are motivated by the need for algorithmic measures on large, real-world hierarchically-structured data objects like the semantic hierarchies of ontological databases. These rarely satisfy the strong property of gradedness, which is required for traditional rank functions to exist. Representing such semantic hierarchies as finite, bounded posets, we recognize the duality of ordered structures to motivate rank functions which respect verticality both from the bottom and from the top. Our rank functions are thus interval-valued, and always exist, even for non-graded posets, providing order homomorphisms to an interval order on the interval-valued ranks. The concept of rank width arises naturally, allowing us to identify the poset region with point-valued width as its longest graded portion (which we call the “spindle”). A standard interval rank function is naturally motivated both in terms of its extremality and on pragmatic grounds. Its properties are examined, including the relationship to traditional grading and rank functions, and methods to assess comparisons of standard interval-valued ranks.
Fleeman, Renee; LaVoi, Travis M; Santos, Radleigh G; Morales, Angela; Nefzi, Adel; Welmaker, Gregory S; Medina-Franco, José L; Giulianotti, Marc A; Houghten, Richard A; Shaw, Lindsey N
2015-04-23
Mixture based synthetic combinatorial libraries offer a tremendous enhancement for the rate of drug discovery, allowing the activity of millions of compounds to be assessed through the testing of exponentially fewer samples. In this study, we used a scaffold-ranking library to screen 37 different libraries for antibacterial activity against the ESKAPE pathogens. Each library contained between 10000 and 750000 structural analogues for a total of >6 million compounds. From this, we identified a bis-cyclic guanidine library that displayed strong antibacterial activity. A positional scanning library for these compounds was developed and used to identify the most effective functional groups at each variant position. Individual compounds were synthesized that were broadly active against all ESKAPE organisms at concentrations <2 μM. In addition, these compounds were bactericidal, had antibiofilm effects, showed limited potential for the development of resistance, and displayed almost no toxicity when tested against human lung cells and erythrocytes. Using a murine model of peritonitis, we also demonstrate that these agents are highly efficacious in vivo.
Empirical and Theoretical Bases of Zipf's Law.
ERIC Educational Resources Information Center
Wyllys, Ronald E.
1981-01-01
Explains Zipf's Law of Vocabulary Distribution (i.e., relationship between frequency of a word in a corpus and its rank), noting the discovery of the law, alternative forms, and literature relating to the search for a rationale for Zipf's Law. Thirty-eight references are cited. (EJS)
Investigator profile. An interview with Russell D. Fernald, Ph.D. Interview by Vicki Glaser.
Fernald, Russell D
2006-01-01
Russell D. Fernald, Ph.D., is a Professor of Biological Sciences and the Benjamin Scott Crocker Professor in Human Biology at Stanford University (California). He received his Bachelor's degree from Swarthmore College (Swarthmore, PA) and his Ph.D. from the University of Pennsylvania (Philadelphia). Dr. Fernald completed a postdoctoral fellowship with Dr. O. Creutzfeldt at the Max-Planck-Institute for Psychiatry, in Munich, Germany, and a postdoctoral fellowship with Dr. Konrad Lorenz at the Max-Planck-Institute for Behavioral Physiology. In 2004 he shared the Rank Prize for discoveries about lens function. Dr. Fernald's lab uses an African cichlid fish species to study how social experience influences the brain and how retinal progenitor cell division and differentiation are controlled.
Dissecting Orthosteric Contacts for a Reverse-Fragment-Based Ligand Design.
Chandramohan, Arun; Tulsian, Nikhil K; Anand, Ganesh S
2017-08-01
Orthosteric sites on proteins are formed typically from noncontiguous interacting sites in three-dimensional space where the composite binding interaction of a biological ligand is mediated by multiple synergistic interactions of its constituent functional groups. Through these multiple interactions, ligands stabilize both the ligand binding site and the local secondary structure. However, relative energetic contributions of the individual contacts in these protein-ligand interactions are difficult to resolve. Deconvolution of the contributions of these various functional groups in natural inhibitors/ligand would greatly aid in iterative fragment-based drug discovery (FBDD). In this study, we describe an approach of progressive unfolding of a target protein using a gradient of denaturant urea to reveal the individual energetic contributions of various ligand-functional groups to the affinity of the entire ligand. Through calibrated unfolding of two protein-ligand systems: cAMP-bound regulatory subunit of Protein Kinase A (RIα) and IBMX-bound phosphodiesterase8 (PDE8), monitored by amide hydrogen-deuterium exchange mass spectrometry, we show progressive disruption of individual orthosteric contacts in the ligand binding sites, allowing us to rank the energetic contributions of these individual interactions. In the two cAMP-binding sites of RIα, exocyclic phosphate oxygens of cAMP were identified to mediate stronger interactions than ribose 2'-OH in both the RIα-cAMP binding interfaces. Further, we have also ranked the relative contributions of the different functional groups of IBMX based on their interactions with the orthosteric residues of PDE8. This strategy for deconstruction of individual binding sites and identification of the strongest functional group interaction in enzyme orthosteric sites offers a rational starting point for FBDD.
A Metadata based Knowledge Discovery Methodology for Seeding Translational Research.
Kothari, Cartik R; Payne, Philip R O
2015-01-01
In this paper, we present a semantic, metadata based knowledge discovery methodology for identifying teams of researchers from diverse backgrounds who can collaborate on interdisciplinary research projects: projects in areas that have been identified as high-impact areas at The Ohio State University. This methodology involves the semantic annotation of keywords and the postulation of semantic metrics to improve the efficiency of the path exploration algorithm as well as to rank the results. Results indicate that our methodology can discover groups of experts from diverse areas who can collaborate on translational research projects.
NASA Astrophysics Data System (ADS)
Leclerc, Arnaud; Thomas, Phillip S.; Carrington, Tucker
2017-08-01
Vibrational spectra and wavefunctions of polyatomic molecules can be calculated at low memory cost using low-rank sum-of-product (SOP) decompositions to represent basis functions generated using an iterative eigensolver. Using a SOP tensor format does not determine the iterative eigensolver. The choice of the iterative eigensolver is limited by the need to restrict the rank of the SOP basis functions at every stage of the calculation. We have adapted, implemented and compared different reduced-rank algorithms based on standard iterative methods (block-Davidson algorithm, Chebyshev iteration) to calculate vibrational energy levels and wavefunctions of the 12-dimensional acetonitrile molecule. The effect of using low-rank SOP basis functions on the different methods is analysed and the numerical results are compared with those obtained with the reduced rank block power method. Relative merits of the different algorithms are presented, showing that the advantage of using a more sophisticated method, although mitigated by the use of reduced-rank SOP functions, is noticeable in terms of CPU time.
Functions of the cellular prion protein, the end of Moore's law, and Ockham's razor theory.
del Río, José A; Gavín, Rosalina
2016-01-01
Since its discovery the cellular prion protein (encoded by the Prnp gene) has been associated with a large number of functions. The proposed functions rank from basic cellular processes such as cell cycle and survival to neural functions such as behavior and neuroprotection, following a pattern similar to that of Moore's law for electronics. In addition, particular interest is increasing in the participation of Prnp in neurodegeneration. However, in recent years a redefinition of these functions has begun, since examples of previously attributed functions were increasingly re-associated with other proteins. Most of these functions are linked to so-called "Prnp-flanking genes" that are close to the genomic locus of Prnp and which are present in the genome of some Prnp mouse models. In addition, their role in neuroprotection against convulsive insults has been confirmed in recent studies. Lastly, in recent years a large number of models indicating the participation of different domains of the protein in apoptosis have been uncovered. However, after more than 10 years of molecular dissection our view is that the simplest mechanistic model in PrP(C)-mediated cell death should be considered, as Ockham's razor theory suggested.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kerns, Sarah L.; Departments of Pathology and Genetics, Albert Einstein College of Medicine, Bronx, New York; Stock, Richard
2013-01-01
Purpose: To identify single nucleotide polymorphisms (SNPs) associated with development of erectile dysfunction (ED) among prostate cancer patients treated with radiation therapy. Methods and Materials: A 2-stage genome-wide association study was performed. Patients were split randomly into a stage I discovery cohort (132 cases, 103 controls) and a stage II replication cohort (128 cases, 102 controls). The discovery cohort was genotyped using Affymetrix 6.0 genome-wide arrays. The 940 top ranking SNPs selected from the discovery cohort were genotyped in the replication cohort using Illumina iSelect custom SNP arrays. Results: Twelve SNPs identified in the discovery cohort and validated in the replication cohort were associated with development of ED following radiation therapy (Fisher combined P values 2.1 × 10^-5 to 6.2 × 10^-4). Notably, these 12 SNPs lie in or near genes involved in erectile function or other normal cellular functions (adhesion and signaling) rather than DNA damage repair. In a multivariable model including nongenetic risk factors, the odds ratios for these SNPs ranged from 1.6 to 5.6 in the pooled cohort. There was a striking relationship between the cumulative number of SNP risk alleles an individual possessed and ED status (Somers' D P value = 1.7 × 10^-29). A 1-allele increase in cumulative SNP score increased the odds for developing ED by a factor of 2.2 (P value = 2.1 × 10^-19). The cumulative SNP score model had a sensitivity of 84% and specificity of 75% for prediction of developing ED at the radiation therapy planning stage. Conclusions: This genome-wide association study identified a set of SNPs that are associated with development of ED following radiation therapy. These candidate genetic predictors warrant more definitive validation in an independent cohort.
Lessons from hot spot analysis for fragment-based drug discovery
Hall, David R.; Vajda, Sandor
2015-01-01
Analysis of binding energy hot spots at protein surfaces can provide crucial insights into the prospects for successful application of fragment-based drug discovery (FBDD), and whether a fragment hit can be advanced into a high affinity, druglike ligand. The key factor is the strength of the top ranking hot spot, and how well a given fragment complements it. We show that published data are sufficient to provide a sophisticated and quantitative understanding of how hot spots derive from protein three-dimensional structure, and how their strength, number and spatial arrangement govern the potential for a surface site to bind to fragment-sized and larger ligands. This improved understanding provides important guidance for the effective application of FBDD in drug discovery. PMID:26538314
Yager’s ranking method for solving the trapezoidal fuzzy number linear programming
NASA Astrophysics Data System (ADS)
Karyati; Wutsqa, D. U.; Insani, N.
2018-03-01
In previous research, the authors studied the fuzzy simplex method for trapezoidal fuzzy number linear programming based on Maleki’s ranking function. We established several results on the conditions for the optimum solution of the fuzzy simplex method, the fuzzy Big-M method, the fuzzy two-phase method, and the sensitivity analysis. In this research, we study the fuzzy simplex method based on a different ranking function, Yager’s ranking function, and investigate the conditions for the optimum solution in this case. We find that Yager’s ranking function does not behave like Maleki’s ranking function: with Yager’s function, the simplex method does not work as well as it does with Maleki’s function. Under Yager’s function, the difference of two equal fuzzy numbers is not equal to zero. Because of this, the optimum table of the fuzzy simplex method cannot be detected. As a result, the simplified fuzzy simplex iteration stops without reaching the optimum solution.
Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs
NASA Astrophysics Data System (ADS)
Alias, Christophe; Darte, Alain; Feautrier, Paul; Gonnord, Laure
Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set, which strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Previous algorithms based on affine rankings either are applicable only to simple loops (i.e., single-node flowcharts) and rely on enumeration, or are not complete in the sense that they are not guaranteed to find a ranking in the class of functions they consider, if one exists. Our first contribution is to propose an efficient algorithm to compute ranking functions: It can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to get upper bounds for the computational complexity (number of transitions) of the source program. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method on a collection of test cases from the literature. We also show the links and differences with previous techniques based on the insertion of counters.
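The idea of a multi-dimensional ranking function can be illustrated with a toy example. The Python sketch below is not the authors' algorithm: it does not synthesize a ranking by linear programming, it only checks, along one execution trace, that a hand-chosen two-dimensional ranking (i, j) stays non-negative and strictly decreases in lexicographic order. The transition system, the function names, and the parameter m are all hypothetical.

```python
def step(state, m):
    """One transition of a toy nested loop; returns None when the program halts."""
    i, j = state
    if j > 0:              # inner loop body
        return (i, j - 1)
    if i > 0:              # outer loop body: restart the inner counter at m
        return (i - 1, m)
    return None

def rank(state):
    """Candidate 2-D ranking function: the pair (i, j), ordered lexicographically."""
    return state

def check_ranking(start, m):
    """Run the program from `start`, checking the ranking at every step.

    Returns the number of transitions taken. This checks a single trace,
    so it is a test of the candidate ranking, not a termination proof.
    """
    state, steps = start, 0
    while True:
        nxt = step(state, m)
        if nxt is None:
            return steps
        if min(rank(nxt)) < 0 or not rank(nxt) < rank(state):
            raise ValueError(f"ranking violated at {state} -> {nxt}")
        state, steps = nxt, steps + 1

print(check_ranking((3, 0), m=4))  # 15 transitions
```

For this toy program the number of transitions from (i0, j0) is j0 + i0 * (m + 1), a polynomial of degree two in the inputs. That is the kind of more-than-linear bound a multi-dimensional ranking can certify and a single affine ranking cannot.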
NASA Astrophysics Data System (ADS)
Lee, K. J.; Stovall, K.; Jenet, F. A.; Martinez, J.; Dartez, L. P.; Mata, A.; Lunsford, G.; Cohen, S.; Biwer, C. M.; Rohr, M.; Flanigan, J.; Walker, A.; Banaszak, S.; Allen, B.; Barr, E. D.; Bhat, N. D. R.; Bogdanov, S.; Brazier, A.; Camilo, F.; Champion, D. J.; Chatterjee, S.; Cordes, J.; Crawford, F.; Deneva, J.; Desvignes, G.; Ferdman, R. D.; Freire, P.; Hessels, J. W. T.; Karuppusamy, R.; Kaspi, V. M.; Knispel, B.; Kramer, M.; Lazarus, P.; Lynch, R.; Lyne, A.; McLaughlin, M.; Ransom, S.; Scholz, P.; Siemens, X.; Spitler, L.; Stairs, I.; Tan, M.; van Leeuwen, J.; Zhu, W. W.
2013-07-01
Modern radio pulsar surveys produce a large volume of prospective candidates, the majority of which are polluted by human-created radio frequency interference or other forms of noise. Typically, large numbers of candidates need to be visually inspected in order to determine if they are real pulsars. This process can be labour intensive. In this paper, we introduce an algorithm called Pulsar Evaluation Algorithm for Candidate Extraction (PEACE) which improves the efficiency of identifying pulsar signals. The algorithm ranks the candidates based on a score function. Unlike popular machine-learning-based algorithms, no prior training data sets are required. This algorithm has been applied to data from several large-scale radio pulsar surveys. Using the human-based ranking results generated by students in the Arecibo Remote Command Center programme, the statistical performance of PEACE was evaluated. It was found that PEACE ranked 68 per cent of the student-identified pulsars within the top 0.17 per cent of sorted candidates, 95 per cent within the top 0.34 per cent and 100 per cent within the top 3.7 per cent. This clearly demonstrates that PEACE significantly increases the pulsar identification rate by a factor of about 50 to 1000. To date, PEACE has been directly responsible for the discovery of 47 new pulsars, 5 of which are millisecond pulsars that may be useful for pulsar timing based gravitational-wave detection projects.
Huang, Zirui; Davis, Henry Hap; Wolff, Annemarie; Northoff, Georg
2017-01-01
Brain plasticity studies have shown functional reorganization in participants with outstanding motor expertise. Little is known about neural plasticity associated with exceptionally long motor training or of its predictive value for motor performance excellence. The present study utilised resting-state functional magnetic resonance imaging (rs-fMRI) in a unique sample of world-class athletes: Olympic, elite, and internationally ranked swimmers (n = 30). Their world ranking ranged from 1st to 250th: each had prepared for participation in the Olympic Games. Combining rs-fMRI graph-theoretical and seed-based functional connectivity analyses, it was discovered that the thalamus has its strongest connections with the sensorimotor network in elite swimmers with the highest world rankings (career best rank: 1-35). Strikingly, thalamo-sensorimotor functional connections were highly correlated with the swimmers' motor performance excellence, that is, accounting for 41% of the individual variance in best world ranking. Our findings shed light on neural correlates of long-term athletic performance involving thalamo-sensorimotor functional circuits.
Cross-organism learning method to discover new gene functionalities.
Domeniconi, Giacomo; Masseroli, Marco; Moro, Gianluca; Pinoli, Pietro
2016-04-01
Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as in the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In the last years, several databases of such annotations have become available; yet, these valuable annotations are incomplete, include errors and only some of them represent highly reliable human curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount. Here, we propose a novel cross-organisms learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better studied, organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations to allow the application of supervised algorithms to predict with good accuracy unknown gene annotations. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organisms learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism. We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). 
Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Statistical Optimality in Multipartite Ranking and Ordinal Regression.
Uematsu, Kazuki; Lee, Yoonkyung
2015-05-01
Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.
Globalization and changing trends of biomedical research output.
Conte, Marisa L; Liu, Jing; Schnell, Santiago; Omary, M Bishr
2017-06-15
The US continues to lead the world in research and development (R&D) expenditures, but there is concern that stagnation in federal support for biomedical research in the US could undermine the leading role the US has played in biomedical and clinical research discoveries. As a readout of research output in the US compared with other countries, assessment of original research articles published by US-based authors in ten clinical and basic science journals during 2000 to 2015 showed a steady decline of articles in high-ranking journals or no significant change in mid-ranking journals. In contrast, publication output originating from China-based investigators, in both high- and mid-ranking journals, has steadily increased commensurate with significant growth in R&D expenditures. These observations support the current concerns of stagnant and year-to-year uncertainty in US federal funding of biomedical research.
Analyzing compound and project progress through multi-objective-based compound quality assessment.
Nissink, J Willem M; Degorce, Sébastien
2013-05-01
Compound-quality scoring methods designed to evaluate multiple drug properties concurrently are useful to analyze and prioritize output from drug-design efforts. However, formalized multiparameter optimization approaches are not widely used in drug design. We rank molecules synthesized in drug-discovery projects using simple and aggregated desirability functions reflecting medicinal chemistry 'rules'. Our quality score deals transparently with missing data, a key requirement in drug-hunting projects where data availability is often limited. We further estimate confidence in the interpretation of such a compound-quality measure. Scores and associated confidences provide systematic insight in the quality of emerging chemical equity. Tracking quality of synthetic output over time yields valuable insight into the progress of drug-design teams, with potential applications in risk and resource management of a drug portfolio.
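A desirability-based score that tolerates missing data could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the authors' method: the abstract does not give the desirability shapes or the aggregation rule, so the sketch assumes a linear "higher is better" ramp per property, a geometric mean over the properties that have data, and data coverage as a crude stand-in for confidence. The property names and thresholds are hypothetical.

```python
import math

def ramp_desirability(value, low, high):
    """Assumed 'higher is better' desirability: 0 at or below low, 1 at or above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)

def quality_score(props, specs):
    """Aggregate desirabilities over the properties that have data.

    Returns (score, coverage): score is the geometric mean of the available
    desirabilities (None if nothing was measured), and coverage is the
    fraction of specified properties that had data.
    """
    ds = []
    for name, (low, high) in specs.items():
        value = props.get(name)
        if value is None:      # missing measurement: skip it, do not penalise
            continue
        ds.append(ramp_desirability(value, low, high))
    coverage = len(ds) / len(specs)
    if not ds:
        return None, coverage
    if min(ds) == 0.0:         # one unacceptable property zeroes the score
        return 0.0, coverage
    return math.exp(sum(math.log(d) for d in ds) / len(ds)), coverage

# Hypothetical property specifications: (low, high) thresholds.
specs = {"solubility": (0.0, 100.0), "potency": (5.0, 9.0)}
score, coverage = quality_score({"solubility": 50.0, "potency": None}, specs)
```

Returning the coverage next to the score keeps the treatment of missing data visible: a compound scored on one property out of two is reported as such and is not silently ranked alongside fully characterised compounds.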
Rational discovery of dengue type 2 non-competitive inhibitors.
Heh, Choon H; Othman, Rozana; Buckle, Michael J C; Sharifuddin, Yusrizam; Yusof, Rohana; Rahman, Noorsaadah A
2013-07-01
Various works have been carried out in developing therapeutics against dengue. However, to date, no effective vaccine or anti-dengue agent has yet been discovered. The development of protease inhibitors is considered as a promising option, but most previous works have involved competitive inhibition. In this study, we focused on rational discovery of potential anti-dengue agents based on non-competitive inhibition of DEN-2 NS2B/NS3 protease. A homology model of the DEN-2 NS2B/NS3 protease (using West Nile Virus NS2B/NS3 protease complex, 2FP7, as the template) was used as the target, and pinostrobin, a flavanone, was used as the standard ligand. Virtual screening was performed involving a total of 13 341 small compounds, with the backbone structures of chalcone, flavanone, and flavone, available in the ZINC database. Ranking of the resulting compounds yielded compounds with higher binding affinities compared with the standard ligand. Inhibition assay of the selected top-ranking compounds against DEN-2 NS2B/NS3 proteolytic activity resulted in significantly better inhibition compared with the standard and correlated well with in silico results. In conclusion, via this rational discovery technique, better inhibitors were identified. This method can be used in further work to discover lead compounds for anti-dengue agents. © 2013 John Wiley & Sons A/S.
SortNet: learning to rank by a neural preference function.
Rigutini, Leonardo; Papini, Tiziano; Maggini, Marco; Scarselli, Franco
2011-09-01
Relevance ranking consists in sorting a set of objects with respect to a given criterion. However, in personalized retrieval systems, the relevance criteria may vary among different users and may not be predefined. In this case, ranking algorithms that adapt their behavior based on user feedback must be devised. Two main approaches are proposed in the literature for learning to rank: the use of a scoring function, learned from examples, that evaluates a feature-based representation of each object and yields an absolute relevance score; and a pairwise approach, where a preference function is learned to determine the object that has to be ranked first in a given pair. In this paper, we present a preference learning method for learning to rank. A neural network, the comparative neural network (CmpNN), is trained from examples to approximate the comparison function for a pair of objects. The CmpNN adopts a particular architecture designed to implement the symmetries naturally present in a preference function. The learned preference function can be embedded as the comparator into a classical sorting algorithm to provide a global ranking of a set of objects. To improve the ranking performance, an active-learning procedure is devised that aims at selecting the most informative patterns in the training set. The proposed algorithm is evaluated on the LETOR dataset, showing promising performance in comparison with other state-of-the-art algorithms.
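Embedding a preference function as the comparator of a standard sort can be sketched as follows. This Python example is only an illustration: the `preference` function is a hand-written, hypothetical stand-in for a trained comparator such as the CmpNN, and the weights and feature vectors are made up.

```python
from functools import cmp_to_key

WEIGHTS = (0.7, 0.3)  # hypothetical weights standing in for a trained model

def preference(x, y):
    """Stand-in preference function: positive if x should rank before y.

    It is antisymmetric by construction, preference(x, y) == -preference(y, x),
    which is the kind of symmetry a learned comparator is designed to respect.
    """
    score_x = sum(w * f for w, f in zip(WEIGHTS, x))
    score_y = sum(w * f for w, f in zip(WEIGHTS, y))
    return score_x - score_y

def comparator(x, y):
    """Adapt the preference function to the -1 / 0 / +1 contract of a sort comparator."""
    p = preference(x, y)
    return -1 if p > 0 else (1 if p < 0 else 0)

def rank_objects(objects):
    """Produce a global ranking by plugging the comparator into Python's sort."""
    return sorted(objects, key=cmp_to_key(comparator))

docs = [(0.1, 0.9), (0.9, 0.1), (0.5, 0.5)]
print(rank_objects(docs))  # [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]
```

The stand-in here is transitive because it is derived from a score. A learned preference function need not be, in which case the ranking produced can depend on the sorting algorithm and on the input order.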
Lessons from Hot Spot Analysis for Fragment-Based Drug Discovery.
Hall, David R; Kozakov, Dima; Whitty, Adrian; Vajda, Sandor
2015-11-01
Analysis of binding energy hot spots at protein surfaces can provide crucial insights into the prospects for successful application of fragment-based drug discovery (FBDD), and whether a fragment hit can be advanced into a high-affinity, drug-like ligand. The key factor is the strength of the top ranking hot spot, and how well a given fragment complements it. We show that published data are sufficient to provide a sophisticated and quantitative understanding of how hot spots derive from a protein 3D structure, and how their strength, number, and spatial arrangement govern the potential for a surface site to bind to fragment-sized and larger ligands. This improved understanding provides important guidance for the effective application of FBDD in drug discovery. Copyright © 2015 Elsevier Ltd. All rights reserved.
40 CFR 304.30 - Filing of pleadings.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 27 2010-07-01 2010-07-01 false Filing of pleadings. 304.30 Section... CLAIMS Hearings Before the Arbitrator § 304.30 Filing of pleadings. (a) Discovery shall be in accordance... nature of the substances contributed to the facility by each identified PRP, and a ranking by volume of...
Korotcov, Alexandru; Tkachenko, Valery; Russo, Daniel P; Ekins, Sean
2017-12-04
Machine learning methods have been applied to many data sets in pharmaceutical research for several decades. The relative ease and availability of fingerprint type molecular descriptors paired with Bayesian methods resulted in the widespread use of this approach for a diverse array of end points relevant to drug discovery. Deep learning is the latest machine learning algorithm attracting attention for many pharmaceutical applications from docking to virtual screening. Deep learning is based on an artificial neural network with multiple hidden layers and has found considerable traction for many artificial intelligence applications. We have previously suggested the need for a comparison of different machine learning methods with deep learning across an array of varying data sets that is applicable to pharmaceutical research. End points relevant to pharmaceutical research include absorption, distribution, metabolism, excretion, and toxicity (ADME/Tox) properties, as well as activity against pathogens and drug discovery data sets. In this study, we have used data sets for solubility, probe-likeness, hERG, KCNQ1, bubonic plague, Chagas, tuberculosis, and malaria to compare different machine learning methods using FCFP6 fingerprints. These data sets represent whole cell screens, individual proteins, physicochemical properties as well as a data set with a complex end point. Our aim was to assess whether deep learning offered any improvement in testing when assessed using an array of metrics including AUC, F1 score, Cohen's kappa, Matthews correlation coefficient and others. Based on ranked normalized scores for the metrics or data sets, Deep Neural Networks (DNN) ranked higher than SVM, which in turn was ranked higher than all the other machine learning methods. Visualizing these properties for training and test sets using radar type plots indicates when models are inferior or perhaps overtrained. 
These results also suggest the need for assessing deep learning further using multiple metrics with much larger scale comparisons, prospective testing as well as assessment of different fingerprints and DNN architectures beyond those used.
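Three of the metrics named in the abstract above (F1 score, Cohen's kappa, Matthews correlation coefficient) can be computed from one binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are illustrative assumptions, not taken from the study, and degenerate matrices (empty rows or columns) are not handled.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """Return F1, Cohen's kappa and MCC for a 2x2 confusion matrix.

    Hypothetical helper for illustration only; it assumes every
    marginal total is non-zero so no division by zero occurs.
    """
    n = tp + fp + fn + tn
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Cohen's kappa compares observed agreement with chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    # Matthews correlation coefficient uses all four cells.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"f1": f1, "kappa": kappa, "mcc": mcc}

# Example with made-up counts.
metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
```

Reporting several such metrics side by side, as the abstract describes, guards against a model that looks strong on one metric only because of class imbalance.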
On the Effect of Group Structures on Ranking Strategies in Folksonomies
NASA Astrophysics Data System (ADS)
Abel, Fabian; Henze, Nicola; Krause, Daniel; Kriesell, Matthias
Folksonomies have shown interesting potential for improving information discovery and exploration. Recent folksonomy systems explore the use of tag assignments, which combine Web resources with annotations (tags), and the users that have created the annotations. This article investigates the effect of grouping resources in folksonomies, i.e. creating sets of resources, and using this additional structure for the tasks of search and ranking, and for tag recommendations. We propose several group-sensitive extensions of graph-based search and recommendation algorithms, and compare them with non-group-sensitive versions. Our experiments show that the quality of search result ranking can be significantly improved by introducing and exploiting the grouping of resources (one-tailed t-test, level of significance α=0.05). Furthermore, tag recommendations profit from the group context, and it is possible to make very good recommendations even for untagged resources, which currently known tag recommendation algorithms cannot do.
Globalization and changing trends of biomedical research output
Conte, Marisa L.; Liu, Jing; Omary, M. Bishr
2017-01-01
The US continues to lead the world in research and development (R&D) expenditures, but there is concern that stagnation in federal support for biomedical research in the US could undermine the leading role the US has played in biomedical and clinical research discoveries. As a readout of research output in the US compared with other countries, assessment of original research articles published by US-based authors in ten clinical and basic science journals during 2000 to 2015 showed a steady decline of articles in high-ranking journals or no significant change in mid-ranking journals. In contrast, publication output originating from China-based investigators, in both high- and mid-ranking journals, has steadily increased commensurate with significant growth in R&D expenditures. These observations support the current concerns of stagnant and year-to-year uncertainty in US federal funding of biomedical research. PMID:28614799
Identification of significant features by the Global Mean Rank test.
Klammer, Martin; Dybowski, J Nikolaj; Hoffmann, Daniel; Schaab, Christoph
2014-01-01
With the introduction of omics-technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to successfully deal with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed at developing a test that performs robustly under these conditions, while favorably scaling with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates, and internally estimates and controls the false discovery rate. Furthermore, missing data is accounted for without the need of imputation. In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature dependent variance between replicates, and variable regulation across features on simulation data and a recent two-color microarray spike-in dataset. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similar or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.
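The idea of averaging ranks across replicates while skipping missing values can be sketched as follows. This is an illustrative simplification, not the published MeanRank test: it does not estimate or control the false discovery rate, it breaks ties by input order, and the name `mean_ranks` and the dict-based input format are assumptions.

```python
def mean_ranks(replicates):
    """Average normalized rank of each feature across replicates.

    replicates: list of dicts mapping feature name -> measured value.
    A feature missing from a replicate is simply left out of that
    replicate's ranking, so no imputation is needed.
    """
    sums, counts = {}, {}
    for rep in replicates:
        ordered = sorted(rep, key=rep.get)  # smallest value gets rank 1
        n = len(ordered)
        for pos, feature in enumerate(ordered, start=1):
            # Normalize by the number of observed features so that
            # replicates with missing values stay comparable.
            sums[feature] = sums.get(feature, 0.0) + pos / n
            counts[feature] = counts.get(feature, 0) + 1
    return {f: sums[f] / counts[f] for f in sums}

# Feature "b" is missing in the second replicate.
result = mean_ranks([
    {"a": 1.0, "b": 2.0, "c": 3.0},
    {"a": 0.5, "c": 2.0},
])
```

Features whose mean normalized rank sits consistently near 0 or 1 are candidates for regulation; deciding significance would need the null distribution and error control that the abstract attributes to the actual test.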
A collaborative filtering approach for protein-protein docking scoring functions.
Bourquard, Thomas; Bernauer, Julie; Azé, Jérôme; Poupon, Anne
2011-04-22
A protein-protein docking procedure traditionally consists in two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that using Voronoi constructions and a well chosen set of parameters, an accurate scoring function could be designed and optimized. However to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found in the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now have a refined rank using different classifiers (decision trees, rules and support vector machines) in a collaborative filtering scheme. The scoring function newly obtained is evaluated using 10 fold cross-validation, and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation in the 10 best ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranking conformations in near-native structures.
A Collaborative Filtering Approach for Protein-Protein Docking Scoring Functions
Bourquard, Thomas; Bernauer, Julie; Azé, Jérôme; Poupon, Anne
2011-01-01
A protein-protein docking procedure traditionally consists in two successive tasks: a search algorithm generates a large number of candidate conformations mimicking the complex existing in vivo between two proteins, and a scoring function is used to rank them in order to extract a native-like one. We have already shown that using Voronoi constructions and a well chosen set of parameters, an accurate scoring function could be designed and optimized. However to be able to perform large-scale in silico exploration of the interactome, a near-native solution has to be found in the ten best-ranked solutions. This cannot yet be guaranteed by any of the existing scoring functions. In this work, we introduce a new procedure for conformation ranking. We previously developed a set of scoring functions where learning was performed using a genetic algorithm. These functions were used to assign a rank to each possible conformation. We now have a refined rank using different classifiers (decision trees, rules and support vector machines) in a collaborative filtering scheme. The scoring function newly obtained is evaluated using 10 fold cross-validation, and compared to the functions obtained using either genetic algorithms or collaborative filtering taken separately. This new approach was successfully applied to the CAPRI scoring ensembles. We show that for 10 targets out of 12, we are able to find a near-native conformation in the 10 best ranked solutions. Moreover, for 6 of them, the near-native conformation selected is of high accuracy. Finally, we show that this function dramatically enriches the 100 best-ranking conformations in near-native structures. PMID:21526112
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naftchi-Ardebili, Kasra; Hau, Nathania W.; Mazziotti, David A.
2011-11-15
Variational minimization of the ground-state energy as a function of the two-electron reduced density matrix (2-RDM), constrained by necessary N-representability conditions, provides a polynomial-scaling approach to studying strongly correlated molecules without computing the many-electron wave function. Here we introduce a route to enhancing necessary conditions for N representability through rank restriction of the 2-RDM. Rather than adding computationally more expensive N-representability conditions, we directly enhance the accuracy of two-particle (2-positivity) conditions through rank restriction, which removes degrees of freedom in the 2-RDM that are not sufficiently constrained. We select the rank of the particle-hole 2-RDM by deriving the ranks associated with model wave functions, including both mean-field and antisymmetrized geminal power (AGP) wave functions. Because the 2-positivity conditions are exact for quantum systems with AGP ground states, the rank of the particle-hole 2-RDM from the AGP ansatz provides a minimum for its value in variational 2-RDM calculations of general quantum systems. To implement the rank-restricted conditions, we extend a first-order algorithm for large-scale semidefinite programming. The rank-restricted conditions significantly improve the accuracy of the energies; for example, the percentages of correlation energies recovered for HF, CO, and N2 improve from 115.2%, 121.7%, and 121.5% without rank restriction to 97.8%, 101.1%, and 100.0% with rank restriction. Similar results are found at both equilibrium and nonequilibrium geometries. While more accurate, the rank-restricted N-representability conditions are less expensive computationally than the full-rank conditions.
Current approaches and future role of high content imaging in safety sciences and drug discovery.
van Vliet, Erwin; Daneshian, Mardas; Beilmann, Mario; Davies, Anthony; Fava, Eugenio; Fleck, Roland; Julé, Yvon; Kansy, Manfred; Kustermann, Stefan; Macko, Peter; Mundy, William R; Roth, Adrian; Shah, Imran; Uteng, Marianne; van de Water, Bob; Hartung, Thomas; Leist, Marcel
2014-01-01
High content imaging combines automated microscopy with image analysis approaches to simultaneously quantify multiple phenotypic and/or functional parameters in biological systems. The technology has become an important tool in the fields of safety sciences and drug discovery, because it can be used for mode-of-action identification, determination of hazard potency and the discovery of toxicity targets and biomarkers. In contrast to conventional biochemical endpoints, high content imaging provides insight into the spatial distribution and dynamics of responses in biological systems. This allows the identification of signaling pathways underlying cell defense, adaptation, toxicity and death. Therefore, high content imaging is considered a promising technology to address the challenges for the "Toxicity testing in the 21st century" approach. Currently, high content imaging technologies are frequently applied in academia for mechanistic toxicity studies and in pharmaceutical industry for the ranking and selection of lead drug compounds or to identify/confirm mechanisms underlying effects observed in vivo. A recent workshop gathered scientists working on high content imaging in academia, pharmaceutical industry and regulatory bodies with the objective to compile the state-of-the-art of the technology in the different institutions. Together they defined technical and methodological gaps, proposed quality control measures and performance standards, highlighted cell sources and new readouts and discussed future requirements for regulatory implementation. This review summarizes the discussion, proposed solutions and recommendations of the specialists contributing to the workshop.
Commercial assessment of petroleum systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stickland, P.J.
1996-12-31
One of the more difficult tasks facing an exploration team conducting a large regional prospectivity study is integrating the diverse data so that business decisions can be made. When is one Petroleum System more commercially attractive than another? They may have different geological risks and field sizes, but also different development and fiscal conditions. BHP Petroleum has developed a technique which incorporates all of the above factors into a decision-making criterion called the Value Index. To achieve this, each Petroleum System requires subdivision into smaller, more commercially focused units. These are termed Play Fairways. Play Fairways are temporally and spatially distinct reservoir/seal packages. A number of these Play Fairways may exist within one Petroleum System. The Expected Value (EV) concept is widely established as a decision-making tool for prospects but is inappropriate for evaluating a Play Fairway. The Value Index method applies the spirit of EV to the Play Fairway without the rigorous detail. The key factors in the Value Index calculation are: the probability that there will be at least one further discovery in the fairway, the estimated number of future discoveries, the discovery size distribution and the NPV versus reserves function. These are combined in a Monte Carlo package to allow uncertainty in each parameter. The advantage of the Value Index over other criteria is that it allows a comparison of fairways balancing both geological and commercial issues. BHP Petroleum has found this technique, used in conjunction with other criteria, extremely useful for ranking exploration focus areas in Australia.
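The four factors listed in the abstract above could be combined in a Monte Carlo loop along the following lines. This is only a sketch under stated assumptions: the abstract does not give the actual Value Index formula, so the averaging rule, the function name `value_index`, and the callable parameters `sample_count`, `sample_size` and `npv` are all hypothetical.

```python
import random

def value_index(p_success, sample_count, sample_size, npv,
                trials=10000, seed=0):
    """Average simulated fairway value over many random trials.

    p_success    -- probability of at least one further discovery
    sample_count -- callable(rng) -> number of future discoveries
    sample_size  -- callable(rng) -> reserves of one discovery
    npv          -- callable(size) -> net present value for that size
    All four inputs are illustrative stand-ins for the factors named
    in the abstract, not BHP Petroleum's actual model.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        if rng.random() < p_success:  # fairway yields discoveries
            n = sample_count(rng)
            total += sum(npv(sample_size(rng)) for _ in range(n))
    return total / trials

# Made-up inputs: 1 to 3 discoveries, lognormal sizes, linear NPV.
vi = value_index(
    p_success=0.4,
    sample_count=lambda rng: rng.randint(1, 3),
    sample_size=lambda rng: rng.lognormvariate(3.0, 1.0),
    npv=lambda size: 2.0 * size - 30.0,
)
```

Passing the distributions in as callables keeps the uncertainty in each parameter explicit, which is the property the abstract attributes to the Monte Carlo treatment.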
The Ins and Outs of Evaluating Web-Scale Discovery Services
ERIC Educational Resources Information Center
Hoeppner, Athena
2012-01-01
Librarians are familiar with the single-line form, the consolidated index, which represents a very large portion of a library's print and online collection. Their end users are familiar with the idea of a single search across a comprehensive index that produces a large, relevancy-ranked results list. Even though most patrons would not recognize…
Project Lefty: More Bang for the Search Query
ERIC Educational Resources Information Center
Varnum, Ken
2010-01-01
This article describes the Project Lefty, a search system that, at a minimum, adds a layer on top of traditional federated search tools that will make the wait for results more worthwhile for researchers. At best, Project Lefty improves search queries and relevance rankings for web-scale discovery tools to make the results themselves more relevant…
Xu, Rong; Wang, QuanQiu
2015-08-01
Schizophrenia (SCZ) is a common complex disorder with poorly understood mechanisms and no effective drug treatments. Despite the high prevalence and vast unmet medical need represented by the disease, many drug companies have moved away from the development of drugs for SCZ. Therefore, alternative strategies are needed for the discovery of truly innovative drug treatments for SCZ. Here, we present a disease phenome-driven computational drug repositioning approach for SCZ. We developed a novel drug repositioning system, PhenoPredict, by inferring drug treatments for SCZ from diseases that are phenotypically related to SCZ. The key to PhenoPredict is the availability of a comprehensive drug treatment knowledge base that we recently constructed. PhenoPredict retrieved all 18 FDA-approved SCZ drugs and ranked them highly (recall=1.0, and average ranking of 8.49%). When compared to PREDICT, one of the most comprehensive drug repositioning systems currently available, in novel predictions, PhenoPredict represented clear improvements over PREDICT in Precision-Recall (PR) curves, with a significant 98.8% improvement in the area under curve (AUC) of the PR curves. In addition, we discovered many drug candidates with mechanisms of action fundamentally different from traditional antipsychotics, some of which had published literature evidence indicating their treatment benefits in SCZ patients. In summary, although the fundamental pathophysiological mechanisms of SCZ remain unknown, integrated systems approaches to studying phenotypic connections among diseases may facilitate the discovery of innovative SCZ drugs. Copyright © 2015 Elsevier Inc. All rights reserved.
Discoveries far from the lamppost with matrix elements and ranking
DOE Office of Scientific and Technical Information (OSTI.GOV)
Debnath, Dipsikha; Gainer, James S.; Matchev, Konstantin T.
2015-04-01
The prevalence of null results in searches for new physics at the LHC motivates the effort to make these searches as model-independent as possible. We describe procedures for adapting the Matrix Element Method for situations where the signal hypothesis is not known a priori. We also present general and intuitive approaches for performing analyses and presenting results, which involve the flattening of background distributions using likelihood information. The first flattening method involves ranking events by background matrix element, the second involves quantile binning with respect to likelihood (and other) variables, and the third method involves reweighting histograms by the inverse of the background distribution.
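The rank-based flattening mentioned in the abstract above rests on a general fact: replacing a variable by its rank (empirical quantile) within a background sample yields a quantity that is uniformly distributed for background events. A minimal sketch follows, assuming each event is already summarized by one scalar such as a background matrix-element value; the name `make_flattener` is hypothetical and this is not the authors' code.

```python
from bisect import bisect_right

def make_flattener(background_values):
    """Build a map from a scalar to its empirical background quantile.

    background_values: per-event scalars from a background-only
    reference sample (simulated or from a control region).
    """
    ref = sorted(background_values)
    n = len(ref)

    def flatten(x):
        # Fraction of reference background events with value <= x.
        return bisect_right(ref, x) / n

    return flatten

flatten = make_flattener([3.0, 1.0, 2.0, 4.0])
quantiles = [flatten(v) for v in [1.0, 2.0, 3.0, 4.0]]
```

A histogram of the flattened variable is flat for background by construction, so any localized excess in observed data stands out without assuming a particular signal model.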
Masugata, Hisashi; Senda, Shoichi; Goda, Fuminori; Yoshihara, Yumiko; Yoshikawa, Kay; Fujita, Norihiro; Himoto, Takashi; Okuyama, Hiroyuki; Taoka, Teruhisa; Imai, Masanobu; Kohno, Masakazu
2007-07-01
The aim of this study was to elucidate the cardiac function in bed-bound patients following cerebrovascular accidents. In accord with the criteria for activities of daily living (ADL) of the Japanese Ministry of Health, Labour and Welfare, 51 age-matched poststroke patients without heart disease were classified into 3 groups: rank A (house-bound) (n = 16, age, 85 +/- 6 years), rank B (chair-bound) (n = 16, age, 84 +/- 8 years), and rank C (bed-bound) (n = 19, age, 85 +/- 9 years). Using echocardiography, the left ventricular (LV) diastolic function was assessed by the ratio of early filling (E) and atrial contraction (A) transmitral flow velocities (E/A) of LV inflow. LV systolic function was assessed by LV ejection fraction (LVEF), and the Tei index was also measured to assess both LV systolic and diastolic function. No difference was observed in the E/A and LVEF among the 3 groups. The Tei index was higher in rank C (0.56 +/- 0.17) than in rank A (0.39 +/- 0.06) and rank B (0.48 +/- 0.17), and a statistically significant difference was observed between rank A and rank C (P < 0.05). Serum albumin and blood hemoglobin were significantly lower in rank C (3.1 +/- 0.4 and 10.6 +/- 1.8 g/dL) than in rank A (4.1 +/- 0.3 and 12.4 +/- 1.2 g/dL) (P < 0.001 and P < 0.05, respectively). These results indicate that latent cardiac dysfunction and poor nutritional status may exist in bed-bound patients (rank C) following cerebrovascular accidents. The Tei index may be a useful index of cardiac dysfunction in bed-bound patients because it is independent of the cardiac loading condition.
Jackknife Variance Estimator for Two Sample Linear Rank Statistics
1988-11-01
Keywords: strong consistency; linear rank test; influence function.
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.
Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric
2010-07-20
Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
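Step 1 of the procedure described above, finding the models on the Pareto optimal front with respect to several scoring functions, can be sketched as below. The sketch assumes lower scores are better (as for energy-like functions) and omits the fuzzy dominance ranking of step 2; the name `pareto_front` and the example scores are illustrative, not from the paper.

```python
def pareto_front(scores):
    """Return the models that no other model dominates.

    scores: dict mapping model name -> tuple of scores, one per
    scoring function, where lower is assumed to be better.
    """
    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly
        # better on at least one scoring function.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return [
        model for model, s in scores.items()
        if not any(dominates(t, s)
                   for other, t in scores.items() if other != model)
    ]

# Made-up decoys scored by two hypothetical scoring functions.
front = pareto_front({
    "decoy1": (1.0, 5.0),
    "decoy2": (2.0, 2.0),
    "decoy3": (3.0, 3.0),  # dominated by decoy2 on both scores
    "decoy4": (5.0, 1.0),
})
```

The pairwise comparison is quadratic in the number of models, which is acceptable for decoy sets of modest size; no single scoring function has to be trusted over the others to shortlist candidates this way.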
Improving predicted protein loop structure ranking using a Pareto-optimality consensus method
2010-01-01
Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set. PMID:20642859
Compressed Continuous Computation v. 12/20/2016
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorodetsky, Alex
2017-02-17
A library for performing numerical computation with low-rank functions. The (C3) library enables performing continuous linear and multilinear algebra with multidimensional functions. Common tasks include taking "matrix" decompositions of vector- or matrix-valued functions, approximating multidimensional functions in low-rank format, adding or multiplying functions together, integrating multidimensional functions.
Pulmonary function outcomes for assessing cystic fibrosis care.
Wagener, Jeffrey S; Elkin, Eric P; Pasta, David J; Schechter, Michael S; Konstan, Michael W; Morgan, Wayne J
2015-05-01
Assessing cystic fibrosis (CF) patient quality of care requires the choice of an appropriate outcome measure. We looked systematically and in detail at pulmonary function outcomes that potentially reflect clinical practice patterns. Epidemiologic Study of Cystic Fibrosis data were used to evaluate six potential outcome variables (2002 best FVC, FEV(1), and FEF(25-75) and rate of decline for each from 2000 to 2002). We ranked CF care sites by outcome measure and then assessed any association with practice patterns and follow-up pulmonary function. Sites ranked in the top quartile had more frequent monitoring, treatment of exacerbations, and use of chronic therapies and oral corticosteroids. The follow-up rate of pulmonary function decline was not predicted by site ranking. Different pulmonary function outcomes associate slightly differently with practice patterns, although annual FEV(1) is at least as good as any other measure. Current site ranking only moderately predicts future ranking. Copyright © 2014 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (often a quite large one comprising many individual documents) based on an input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor and a hot research topic; several rank-learning methods based on machine learning techniques already exist that can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others with respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm effectively and rapidly identifies optimal ranking functions.
Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy
Tian, Yuling; Zhang, Hongxian
2016-01-01
For the purposes of information retrieval, users must find highly relevant documents from within a system (often a quite large one comprising many individual documents) based on an input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor and a hot research topic; several rank-learning methods based on machine learning techniques already exist that can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others with respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm effectively and rapidly identifies optimal ranking functions. PMID:27487242
Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng
2016-01-01
Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307615
Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng
2016-06-15
Identifying drug-target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets. Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Patrick, M; Ditunno, P; Ditunno, J F; Marino, R J; Scivoletto, G; Lam, T; Loffree, J; Tamburella, F; Leiby, B
2011-12-01
Blinded rank ordering. To determine consumer preference in walking function utilizing the Walking Index for Spinal Cord Injury II (WISCI II) in individuals with spinal cord injury (SCI) from Canada, Italy and the United States of America. In all, 42 consumers with incomplete SCI (25 cervical, 12 thoracic, 5 lumbar) from Canada (12/42), Italy (14/42) and the United States of America (16/42) ranked the 20 levels of the WISCI II scale by their individual preference for walking. Subjects were blinded to the original ranking of the WISCI II scale by clinical scientists. Photographs of each WISCI II level used in a previous pilot study were randomly shuffled and rank ordered. Percentile, conjoint/cluster and graphic analyses were performed. All three analyses illustrated consumer ranking followed a bimodal distribution. Ranking for two levels with physical assistance and two levels with a walker were bimodal with a difference of five to six ranks between consumer subgroups (quartile analysis). The larger cluster (N=20) showed preference for walking with assistance over the smaller cluster (N=12), whose preference was walking without assistance and more devices. In all, 64% (27/42) of consumers ranked the WISCI II level with no devices or braces and 1 person assistance higher than multiple levels of the WISCI II requiring no assistance. These results were unexpected, as the hypothesis was that consumers would rank independent walking higher than walking with assistance. Consumer preference for walking function should be considered in addition to objective measures in designing SCI trials that use significant improvement in walking function as an outcome measure.
A Knowledge Discovery framework for Planetary Defense
NASA Astrophysics Data System (ADS)
Jiang, Y.; Yang, C. P.; Li, Y.; Yu, M.; Bambacus, M.; Seery, B.; Barbee, B.
2016-12-01
Planetary Defense, a project funded by NASA Goddard and the NSF, is a multi-faceted effort focused on the mitigation of Near Earth Object (NEO) threats to our planet. Currently, there exists a dispersion of information concerning NEOs amongst different organizations and scientists, leading to a lack of a coherent system of information to be used for efficient NEO mitigation. In this paper, a planetary defense knowledge discovery engine is proposed to better assist the development and integration of a NEO responding system. Specifically, we have implemented an organized information framework by two means: 1) the development of a semantic knowledge base, which provides a structure for relevant information. It has been developed by the implementation of web crawling and natural language processing techniques, which allows us to collect and store the most relevant structured information on a regular basis. 2) the development of a knowledge discovery engine, which allows for the efficient retrieval of information from our knowledge base. The knowledge discovery engine has been built on top of Elasticsearch, an open source full-text search engine, as well as cutting-edge machine learning ranking and recommendation algorithms. This proposed framework is expected to advance knowledge discovery and innovation in the planetary science domain.
Discovery of extra-terrestrial life: assessment by scales of its importance and associated risks.
Almár, Iván; Race, Margaret S
2011-02-13
The Rio Scale accepted by the SETI Committee of the International Academy of Astronautics in 2002 is intended for use in evaluating the impact on society of any announcement regarding the discovery of evidence of extra-terrestrial (ET) intelligence. The Rio Scale is mathematically defined using three parameters (class of phenomenon, type of discovery and distance) and a δ factor, the assumed credibility of a claim. This paper proposes a new scale applicable to announcements alleging evidence of ET life within or outside our Solar System. The London Scale for astrobiology has mathematical structure and logic similar to the Rio Scale, and uses four parameters (life form, nature of phenomenon, type of discovery and distance) as well as a credibility factor δ to calculate a London Scale index (LSI) with values ranging from 0 to 10. The level of risk or biohazard associated with a purported discovery is evaluated independently of the LSI value and may be ranked in four categories. The combined information is intended to provide a scalar assessment of the scientific importance, validity and potential risks associated with putative evidence of ET life discovered on Earth, on nearby bodies in the Solar System or in our Galaxy.
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Švedas, Vytas
2014-01-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure–function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. PMID:24852248
Li, Xiu-Qing
2012-01-01
Most protein PageRank studies do not use signal flow direction information in protein interactions because this information was not readily available in large protein databases until recently. Therefore, four questions have yet to be answered: A) What is the general difference between signal emitting and receiving in a protein interactome? B) Which proteins are among the top ranked in directional ranking? C) Are high ranked proteins more evolutionarily conserved than low ranked ones? D) Do proteins with similar ranking tend to have similar subcellular locations? In this study, we address these questions using the forward, reverse, and non-directional PageRank approaches to rank an information-directional network of human proteins and study their evolutionary conservation. The forward ranking gives credit to information receivers, reverse ranking to information emitters, and non-directional ranking mainly to the number of interactions. The protein lists generated by the forward and non-directional rankings are highly correlated, but those by the reverse and non-directional rankings are not. The results suggest that the signal emitting/receiving system is characterized by key-emittings and relatively even receivings in the human protein interactome. Signaling pathway proteins are frequent in top ranked ones. Eight proteins are both informational top emitters and top receivers. Top ranked proteins, except a few species-related novel-function ones, are evolutionarily well conserved. Protein-subunit ranking position reflects subunit function. These results demonstrate the usefulness of different PageRank approaches in characterizing protein networks and provide insights to protein interaction in the cell. PMID:23028653
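The three ranking directions described above can be illustrated with a minimal sketch. The power-iteration routine, the damping factor of 0.85, and the four-node toy network are assumptions chosen for illustration, not the data or implementation used in the study. Forward ranking follows edges as given, reverse ranking runs on the transposed edges, and non-directional ranking treats every interaction as two-way.

```python
def pagerank(edges, nodes, damping=0.85, iters=100):
    """Power-iteration PageRank on a directed edge list of (source, target)."""
    out = {v: [] for v in nodes}
    for s, t in edges:
        out[s].append(t)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for s in nodes:
            for t in out[s]:
                new[t] += damping * rank[s] / len(out[s])
        rank = new
    return rank

# Hypothetical toy network: A signals to B, C and D; B and C signal to D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D"), ("C", "D")]
flipped = [(t, s) for s, t in edges]

forward = pagerank(edges, nodes)               # credits information receivers
reverse = pagerank(flipped, nodes)             # credits information emitters
undirected = pagerank(edges + flipped, nodes)  # driven mainly by degree

top_forward = max(forward, key=forward.get)
top_reverse = max(reverse, key=reverse.get)
```

In this toy network the forward ranking puts the main receiver (D) on top and the reverse ranking puts the main emitter (A) on top, while the non-directional scores of A and D coincide because both have three interactions.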
Xia, Wen-Fang; Tang, Fu-Lei; Xiong, Lei; Xiong, Shan; Jung, Ji-Ung; Lee, Dae-Hoon; Li, Xing-Sheng; Feng, Xu; Mei, Lin
2013-01-01
Receptor activator of NF-κB (RANK) plays a critical role in osteoclastogenesis, an essential process for the initiation of bone remodeling to maintain healthy bone mass and structure. Although the signaling and function of RANK have been investigated extensively, much less is known about the negative regulatory mechanisms of its signaling. We demonstrate in this paper that RANK trafficking, signaling, and function are regulated by VPS35, a major component of the retromer essential for selective endosome to Golgi retrieval of membrane proteins. VPS35 loss of function altered RANK ligand (RANKL)–induced RANK distribution, enhanced RANKL sensitivity, sustained RANKL signaling, and increased hyperresorptive osteoclast (OC) formation. Hemizygous deletion of the Vps35 gene in mice promoted hyperresorptive osteoclastogenesis, decreased bone formation, and caused a subsequent osteoporotic deficit, including decreased trabecular bone volumes and reduced trabecular thickness and density in long bones. These results indicate that VPS35 critically deregulates RANK signaling, thus restraining increased formation of hyperresorptive OCs and preventing osteoporotic deficits. PMID:23509071
A PageRank-based reputation model for personalised manufacturing service recommendation
NASA Astrophysics Data System (ADS)
Zhang, W. Y.; Zhang, S.; Guo, S. S.
2017-05-01
The number of manufacturing services for cross-enterprise business collaborations is increasing rapidly because of the explosive growth of Web service technologies. This trend demands intelligent and robust models to address information overload in order to enable efficient discovery of manufacturing services. In this paper, we present a personalised manufacturing service recommendation approach, which combines a PageRank-based reputation model and a collaborative filtering technique in a unified framework for recommending the right manufacturing services to an active service user for supply chain deployment. The novel aspect of this research is adapting the PageRank algorithm to a network of service-oriented multi-echelon supply chain in order to determine both user reputation and service reputation. In addition, it explores the use of these methods in alleviating data sparsity and cold start problems that hinder traditional collaborative filtering techniques. A case study is conducted to validate the practicality and effectiveness of the proposed approach in recommending the right manufacturing services to active service users.
Integrated Low-Rank-Based Discriminative Feature Learning for Recognition.
Zhou, Pan; Lin, Zhouchen; Zhang, Chao
2016-05-01
Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in many applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines feature learning with classification, so that the regulated classification error is minimized. In this way, the extracted features are more discriminative for the recognition tasks. Our approach benefits from a recent discovery on the closed-form solutions to noiseless LatLRR. When there is noise, a robust Principal Component Analysis (PCA)-based denoising step can be added as preprocessing. When the scale of a problem is large, we utilize a fast randomized algorithm to speed up the computation of robust PCA. Extensive experimental results demonstrate the effectiveness and robustness of our method.
Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data.
Tian, Bo; Duan, Qiong; Zhao, Can; Teng, Ben; He, Zengyou
2017-05-17
Affinity Purification-Mass Spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose an ensemble method, Reinforce, for inferring PPI network from AP-MS data set. The new algorithm named Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce can get more stable and accurate inference results than existing algorithms. The source codes of Reinforce and data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.
Physical and in silico approaches identify DNA-PK in a Tax DNA-damage response interactome
Ramadan, Emad; Ward, Michael; Guo, Xin; Durkin, Sarah S; Sawyer, Adam; Vilela, Marcelo; Osgood, Christopher; Pothen, Alex; Semmes, Oliver J
2008-01-01
Background We have initiated an effort to exhaustively map interactions between HTLV-1 Tax and host cellular proteins. The resulting Tax interactome will have significant utility toward defining new and understanding known activities of this important viral protein. In addition, the completion of a full Tax interactome will also help shed light upon the functional consequences of these myriad Tax activities. The physical mapping process involved the affinity isolation of Tax complexes followed by sequence identification using tandem mass spectrometry. To date we have mapped 250 cellular components within this interactome. Here we present our approach to prioritizing these interactions via an in silico culling process. Results We first constructed an in silico Tax interactome comprised of 46 literature-confirmed protein-protein interactions. This number was then reduced to four Tax-interactions suspected to play a role in DNA damage response (Rad51, TOP1, Chk2, 53BP1). The first-neighbor and second-neighbor interactions of these four proteins were assembled from available human protein interaction databases. Through an analysis of betweenness and closeness centrality measures, and numbers of interactions, we ranked proteins in the first neighborhood. When this rank list was compared to the list of physical Tax-binding proteins, DNA-PK was the highest ranked protein common to both lists. An overlapping clustering of the Tax-specific second-neighborhood protein network showed DNA-PK to be one of three bridge proteins that link multiple clusters in the DNA damage response network. Conclusion The interaction of Tax with DNA-PK represents an important biological paradigm as suggested via consensus findings in vivo and in silico. We present this methodology as an approach to discovery and as a means of validating components of a consensus Tax interactome. PMID:18922151
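As a rough illustration of ranking by betweenness centrality, closeness centrality and number of interactions, the sketch below computes all three on a small undirected graph. The node names and edges are hypothetical, and the lexicographic sort used to combine the measures is an assumption, since the abstract does not state how they were combined. Betweenness is computed with Brandes' breadth-first method.

```python
from collections import deque

def centralities(adj):
    """Return (betweenness, closeness) for an unweighted, undirected graph."""
    betweenness = {v: 0.0 for v in adj}
    closeness = {}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (Brandes).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes inwards.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
        total = sum(dist.values())
        closeness[s] = (len(dist) - 1) / total if total else 0.0
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in betweenness.items()}, closeness

# Hypothetical first-neighborhood graph: hub H, leaves a and b, chain H-c-d.
adj = {"H": ["a", "b", "c"], "a": ["H"], "b": ["H"], "c": ["H", "d"], "d": ["c"]}
bet, clo = centralities(adj)

# Assumed combination rule: sort by betweenness, then closeness, then degree.
ranked = sorted(adj, key=lambda v: (bet[v], clo[v], len(adj[v])), reverse=True)
```

A rank list produced this way could then be intersected with a list of experimentally observed binding partners to find the highest-ranked protein common to both.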
Nonconvex Nonsmooth Low Rank Minimization via Iteratively Reweighted Nuclear Norm.
Lu, Canyi; Tang, Jinhui; Yan, Shuicheng; Lin, Zhouchen
2016-02-01
The nuclear norm is widely used as a convex surrogate of the rank function in compressive sensing for low rank matrix recovery with its applications in image recovery and signal processing. However, solving the nuclear norm-based relaxed convex problem usually leads to a suboptimal solution of the original rank minimization problem. In this paper, we propose to use a family of nonconvex surrogates of L0-norm on the singular values of a matrix to approximate the rank function. This leads to a nonconvex nonsmooth minimization problem. Then, we propose to solve the problem by an iteratively re-weighted nuclear norm (IRNN) algorithm. IRNN iteratively solves a weighted singular value thresholding problem, which has a closed form solution due to the special properties of the nonconvex surrogate functions. We also extend IRNN to solve the nonconvex problem with two or more blocks of variables. In theory, we prove that the IRNN decreases the objective function value monotonically, and any limit point is a stationary point. Extensive experiments on both synthesized data and real images demonstrate that IRNN enhances the low rank matrix recovery compared with the state-of-the-art convex algorithms.
An intelligent content discovery technique for health portal content management.
De Silva, Daswin; Burstein, Frada
2014-04-23
Continuous content management of health information portals is a feature vital for their sustainability and widespread acceptance. Knowledge and experience of a domain expert is essential for content management in the health domain. The rate of generation of online health resources is exponential and thereby manual examination for relevance to a specific topic and audience is a formidable challenge for domain experts. Intelligent content discovery for effective content management is a less researched topic. An existing expert-endorsed content repository can provide the necessary leverage to automatically identify relevant resources and evaluate qualitative metrics. This paper reports on the design research towards an intelligent technique for automated content discovery and ranking for health information portals. The proposed technique aims to improve efficiency of the current mostly manual process of portal content management by utilising an existing expert-endorsed content repository as a supporting base and a benchmark to evaluate the suitability of new content. A model for content management was established based on a field study of potential users. The proposed technique is integral to this content management model and executes in several phases (i.e., query construction, content search, text analytics and fuzzy multi-criteria ranking). The construction of multi-dimensional search queries with input from Wordnet, the use of multi-word and single-word terms as representative semantics for text analytics and the use of fuzzy multi-criteria ranking for subjective evaluation of quality metrics are original contributions reported in this paper. The feasibility of the proposed technique was examined with experiments conducted on an actual health information portal, the BCKOnline portal.
Both intermediary and final results generated by the technique are presented in the paper and these help to establish benefits of the technique and its contribution towards effective content management. The prevalence of large numbers of online health resources is a key obstacle for domain experts involved in content management of health information portals and websites. The proposed technique has proven successful at search and identification of resources and the measurement of their relevance. It can be used to support the domain expert in content management and thereby ensure the health portal is up-to-date and current.
Automatic Figure Ranking and User Interfacing for Intelligent Figure Search
Yu, Hong; Liu, Feifan; Ramesh, Balaji Polepalli
2010-01-01
Background Figures are important experimental results that are typically reported in full-text bioscience articles. Bioscience researchers need to access figures to validate research facts and to formulate or to test novel research hypotheses. On the other hand, the sheer volume of bioscience literature has made it difficult to access figures. Therefore, we are developing an intelligent figure search engine (http://figuresearch.askhermes.org). Existing research in figure search treats each figure equally, but we introduce a novel concept of “figure ranking”: figures appearing in a full-text biomedical article can be ranked by their contribution to the knowledge discovery. Methodology/Findings We empirically validated the hypothesis of figure ranking with over 100 bioscience researchers, and then developed unsupervised natural language processing (NLP) approaches to automatically rank figures. Evaluating on a collection of 202 full-text articles in which authors have ranked the figures based on importance, our best system achieved a weighted error rate of 0.2, which is significantly better than several other baseline systems we explored. We further explored a user interfacing application in which we built novel user interfaces (UIs) incorporating figure ranking, allowing bioscience researchers to efficiently access important figures. Our evaluation results show that 92% of the bioscience researchers prefer as the top two choices the user interfaces in which the most important figures are enlarged. With our automatic figure ranking NLP system, bioscience researchers preferred the UIs in which the most important figures were predicted by our NLP system than the UIs in which the most important figures were randomly assigned. In addition, our results show that there was no statistical difference in bioscience researchers' preference in the UIs generated by automatic figure ranking and UIs by human ranking annotation. 
Conclusion/Significance The evaluation results conclude that automatic figure ranking and user interfacing as we reported in this study can be fully implemented in online publishing. The novel user interface integrated with the automatic figure ranking system provides a more efficient and robust way to access scientific information in the biomedical domain, which will further enhance our existing figure search engine to better facilitate accessing figures of interest for bioscientists. PMID:20949102
Learning to rank using user clicks and visual features for image retrieval.
Yu, Jun; Tao, Dacheng; Wang, Meng; Rui, Yong
2015-04-01
The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in the image ranking model. However, the existing ranking model cannot integrate visual features, which are efficient in refining the click-based search results. In this paper, we propose a novel ranking model based on the learning to rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large margin structured output learning and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning to rank model based on visual features and user clicks outperforms state-of-the-art algorithms.
Inferring drug-disease associations based on known protein complexes.
Yu, Liang; Huang, Jianbin; Ma, Zhixin; Zhang, Jing; Zou, Yapeng; Gao, Lin
2015-01-01
Inferring drug-disease associations is critical in unveiling disease mechanisms, as well as discovering novel functions of available drugs, or drug repositioning. Previous work is primarily based on the drug-gene-disease relationship, which throws away much important information, since genes execute their functions by interacting with others. To overcome this issue, we propose a novel methodology that discovers drug-disease associations based on protein complexes. Firstly, an integrated heterogeneous network consisting of drugs, protein complexes, and diseases is constructed, where we assign weights to the drug-disease associations by using probability. Then, from the tripartite network, we get the indirect weighted relationships between drugs and diseases. The larger the weight, the higher the reliability of the correlation. We apply our method to mental disorders and hypertension, and validate the result by using the Comparative Toxicogenomics Database. Our ranked results can be directly reinforced by existing biomedical literature, suggesting that our proposed method obtains higher specificity and sensitivity. The proposed method offers new insight into drug-disease discovery. Our method is publicly available at http://1.complexdrug.sinaapp.com/Drug_Complex_Disease/Data_Download.html.
Inferring drug-disease associations based on known protein complexes
2015-01-01
Inferring drug-disease associations is critical in unveiling disease mechanisms, as well as discovering novel functions of available drugs, or drug repositioning. Previous work is primarily based on the drug-gene-disease relationship, which throws away much important information, since genes execute their functions by interacting with others. To overcome this issue, we propose a novel methodology that discovers drug-disease associations based on protein complexes. Firstly, an integrated heterogeneous network consisting of drugs, protein complexes, and diseases is constructed, where we assign weights to the drug-disease associations by using probability. Then, from the tripartite network, we get the indirect weighted relationships between drugs and diseases. The larger the weight, the higher the reliability of the correlation. We apply our method to mental disorders and hypertension, and validate the result by using the Comparative Toxicogenomics Database. Our ranked results can be directly reinforced by existing biomedical literature, suggesting that our proposed method obtains higher specificity and sensitivity. The proposed method offers new insight into drug-disease discovery. Our method is publicly available at http://1.complexdrug.sinaapp.com/Drug_Complex_Disease/Data_Download.html. PMID:26044949
Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals
Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.
2018-03-20
A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. The method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function as a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules, demonstrate the efficiency of this method using very few function evaluations as compared to other integration strategies.
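The final integration step can be sketched under simplifying assumptions: the low-rank representation is taken as already given in canonical form (a short sum of products of one-dimensional factors), the weight function is exp(-x^2) in every dimension, and a fixed 3-point Gauss–Hermite rule is hard-coded. The factor functions and the helper names `gh_1d` and `integrate_low_rank` are hypothetical. The point is that the d-dimensional integral of each rank-1 term reduces to a product of d one-dimensional quadratures.

```python
import math

# 3-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x) over the real
# line; it is exact when g is a polynomial of degree 5 or lower.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (math.sqrt(math.pi) / 6,
           2 * math.sqrt(math.pi) / 3,
           math.sqrt(math.pi) / 6)

def gh_1d(g):
    """One-dimensional Gauss-Hermite quadrature of g."""
    return sum(w * g(x) for x, w in zip(NODES, WEIGHTS))

def integrate_low_rank(terms):
    """Integrate a sum of rank-1 terms, each a list of 1-D factor functions."""
    return sum(math.prod(gh_1d(g) for g in factors) for factors in terms)

# Hypothetical rank-2 integrand in three dimensions: x^2 * y^2 * z^2 + 1.
terms = [
    [lambda x: x ** 2, lambda y: y ** 2, lambda z: z ** 2],
    [lambda x: 1.0, lambda y: 1.0, lambda z: 1.0],
]
value = integrate_low_rank(terms)

# Closed form: (sqrt(pi)/2)^3 for the first term plus sqrt(pi)^3 for the second.
exact = (math.sqrt(math.pi) / 2) ** 3 + math.sqrt(math.pi) ** 3
```

With n quadrature nodes, d dimensions and rank r, this costs on the order of r·d·n one-dimensional function evaluations instead of the n^d evaluations a full tensor-product grid would need.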
Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.
A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. The method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function as a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules, demonstrate the efficiency of this method using very few function evaluations as compared to other integration strategies.
Ficklin, Stephen P.; Feltus, F. Alex
2011-01-01
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species. PMID:21606319
Ficklin, Stephen P; Feltus, F Alex
2011-07-01
One major objective for plant biology is the discovery of molecular subsystems underlying complex traits. The use of genetic and genomic resources combined in a systems genetics approach offers a means for approaching this goal. This study describes a maize (Zea mays) gene coexpression network built from publicly available expression arrays. The maize network consisted of 2,071 loci that were divided into 34 distinct modules that contained 1,928 enriched functional annotation terms and 35 cofunctional gene clusters. Of note, 391 maize genes of unknown function were found to be coexpressed within modules along with genes of known function. A global network alignment was made between this maize network and a previously described rice (Oryza sativa) coexpression network. The IsoRankN tool was used, which incorporates both gene homology and network topology for the alignment. A total of 1,173 aligned loci were detected between the two grass networks, which condensed into 154 conserved subgraphs that preserved 4,758 coexpression edges in rice and 6,105 coexpression edges in maize. This study provides an early view into maize coexpression space and provides an initial network-based framework for the translation of functional genomic and genetic information between these two vital agricultural species.
Methods for evaluating and ranking transportation energy conservation programs
NASA Astrophysics Data System (ADS)
Santone, L. C.
1981-04-01
The energy conservation programs are assessed in terms of petroleum savings, incremental costs to consumers, probability of technical and market success, and external impacts due to environmental, economic, and social factors. Three ranking functions and a policy matrix are used to evaluate the programs. The net present value measure, which computes the present worth of petroleum savings less the present worth of costs, is modified by dividing by the present value of DOE funding to obtain a net present value per program dollar. The comprehensive ranking function takes external impacts into account. Procedures are described for making computations of the ranking functions and the attributes that require computation. Computations are made for the electric vehicle, Stirling engine, gas turbine, and MPG mileage guide program.
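The net present value per program dollar described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the function names, the yearly cash-flow lists, and the discount rate are hypothetical, and the report's actual discounting conventions are not given in the abstract.

```python
def present_value(amounts, rate):
    """Discount yearly amounts (year 0 first) to their present worth."""
    return sum(a / (1.0 + rate) ** t for t, a in enumerate(amounts))


def npv_per_program_dollar(savings, costs, funding, rate):
    """(PV of petroleum savings - PV of costs) / PV of program funding.

    All three inputs are hypothetical yearly dollar streams; the abstract
    only states the structure of the ratio, not how the streams are built.
    """
    pv_funding = present_value(funding, rate)
    if pv_funding <= 0:
        raise ValueError("present value of funding must be positive")
    net = present_value(savings, rate) - present_value(costs, rate)
    return net / pv_funding
```

Programs could then be sorted by this ratio in descending order to obtain one of the rankings; the comprehensive ranking function mentioned in the abstract would additionally need the external-impact attributes, which are not specified here.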
Building a better search engine for earth science data
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Yang, C. P.; Moroni, D. F.; McGibbney, L. J.; Jiang, Y.; Huang, T.; Greguska, F. R., III; Li, Y.; Finch, C. J.
2017-12-01
Free text data searching of earth science datasets has been implemented with varying degrees of success and completeness across the spectrum of the 12 NASA earth sciences data centers. At the JPL Physical Oceanography Distributed Active Archive Center (PO.DAAC) the search engine has been developed around the Solr/Lucene platform. Others have chosen other popular enterprise search platforms like Elasticsearch. Regardless, the default implementations of these search engines leveraging factors such as dataset popularity, term frequency and inverse document term frequency do not fully meet the needs of precise relevancy and ranking of earth science search results. For the PO.DAAC, this shortcoming has been identified for several years by its external User Working Group that has assigned several recommendations to improve the relevancy and discoverability of datasets related to remotely sensed sea surface temperature, ocean wind, waves, salinity, height and gravity that comprise a total count of over 500 publicly available datasets. Recently, the PO.DAAC has teamed with an effort led by George Mason University to improve the search and relevancy ranking of oceanographic data via a simple search interface and powerful backend services called MUDROD (Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery) funded by the NASA AIST program. MUDROD has mined and utilized the combination of PO.DAAC earth science dataset metadata, usage metrics, and user feedback and search history to objectively extract relevance for improved data discovery and access. In addition to improved dataset relevance and ranking, the MUDROD search engine also returns recommendations to related datasets and related user queries.
This presentation will report on use cases that drove the architecture and development, and the success metrics and improvements on search precision and recall that MUDROD has demonstrated over the existing PO.DAAC search interfaces.
Reports from Other Journals: Nature
NASA Astrophysics Data System (ADS)
Heinhorst, Sabine; Cannon, Gordon
1997-05-01
The first Nature issue of the new year (January 2, 1997, pp 13-16) featured the annual commentary on anniversaries of scientific discoveries and inventions through the centuries, a brief tour de force in the history of science. This year's enlightening list includes, among other things, the discovery of the electron (1897) and of mountains on the moon (1647), and the first description of Herba inebrians, now commonly known as tobacco (1497). A new book reviewed in the January 16 issue (pp 215-216), The Scientific 100: A Ranking of the Most Influential Scientists, Past and Present, describes lives and scientific contributions of the 100 most important scientists, as perceived by author John Simmons. "Top-of-the-line" scientists include Isaac Newton (No. 1), Niels Bohr (3),
NASA Technical Reports Server (NTRS)
Colozza, Anthony J.; Cataldo, Robert L.
2015-01-01
This study looks at the applicability of utilizing the Segmented Thermoelectric Modular Radioisotope Thermoelectric Generator (STEM-RTG) or a high-power radioisotope generator to replace the Advanced Stirling Radioisotope Generator (ASRG), which had been identified as the baseline power system for a number of planetary exploration mission studies. Nine different Discovery-Class missions were examined to determine the applicability of either the STEM-RTG or the high-power SRG power systems in replacing the ASRG. The nine missions covered exploration across the solar system and included orbiting spacecraft, landers and rovers. Based on the evaluation a ranking of the applicability of each alternate power system to the proposed missions was made.
Scientific impact: the story of your big hit
NASA Astrophysics Data System (ADS)
Sinatra, Roberta; Wang, Dashun; Deville, Pierre; Song, Chaoming; Barabasi, Albert-Laszlo
2014-03-01
A gradual increase in performance through learning and practice characterizes most trades, from sport to music or engineering, and common sense suggests this to be true in science as well. This prompts us to ask: what are the precise patterns that lead to scientific excellence? Does performance indeed improve throughout a scientific career? Are there quantifiable signs of an impending scientific hit? Using citation-based measures as a proxy of impact, we show that (i) major discoveries are not preceded by works of increasing impact, nor are they followed by work of higher impact, (ii) the precise time ranking of the highest impact work in a scientist's career is uniformly random, with the higher probability of having a major discovery in the middle of scientific careers being due only to changes in productivity, (iii) there is a strong correlation between the highest impact work and average impact of a scientist's work. These findings suggest that the impact of a paper is drawn randomly from an impact distribution that is unique for each scientist. We present a model that allows us to reconstruct the individual impact distribution, making it possible to create synthetic careers that exhibit the same properties as the real data and to define a ranking based on the overall impact of a scientist. RS acknowledges support from the James McDonnell Foundation.
Dufresne, Sébastien S; Dumont, Nicolas A; Boulanger-Piette, Antoine; Fajardo, Val A; Gamu, Daniel; Kake-Guena, Sandrine-Aurélie; David, Rares Ovidiu; Bouchard, Patrice; Lavergne, Éliane; Penninger, Josef M; Pape, Paul C; Tupling, A Russell; Frenette, Jérôme
2016-04-15
Receptor-activator of nuclear factor-κB (RANK), its ligand RANKL, and the soluble decoy receptor osteoprotegerin are the key regulators of osteoclast differentiation and bone remodeling. Here we show that RANK is also expressed in fully differentiated myotubes and skeletal muscle. Muscle RANK deletion has inotropic effects in denervated, but not in sham, extensor digitorum longus (EDL) muscles preventing the loss of maximum specific force while promoting muscle atrophy, fatigability, and increased proportion of fast-twitch fibers. In denervated EDL muscles, RANK deletion markedly increased stromal interaction molecule 1 content, a Ca(2+) sensor, and altered activity of the sarco(endo)plasmic reticulum Ca(2+)-ATPase (SERCA) modulating Ca(2+) storage. Muscle RANK deletion had no significant effects on the sham or denervated slow-twitch soleus muscles. These data identify a novel role for RANK as a key regulator of Ca(2+) storage and SERCA activity, ultimately affecting denervated skeletal muscle function. Copyright © 2016 the American Physiological Society.
Dufresne, Sébastien S.; Dumont, Nicolas A.; Boulanger-Piette, Antoine; Fajardo, Val A.; Gamu, Daniel; Kake-Guena, Sandrine-Aurélie; David, Rares Ovidiu; Bouchard, Patrice; Lavergne, Éliane; Penninger, Josef M.; Pape, Paul C.; Tupling, A. Russell
2016-01-01
Receptor-activator of nuclear factor-κB (RANK), its ligand RANKL, and the soluble decoy receptor osteoprotegerin are the key regulators of osteoclast differentiation and bone remodeling. Here we show that RANK is also expressed in fully differentiated myotubes and skeletal muscle. Muscle RANK deletion has inotropic effects in denervated, but not in sham, extensor digitorum longus (EDL) muscles preventing the loss of maximum specific force while promoting muscle atrophy, fatigability, and increased proportion of fast-twitch fibers. In denervated EDL muscles, RANK deletion markedly increased stromal interaction molecule 1 content, a Ca2+ sensor, and altered activity of the sarco(endo)plasmic reticulum Ca2+-ATPase (SERCA) modulating Ca2+ storage. Muscle RANK deletion had no significant effects on the sham or denervated slow-twitch soleus muscles. These data identify a novel role for RANK as a key regulator of Ca2+ storage and SERCA activity, ultimately affecting denervated skeletal muscle function. PMID:26825123
Cross-modal learning to rank via latent joint representation.
Wu, Fei; Jiang, Xinyang; Li, Xi; Tang, Siliang; Lu, Weiming; Zhang, Zhongfei; Zhuang, Yueting
2015-05-01
Cross-modal ranking is a research topic that is imperative to many applications involving multimodal data. Discovering a joint representation for multimodal data and learning a ranking function are essential in order to boost the cross-media retrieval (i.e., image-query-text or text-query-image). In this paper, we propose an approach to discover the latent joint representation of pairs of multimodal data (e.g., pairs of an image query and a text document) via a conditional random field and structural learning in a listwise ranking manner. We call this approach cross-modal learning to rank via latent joint representation (CML²R). In CML²R, the correlations between multimodal data are captured in terms of their sharing hidden variables (e.g., topics), and a hidden-topic-driven discriminative ranking function is learned in a listwise ranking manner. The experiments show that the proposed approach achieves a good performance in cross-media retrieval and meanwhile has the capability to learn the discriminative representation of multimodal data.
Discovery of a New Nearby Star
NASA Technical Reports Server (NTRS)
Teegarden, B. J.; Pravdo, S. H.; Covey, K.; Frazier, O.; Hawley, S. L.; Hicks, M.; Lawrence, K.; McGlynn, T.; Reid, I. N.; Shaklan, S. B.
2003-01-01
We report the discovery of a nearby star with a very large proper motion of 5.06 +/- 0.03 arcsec/yr. The star is called SO025300.5+165258 and referred to herein as HPMS (high proper motion star). The discovery came as a result of a search of the SkyMorph database, a sensitive and persistent survey that is well suited for finding stars with high proper motions. There are currently only 7 known stars with proper motions greater than 5 arcsec/yr. We have determined a preliminary value for the parallax of pi = 0.43 +/- 0.13 arcsec. If this value holds our new star ranks behind only the Alpha Centauri system (including Proxima Centauri) and Barnard's star in the list of our nearest stellar neighbours. The spectrum and measured tangential velocity indicate that HPMS is a main-sequence star with spectral type M6.5. However, if our distance measurement is correct, the HPMS is underluminous by 1.2 +/- 0.7 mag.
Bacterial polyesters: biosynthesis, biodegradable plastics and biotechnology.
Lenz, Robert W; Marchessault, Robert H
2005-01-01
The discovery and chemical identification, in the 1920s, of the aliphatic polyester: poly(3-hydroxybutyrate), PHB, as a granular component in bacterial cells proceeded without any of the controversies which marked the recognition of macromolecules by Staudinger. Some thirty years after its discovery, PHB was recognized as the prototypical biodegradable thermoplastic to solve the waste disposal challenge. The development effort led by Imperial Chemical Industries Ltd., encouraged interdisciplinary research from genetic engineering and biotechnology to the study of enzymes involved in biosynthesis and biodegradation. From the simple PHB homopolyester discovered by Maurice Lemoigne in the mid-twenties, a family of over 100 different aliphatic polyesters of the same general structure has been discovered. Depending on bacterial species and substrates, these high molecular weight stereoregular polyesters have emerged as a new family of natural polymers ranking with nucleic acids, polyamides, polyisoprenoids, polyphenols, polyphosphates, and polysaccharides. In this historical review, the chemical, biochemical and microbial highlights are linked to personalities and locations involved with the events covering a discovery timespan of 75 years.
Lost Near-Earth Object Candidates
NASA Astrophysics Data System (ADS)
Veres, Peter; Farnocchia, Davide; Williams, Gareth; Keys, Sonia; Boardman, Ian; Holman, Matthew J.; Payne, Matthew J.
2017-10-01
The number of discovered Near-Earth Objects (NEOs) increases rapidly, currently exceeding 16,000 NEOs. 2016 was the most productive year ever with 1,888 NEO discoveries. The NEO discovery process typically begins with three to five detections of a previously unidentified object that are reported to the Minor Planet Center (MPC). According to the plane-of-sky motion, the MPC ranks all of the new candidate discoveries for the likelihood of being NEOs using the so-called digest score. If the digest score is greater than 65 the observations appear on the publicly accessible NEO Confirmation Page (NEOCP). Objects on the NEOCP are followed up in subsequent hours and days. When enough observations are collected to ensure that the object is real and that the orbit is determined, the NEO is officially announced with its new designation by a Minor Planet Electronic Circular. However, 14% of NEO candidates never get confirmed and are therefore lost due to the lack of follow-up observations. We analyzed the lost NEO candidates that appeared on NEOCP in 2013-2016 and investigated the reasons why they were not confirmed. In particular, we studied the properties of the lost NEO candidates with a digest score of 100 that were reported by the two most prolific discovery sites - Pan-STARRS1 (F51) and Mt. Lemmon Survey (G96). We derived their plane-of-sky positions and rates, brightness, and ephemeris uncertainties, and assessed correlations with the phase of the moon and seasonal effects apparent in the given observatory’s data. We concluded that lost NEO candidates typically have a larger rate of motion and larger uncertainties than those of confirmed objects. However, many of the lost candidates could be recovered. In fact, the 1-sigma plane-of-sky uncertainty was still within ±0.5 deg in 79% (F51) and 69% (G96) of the cases 24 hours after discovery and in 31% (F51) and 30% (G96) of the cases 48 hours after discovery. 
If all of the NEO candidates with a digest score of 100 had been followed up, the number of discovered NEOs would have been larger by 685+/-30 in 2013-2016. The measures to decrease the number of lost NEO candidates include improved uncertainty maps and uncertainties as function of time on the NEOCP.
Eleftherohorinou, Hariklia; Hoggart, Clive J; Wright, Victoria J; Levin, Michael; Coin, Lachlan J M
2011-09-01
Rheumatoid arthritis (RA) is the commonest chronic, systemic, inflammatory disorder affecting ∼1% of the world population. It has a strong genetic component and a growing number of associated genes have been discovered in genome-wide association studies (GWAS), which nevertheless only account for 23% of the total genetic risk. We aimed to identify additional susceptibility loci through the analysis of GWAS in the context of biological function. We bridge the gap between pathway and gene-oriented analyses of GWAS, by introducing a pathway-driven gene stability-selection methodology that identifies potential causal genes in the top-associated disease pathways that may be driving the pathway association signals. We analysed the WTCCC and the NARAC studies of ∼5000 and ∼2000 subjects, respectively. We examined 700 pathways comprising ∼8000 genes. Ranking pathways by significance revealed that the NARAC top-ranked ∼6% lay within the top 10% of WTCCC. Gene selection on those pathways identified 58 genes in WTCCC and 61 in NARAC; 21 of those were common (P(overlap) < 10^-21), of which 16 were novel discoveries. Among the identified genes, we validated 10 known RA associations in WTCCC and 13 in NARAC, not discovered using single-SNP approaches on the same data. Gene ontology functional enrichment analysis on the identified genes showed significant over-representation of signalling activity (P < 10^-29) in both studies. Our findings suggest a novel model of RA genetic predisposition, which involves cell-membrane receptors and genes in second messenger signalling systems, in addition to genes that regulate immune responses, which have been the focus of interest previously.
Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre
2012-07-01
Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple and fast, and their output is easily interpretable by biologists, but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practitioners. We evaluated several existing and novel procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family-wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.
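One generic way to turn relevance scores into statistically interpretable quantities, in the spirit of this abstract, is a label-permutation test followed by a false discovery rate adjustment. The sketch below is not the authors' code (which is in their Supplementary Material); the scorer `abs_correlation_scores` is a simple stand-in chosen here for self-containment, and any model-based importance function with the same signature could be substituted.

```python
import numpy as np


def abs_correlation_scores(X, y):
    """Stand-in relevance score: |Pearson correlation| of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom = np.where(denom == 0, 1.0, denom)  # constant columns score 0
    return np.abs(Xc.T @ yc) / denom


def permutation_p_values(X, y, score_fn, n_perm=200, seed=0):
    """Per-feature p-value: how often a permuted-label score reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = score_fn(X, y)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        exceed += score_fn(X, rng.permutation(y)) >= observed
    # The +1 terms keep p-values strictly positive.
    return (exceed + 1.0) / (n_perm + 1.0)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values) in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(adjusted, 1.0)
    return q
```

Features whose q-value falls below a chosen level (for example 0.05) would then form the selected subset, which gives the ranking the significance threshold that raw relevance scores lack.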
Suplatov, Dmitry; Kirilin, Eugeny; Arbatsky, Mikhail; Takhaveev, Vakil; Svedas, Vytas
2014-07-01
The new web-server pocketZebra implements the power of bioinformatics and geometry-based structural approaches to identify and rank subfamily-specific binding sites in proteins by functional significance, and select particular positions in the structure that determine selective accommodation of ligands. A new scoring function has been developed to annotate binding sites by the presence of the subfamily-specific positions in diverse protein families. pocketZebra web-server has multiple input modes to meet the needs of users with different experience in bioinformatics. The server provides on-site visualization of the results as well as off-line version of the output in annotated text format and as PyMol sessions ready for structural analysis. pocketZebra can be used to study structure-function relationship and regulation in large protein superfamilies, classify functionally important binding sites and annotate proteins with unknown function. The server can be used to engineer ligand-binding sites and allosteric regulation of enzymes, or implemented in a drug discovery process to search for potential molecular targets and novel selective inhibitors/effectors. The server, documentation and examples are freely available at http://biokinet.belozersky.msu.ru/pocketzebra and there are no login requirements. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
The fundamental unit of pain is the cell.
Reichling, David B; Green, Paul G; Levine, Jon D
2013-12-01
The molecular/genetic era has seen the discovery of a staggering number of molecules implicated in pain mechanisms [18,35,61,69,96,133,150,202,224]. This has stimulated pharmaceutical and biotechnology companies to invest billions of dollars to develop drugs that enhance or inhibit the function of many of these molecules. Unfortunately, this effort has provided a remarkably small return on this investment. Inevitably, transformative progress in this field will require a better understanding of the functional links among the ever-growing ranks of "pain molecules," as well as their links with an even larger number of molecules with which they interact. Importantly, all of these molecules exist side-by-side, within a functional unit, the cell, and its adjacent matrix of extracellular molecules. To paraphrase a recent editorial in Science magazine [223], although we live in the Golden age of Genetics, the fundamental unit of biology is still arguably the cell, and the cell is the critical structural and functional setting in which the function of pain-related molecules must be understood. This review summarizes our current understanding of the nociceptor as a cell-biological unit that responds to a variety of extracellular inputs with a complex and highly organized interaction of signaling molecules. We also discuss the insights that this approach is providing into peripheral mechanisms of chronic pain and sex dependence in pain.
NASA Astrophysics Data System (ADS)
Webb, S.
2013-09-01
Until relatively recently, many authors have assumed that if extraterrestrial life is discovered it will be via the discovery of extraterrestrial intelligence: we can best try to detect life by adopting the SETI approach of trying to detect beacons or artefacts. The Rio Scale, proposed by Almár and Tarter in 2000, is a tool for quantifying the potential significance for society of any such reported detection. However, improvements in technology and advances in astrobiology raise the possibility that the discovery of extraterrestrial life will instead be via the detection of atmospheric biosignatures. The London Scale, proposed by Almár in 2010, attempts to quantify the potential significance of the discovery of extraterrestrial life rather than extraterrestrial intelligence. What might be the consequences of the announcement of a discovery that ranks low on the London Scale? In other words, what might be society's reaction if 'first contact' is via the remote sensing of the byproducts of unicellular organisms rather than with the products of high intelligence? Here, I examine some possible reactions to that question; in particular, I discuss how such an announcement might affect our views of life here on Earth and of humanity's place in the universe.
KOJAK: Scalable Semantic Link Discovery Via Integrated Knowledge-Based and Statistical Reasoning
2006-11-01
program can find interesting connections in a network without having to learn the patterns of interestingness beforehand. The key advantage of our...Interesting Instances in Semantic Graphs Below we describe how the UNICORN framework can discover interesting instances in a multi-relational dataset...We can now describe how UNICORN solves the first problem of finding the top interesting nodes in a semantic net by ranking them according to
Computational functional genomics-based approaches in analgesic drug discovery and repurposing.
Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred; Lötsch, Jörn
2018-06-01
Persistent pain is a major healthcare problem affecting a fifth of adults worldwide with still limited treatment options. The search for new analgesics increasingly includes the novel research area of functional genomics, which combines data derived from various processes related to DNA sequence, gene expression or protein function and uses advanced methods of data mining and knowledge discovery with the goal of understanding the relationship between the genome and the phenotype. Its use in drug discovery and repurposing for analgesic indications has so far been performed using knowledge discovery in gene function and drug target-related databases; next-generation sequencing; and functional proteomics-based approaches. Here, we discuss recent efforts in functional genomics-based approaches to analgesic drug discovery and repurposing and highlight the potential of computational functional genomics in this field including a demonstration of the workflow using a novel R library 'dbtORA'.
Whittleton, Sarah R; Otero-de-la-Roza, A; Johnson, Erin R
2017-02-14
Accurate energy ranking is a key facet to the problem of first-principles crystal-structure prediction (CSP) of molecular crystals. This work presents a systematic assessment of B86bPBE-XDM, a semilocal density functional combined with the exchange-hole dipole moment (XDM) dispersion model, for energy ranking using 14 compounds from the first five CSP blind tests. Specifically, the set of crystals studied comprises 11 rigid, planar compounds and 3 co-crystals. The experimental structure was correctly identified as the lowest in lattice energy for 12 of the 14 total crystals. One of the exceptions is 4-hydroxythiophene-2-carbonitrile, for which the experimental structure was correctly identified once a quasi-harmonic estimate of the vibrational free-energy contribution was included, evidencing the occasional importance of thermal corrections for accurate energy ranking. The other exception is an organic salt, where charge-transfer error (also called delocalization error) is expected to cause the base density functional to be unreliable. Provided the choice of base density functional is appropriate and an estimate of temperature effects is used, XDM-corrected density-functional theory is highly reliable for the energetic ranking of competing crystal structures.
Cappi, C; Brentani, H; Lima, L; Sanders, S J; Zai, G; Diniz, B J; Reis, V N S; Hounie, A G; Conceição do Rosário, M; Mariani, D; Requena, G L; Puga, R; Souza-Duran, F L; Shavitt, R G; Pauls, D L; Miguel, E C; Fernandez, T V
2016-01-01
Studies of rare genetic variation have identified molecular pathways conferring risk for developmental neuropsychiatric disorders. To date, no published whole-exome sequencing studies have been reported in obsessive-compulsive disorder (OCD). We sequenced all the genome coding regions in 20 sporadic OCD cases and their unaffected parents to identify rare de novo (DN) single-nucleotide variants (SNVs). The primary aim of this pilot study was to determine whether DN variation contributes to OCD risk. To this aim, we evaluated whether there is an elevated rate of DN mutations in OCD, which would justify this approach toward gene discovery in larger studies of the disorder. Furthermore, to explore functional molecular correlations among genes with nonsynonymous DN SNVs in OCD probands, a protein–protein interaction (PPI) network was generated based on databases of direct molecular interactions. We applied Degree-Aware Disease Gene Prioritization (DADA) to rank the PPI network genes based on their relatedness to a set of OCD candidate genes from two OCD genome-wide association studies (Stewart et al., 2013; Mattheisen et al., 2014). In addition, we performed a pathway analysis with genes from the PPI network. The rate of DN SNVs in OCD was 2.51 × 10^-8 per base per generation, significantly higher than a previous estimated rate in unaffected subjects using the same sequencing platform and analytic pipeline. Several genes harboring DN SNVs in OCD were highly interconnected in the PPI network and ranked high in the DADA analysis. Nearly all the DN SNVs in this study are in genes expressed in the human brain, and a pathway analysis revealed enrichment in immunological and central nervous system functioning and development.
The results of this pilot study indicate that further investigation of DN variation in larger OCD cohorts is warranted to identify specific risk genes and to confirm our preliminary finding with regard to PPI network enrichment for particular biological pathways and functions. PMID:27023170
Ab Initio Reactive Computer Aided Molecular Design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Martínez, Todd J.
Few would dispute that theoretical chemistry tools can now provide keen insights into chemical phenomena. Yet the holy grail of efficient and reliable prediction of complex reactivity has remained elusive. Fortunately, recent advances in electronic structure theory based on the concepts of both element- and rank-sparsity, coupled with the emergence of new highly parallel computer architectures, have led to a significant increase in the time and length scales which can be simulated using first principles molecular dynamics. This then opens the possibility of new discovery-based approaches to chemical reactivity, such as the recently proposed ab initio nanoreactor. Here, we argue that due to these and other recent advances, the holy grail of computational discovery for complex chemical reactivity is rapidly coming within our reach.
Ab Initio Reactive Computer Aided Molecular Design
Martínez, Todd J.
2017-03-21
Few would dispute that theoretical chemistry tools can now provide keen insights into chemical phenomena. Yet the holy grail of efficient and reliable prediction of complex reactivity has remained elusive. Fortunately, recent advances in electronic structure theory based on the concepts of both element- and rank-sparsity, coupled with the emergence of new highly parallel computer architectures, have led to a significant increase in the time and length scales which can be simulated using first principles molecular dynamics. This then opens the possibility of new discovery-based approaches to chemical reactivity, such as the recently proposed ab initio nanoreactor. Here, we argue that due to these and other recent advances, the holy grail of computational discovery for complex chemical reactivity is rapidly coming within our reach.
Ranking Support Vector Machine with Kernel Approximation
Dou, Yong
2017-01-01
Learning-to-rank algorithms have become important in recent years due to their successful application in information retrieval, recommender systems, computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-the-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problems. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of the kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. A primal truncated Newton method is used to optimize the pairwise L2-loss (squared hinge loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method achieves a much faster training speed than kernel RankSVM and achieves comparable or better performance than state-of-the-art ranking algorithms. PMID:28293256
Ranking Support Vector Machine with Kernel Approximation.
Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi
2017-01-01
Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms.
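The random Fourier feature idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes an RBF kernel of the form exp(-gamma * ||x - y||^2); the function names (`make_rff_map`, `rbf_kernel`, `dot`) and all parameter values are hypothetical and are not taken from the paper. The sketch only shows the explicit feature map that replaces the kernel matrix; the pairwise L2-loss optimization with a truncated Newton method is not reproduced.

```python
import math
import random


def make_rff_map(dim, n_features, gamma, seed=0):
    """Build z(x) such that dot(z(x), z(y)) approximates exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    # For this RBF kernel the random frequencies follow N(0, 2 * gamma * I).
    sigma = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, sigma) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel value, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

Once every example is mapped through `feature_map`, a linear ranking model can be trained on the mapped features, so the n-by-n kernel matrix never has to be formed. The approximation error shrinks as `n_features` grows.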
Indexing and Retrieval for the Web.
ERIC Educational Resources Information Center
Rasmussen, Edie M.
2003-01-01
Explores current research on indexing and ranking as retrieval functions of search engines on the Web. Highlights include measuring search engine stability; evaluation of Web indexing and retrieval; Web crawlers; hyperlinks for indexing and ranking; ranking for metasearch; document structure; citation indexing; relevance; query evaluation;…
An Intelligent Content Discovery Technique for Health Portal Content Management
2014-01-01
Background Continuous content management of health information portals is a feature vital for its sustainability and widespread acceptance. Knowledge and experience of a domain expert is essential for content management in the health domain. The rate of generation of online health resources is exponential and thereby manual examination for relevance to a specific topic and audience is a formidable challenge for domain experts. Intelligent content discovery for effective content management is a less researched topic. An existing expert-endorsed content repository can provide the necessary leverage to automatically identify relevant resources and evaluate qualitative metrics. Objective This paper reports on the design research towards an intelligent technique for automated content discovery and ranking for health information portals. The proposed technique aims to improve efficiency of the current mostly manual process of portal content management by utilising an existing expert-endorsed content repository as a supporting base and a benchmark to evaluate the suitability of new content Methods A model for content management was established based on a field study of potential users. The proposed technique is integral to this content management model and executes in several phases (ie, query construction, content search, text analytics and fuzzy multi-criteria ranking). The construction of multi-dimensional search queries with input from Wordnet, the use of multi-word and single-word terms as representative semantics for text analytics and the use of fuzzy multi-criteria ranking for subjective evaluation of quality metrics are original contributions reported in this paper. Results The feasibility of the proposed technique was examined with experiments conducted on an actual health information portal, the BCKOnline portal. 
Both intermediary and final results generated by the technique are presented in the paper and these help to establish benefits of the technique and its contribution towards effective content management. Conclusions The prevalence of large numbers of online health resources is a key obstacle for domain experts involved in content management of health information portals and websites. The proposed technique has proven successful at search and identification of resources and the measurement of their relevance. It can be used to support the domain expert in content management and thereby ensure the health portal is up-to-date and current. PMID:25654440
Shim, Hongseok; Kim, Ji Hyun; Kim, Chan Yeong; Hwang, Sohyun; Kim, Hyojin; Yang, Sunmo; Lee, Ji Eun; Lee, Insuk
2016-11-16
Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Ranking metrics in gene set enrichment analysis: do they matter?
Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna
2017-05-12
There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. 
When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.
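As a rough illustration of what a gene ranking metric does, the sketch below computes the absolute signal-to-noise ratio in its commonly used form, |mean_A - mean_B| / (sd_A + sd_B), and orders genes by it. This is an assumption about the metric's definition; the MrGSEA implementation may apply additional adjustments, and the function names and input layout here are hypothetical.

```python
import statistics


def abs_signal_to_noise(group_a, group_b):
    """Absolute signal-to-noise ratio of one gene between two sample groups."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    sd_a, sd_b = statistics.stdev(group_a), statistics.stdev(group_b)
    return abs(mean_a - mean_b) / (sd_a + sd_b)


def rank_genes(expr_a, expr_b):
    """Order gene names from most to least differential.

    expr_a and expr_b map gene name -> list of expression values,
    one dict per experimental condition (hypothetical layout).
    """
    scores = {g: abs_signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=scores.get, reverse=True)
```

The ranked list is what the enrichment step consumes, which is why swapping the metric can change which gene sets come out as enriched.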
NASA Astrophysics Data System (ADS)
Obayashi, Takeshi; Kinoshita, Kengo
2013-01-01
Gene coexpression analysis is a powerful approach to elucidating gene function. We have established and developed this approach using the vast amount of publicly available gene expression data measured by microarray techniques. The coexpressed genes are used to estimate the function of the guide gene or to construct gene coexpression networks. To construct gene networks, researchers must introduce an arbitrary threshold on gene coexpression, because the coexpression value is continuous. With a view to introducing a common threshold, we previously reported that the rank of Pearson's correlation coefficient (PCC) is more useful than the original PCC value. In this manuscript, we re-assessed the measures of gene coexpression used to construct gene coexpression networks, and found that the mutual rank (MR) of PCC showed better performance than the rank of PCC and the original PCC at low false positive rates.
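The mutual rank can be sketched as follows, assuming its usual definition as the geometric mean of the two directional ranks: the rank of gene B in gene A's PCC-ordered list and the rank of A in B's list. The abstract does not spell out the formula, so this definition, the function name, and the input format (a dict of pairwise PCC values) are assumptions.

```python
import math


def mutual_rank(pcc_pairs):
    """Compute MR for every gene pair from pairwise PCC values.

    pcc_pairs maps an unordered pair (a, b) to its PCC; each pair appears once.
    """
    # Build a symmetric lookup: pcc[a][b] == pcc[b][a].
    pcc = {}
    for (a, b), value in pcc_pairs.items():
        pcc.setdefault(a, {})[b] = value
        pcc.setdefault(b, {})[a] = value

    # rank[a][b] is the position of b when a's partners are sorted by PCC (1 = highest).
    rank = {}
    for a, partners in pcc.items():
        ordered = sorted(partners, key=partners.get, reverse=True)
        rank[a] = {b: i + 1 for i, b in enumerate(ordered)}

    return {
        (a, b): math.sqrt(rank[a][b] * rank[b][a])
        for (a, b) in pcc_pairs
    }
```

Because MR is built from ranks in both directions, a pair scores well only when each gene is near the top of the other's list, which makes a single threshold more comparable across genes than a raw PCC cutoff.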
Science Fairs for Science Literacy
NASA Astrophysics Data System (ADS)
Mackey, Katherine; Culbertson, Timothy
2014-03-01
Scientific discovery, technological revolutions, and complex global challenges are commonplace in the modern era. People are bombarded with news about climate change, pandemics, and genetically modified organisms, and scientific literacy has never been more important than in the present day. Yet only 29% of American adults have sufficient understanding to be able to read science stories reported in the popular press [Miller, 2010], and American students consistently rank below other nations in math and science [National Center for Education Statistics, 2012].
Learning of Rule Ensembles for Multiple Attribute Ranking Problems
NASA Astrophysics Data System (ADS)
Dembczyński, Krzysztof; Kotłowski, Wojciech; Słowiński, Roman; Szeląg, Marcin
In this paper, we consider the multiple attribute ranking problem from a Machine Learning perspective. We propose two approaches to statistical learning of an ensemble of decision rules from decision examples provided by the Decision Maker in terms of pairwise comparisons of some objects. The first approach consists in learning a preference function defining a binary preference relation for a pair of objects. The result of application of this function on all pairs of objects to be ranked is then exploited using the Net Flow Score procedure, giving a linear ranking of objects. The second approach consists in learning a utility function for single objects. The utility function also gives a linear ranking of objects. In both approaches, the learning is based on the boosting technique. The presented approaches to Preference Learning share good properties of the decision rule preference model and have good performance in the massive-data learning problems. As Preference Learning and Multiple Attribute Decision Aiding share many concepts and methodological issues, in the introduction, we review some aspects bridging these two fields. To illustrate the two approaches proposed in this paper, we solve with them a toy example concerning the ranking of a set of cars evaluated by multiple attributes. Then, we perform a large data experiment on real data sets. The first data set concerns credit rating. Since recent research in the field of Preference Learning is motivated by the increasing role of modeling preferences in recommender systems and information retrieval, we chose two other massive data sets from this area - one comes from movie recommender system MovieLens, and the other concerns ranking of text documents from 20 Newsgroups data set.
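The first approach turns a pairwise preference function into a linear ranking through a net flow score. A minimal sketch, assuming the common definition in which an object's score is the sum over all other objects of (preference for it) minus (preference against it); the `prefers` callback stands in for the learned rule ensemble and is hypothetical, as is the toy car data in the usage note.

```python
def net_flow_scores(objects, prefers):
    """Net flow score of each object given a pairwise preference function.

    prefers(a, b) returns the degree (e.g. 0 or 1) to which a is preferred to b.
    """
    return {
        a: sum(prefers(a, b) - prefers(b, a) for b in objects if b != a)
        for a in objects
    }


def net_flow_ranking(objects, prefers):
    """Objects ordered from best to worst by net flow score."""
    scores = net_flow_scores(objects, prefers)
    return sorted(objects, key=lambda a: scores[a], reverse=True)
```

For example, with three hypothetical cars priced 10, 15 and 20 and a preference function that favours the cheaper car, the cheapest car gets score 2, the middle one 0 and the most expensive one -2, so the ranking follows price.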
Linear Subspace Ranking Hashing for Cross-Modal Retrieval.
Li, Kai; Qi, Guo-Jun; Ye, Jun; Hua, Kien A
2017-09-01
Hashing has attracted a great deal of research in recent years due to its effectiveness for the retrieval and indexing of large-scale high-dimensional multimedia data. In this paper, we propose a novel ranking-based hashing framework that maps data from different modalities into a common Hamming space where the cross-modal similarity can be measured using Hamming distance. Unlike existing cross-modal hashing algorithms where the learned hash functions are binary space partitioning functions, such as the sign and threshold function, the proposed hashing scheme takes advantage of a new class of hash functions closely related to rank correlation measures which are known to be scale-invariant, numerically stable, and highly nonlinear. Specifically, we jointly learn two groups of linear subspaces, one for each modality, so that features' ranking orders in different linear subspaces maximally preserve the cross-modal similarities. We show that the ranking-based hash function has a natural probabilistic approximation which transforms the original highly discontinuous optimization problem into one that can be efficiently solved using simple gradient descent algorithms. The proposed hashing framework is also flexible in the sense that the optimization procedures are not tied up to any specific form of loss function, which is typical for existing cross-modal hashing methods, but rather we can flexibly accommodate different loss functions with minimal changes to the learning steps. We demonstrate through extensive experiments on four widely-used real-world multimodal datasets that the proposed cross-modal hashing method can achieve competitive performance against several state-of-the-arts with only moderate training and testing time.
Xue, Li C.; Jordan, Rafael A.; EL-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2015-01-01
Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein–protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving Success Rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. PMID:23873600
Xue, Li C; Jordan, Rafael A; El-Manzalawy, Yasser; Dobbs, Drena; Honavar, Vasant
2014-02-01
Selecting near-native conformations from the immense number of conformations generated by docking programs remains a major challenge in molecular docking. We introduce DockRank, a novel approach to scoring docked conformations based on the degree to which the interface residues of the docked conformation match a set of predicted interface residues. DockRank uses interface residues predicted by partner-specific sequence homology-based protein-protein interface predictor (PS-HomPPI), which predicts the interface residues of a query protein with a specific interaction partner. We compared the performance of DockRank with several state-of-the-art docking scoring functions using Success Rate (the percentage of cases that have at least one near-native conformation among the top m conformations) and Hit Rate (the percentage of near-native conformations that are included among the top m conformations). In cases where it is possible to obtain partner-specific (PS) interface predictions from PS-HomPPI, DockRank consistently outperforms both (i) ZRank and IRAD, two state-of-the-art energy-based scoring functions (improving Success Rate by up to 4-fold); and (ii) Variants of DockRank that use predicted interface residues obtained from several protein interface predictors that do not take into account the binding partner in making interface predictions (improving success rate by up to 39-fold). The latter result underscores the importance of using partner-specific interface residues in scoring docked conformations. We show that DockRank, when used to re-rank the conformations returned by ClusPro, improves upon the original ClusPro rankings in terms of both Success Rate and Hit Rate. DockRank is available as a server at http://einstein.cs.iastate.edu/DockRank/. Copyright © 2013 Wiley Periodicals, Inc.
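The two evaluation measures defined in the abstract can be written down directly. This is a sketch under stated assumptions: each docking case is represented as a list of booleans in ranked order (True marks a near-native conformation), and Hit Rate is pooled over all cases, which is one possible reading of the definition. The function names and input format are hypothetical.

```python
def success_rate(cases, m):
    """Percentage of cases with at least one near-native conformation in the top m."""
    successes = sum(1 for ranked in cases if any(ranked[:m]))
    return 100.0 * successes / len(cases)


def hit_rate(cases, m):
    """Percentage of all near-native conformations that appear in the top m.

    Pooled over cases (an assumption); a per-case average is another reading.
    """
    total = sum(sum(ranked) for ranked in cases)
    found = sum(sum(ranked[:m]) for ranked in cases)
    return 100.0 * found / total
```

Success Rate rewards finding any good pose per case, while Hit Rate rewards recovering as many good poses as possible, so a scoring function can do well on one and poorly on the other.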
Sailaukhanuly, Yerbolat; Zhakupbekova, Arai; Amutova, Farida; Carlsen, Lars
2013-01-01
Knowledge of the environmental behavior of chemicals is a fundamental part of the risk assessment process. The present paper discusses various methods of ranking of a series of persistent organic pollutants (POPs) according to the persistence, bioaccumulation and toxicity (PBT) characteristics. Traditionally ranking has been done as an absolute (total) ranking applying various multicriteria data analysis methods like simple additive ranking (SAR) or various utility functions (UFs) based rankings. An attractive alternative to these ranking methodologies appears to be partial order ranking (POR). The present paper compares different ranking methods like SAR, UF and POR. Significant discrepancies between the rankings are noted and it is concluded that partial order ranking, as a method without any pre-assumptions concerning possible relation between the single parameters, appears as the most attractive ranking methodology. In addition to the initial ranking partial order methodology offers a wide variety of analytical tools to elucidate the interplay between the objects to be ranked and the ranking parameters. In the present study is included an analysis of the relative importance of the single P, B and T parameters. Copyright © 2012 Elsevier Ltd. All rights reserved.
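The contrast between a total ranking and a partial order ranking can be illustrated with a small sketch. It assumes that each chemical has P, B and T scores oriented so that higher means more hazardous, that the partial order is the usual component-wise dominance relation, and that simple additive ranking is an unweighted sum; the function names and the example scores in the usage note are hypothetical and not taken from the paper.

```python
def dominates(a, b):
    """True if a is at least as high as b on every criterion and higher on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def partial_order(scores):
    """Relation for each pair: '>', '<', '=' or '||' (incomparable)."""
    names = sorted(scores)
    relations = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if scores[a] == scores[b]:
                relations[(a, b)] = "="
            elif dominates(scores[a], scores[b]):
                relations[(a, b)] = ">"
            elif dominates(scores[b], scores[a]):
                relations[(a, b)] = "<"
            else:
                relations[(a, b)] = "||"
    return relations


def simple_additive_ranking(scores):
    """Total ranking by the unweighted sum of the criteria (highest first)."""
    return sorted(scores, key=lambda name: sum(scores[name]), reverse=True)
```

With hypothetical scores X = (3, 3, 3), Y = (1, 1, 1) and Z = (5, 0, 2), the additive ranking places Z between X and Y, whereas the partial order reports Z as incomparable to both, because no assumption is made about trading one parameter off against another.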
CANDO and the infinite drug discovery frontier
Minie, Mark; Chopra, Gaurav; Sethi, Geetika; Horst, Jeremy; White, George; Roy, Ambrish; Hatti, Kaushik; Samudrala, Ram
2014-01-01
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses similarity of compound–proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds covering 48,278 protein structures mapping to 2030 indications based on basic science methodologies to predict and analyze protein structure, function, and interactions developed by us and others. Our signature comparison and ranking approach yielded benchmarking accuracies of 12–25% for 1439 indications with at least two approved compounds. We prospectively validated 49/82 ‘high value’ predictions from nine studies covering seven indications, with comparable or better activity to existing drugs, which serve as novel repurposed therapeutics. Our approach may be generalized to compounds beyond those approved by the FDA, and can also consider mutations in protein structures to enable personalization. Our platform provides a holistic multiscale modeling framework of complex atomic, molecular, and physiological systems with broader applications in medicine and engineering. PMID:24980786
The Functions and Dysfunctions of College Rankings: An Analysis of Institutional Expenditure
ERIC Educational Resources Information Center
Kim, Jeongeun
2018-01-01
College rankings have become a powerful influence in higher education. While the determinants of educational quality are not clearly defined, college rankings designate an institution's standing in a numerical order based on quantifiable measurements that focus primarily on institutional resources. Previous research has identified the…
Dufresne, Sébastien S; Boulanger-Piette, Antoine; Bossé, Sabrina; Argaw, Anteneh; Hamoudi, Dounia; Marcadet, Laetitia; Gamu, Daniel; Fajardo, Val A; Yagita, Hideo; Penninger, Josef M; Russell Tupling, A; Frenette, Jérôme
2018-04-24
Although there is a strong association between osteoporosis and skeletal muscle atrophy/dysfunction, the functional relevance of a particular biological pathway that regulates synchronously bone and skeletal muscle physiopathology is still elusive. Receptor-activator of nuclear factor κB (RANK), its ligand RANKL and the soluble decoy receptor osteoprotegerin (OPG) are the key regulators of osteoclast differentiation and bone remodelling. We thus hypothesized that RANK/RANKL/OPG, which is a key pathway for bone regulation, is involved in Duchenne muscular dystrophy (DMD) physiopathology. Our results show that muscle-specific RANK deletion (mdx-RANK^mko) in dystrophin deficient mdx mice improves significantly specific force [54% gain in force] of EDL muscles with no protective effect against eccentric contraction-induced muscle dysfunction. In contrast, full-length OPG-Fc injections restore the force of dystrophic EDL muscles [162% gain in force], protect against eccentric contraction-induced muscle dysfunction ex vivo and significantly improve functional performance on downhill treadmill and post-exercise physical activity. Since OPG serves as a soluble receptor for RANKL and as a decoy receptor for TRAIL, mdx mice were injected with anti-RANKL and anti-TRAIL antibodies to decipher the dual function of OPG. Injections of anti-RANKL and/or anti-TRAIL increase significantly the force of dystrophic EDL muscle [45% and 17% gains in force, respectively]. In agreement, truncated OPG-Fc that contains only RANKL domains produces gains in force production similar to those of anti-RANKL treatments. To corroborate that full-length OPG-Fc also acts independently of RANK/RANKL pathway, dystrophin/RANK double-deficient mice were treated with full-length OPG-Fc for 10 days. Dystrophic EDL muscles exhibited a significant gain in force relative to untreated dystrophin/RANK double-deficient mice, indicating that the effect of full-length OPG-Fc is in part independent of the RANKL/RANK interaction. The sarco/endoplasmic reticulum Ca2+ ATPase (SERCA) activity is significantly depressed in dysfunctional and dystrophic muscles and full-length OPG-Fc treatment increased SERCA activity and SERCA-2a expression. These findings demonstrate the superiority of full-length OPG-Fc treatment relative to truncated OPG-Fc, anti-RANKL, anti-TRAIL or muscle RANK deletion in improving dystrophic muscle function, integrity and protection against eccentric contractions. In conclusion, full-length OPG-Fc represents an efficient alternative in the development of new treatments for muscular dystrophy in which a single therapeutic approach may be foreseeable to maintain both bone and skeletal muscle functions.
Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal
2017-01-01
Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework of identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics dataset. Firstly, we determine the genes that have both expression as well as methylation values, and follow normal distribution. Similarly, we identify the genes which consist of both expression and methylation values, but do not follow normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize Limma package for performing non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on classification performances obtained using several well-known classifiers.
ARS turns fifteen: la quinceañera bonita.
Sen, Chandan K
2013-01-01
ARS was aimed at advancing the erstwhile niche field of redox biology to a more central position in research. Currently, ARS ranks first (impact factor: 8.456) in the field of redox biology. Of 8336 journals listed in Journal Citation Reports, ARS ranks 205th. The next journal in redox biology ranks 449th. ARS ranks 169th of 8336 in immediacy index. The next journal in redox biology ranks 923rd. Thus, ARS is the primary source of hot papers in redox sciences and healthcare. To grow its footprint and overall impact, ARS has nearly doubled the annual publication volume from roughly 200 to 400 in one year. Because the manuscript volume represents the denominator of the impact factor calculation, such a sharp increase in volume would be predicted to lower the impact factor proportionally. Because of the robust current upward momentum, ARS will be affected less than that predicted by simple arithmetic and will maintain its top position even after such aggressive volume expansion. As another year passes, the additional manuscripts will get more time to be cited, and therefore the impact factor is expected to bounce back, resulting in a much stronger journal with a substantially enhanced overall presence. ARS currently publishes 36 issues annually as two series: ARS-Discoveries, and ARS-Therapeutics. Redox biology does have the potential of major health impact. ARS-Therapeutics is the first and only forum dedicated to highlighting that strength. I am grateful to the global redox village for their unreserved support to raise ARS and this fascinating field of redox research and healthcare. Antioxid. Redox Signal. 18, 1-4.
Machine learning for the New York City power grid.
Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon
2012-02-01
Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy, sources that are historical (static), semi-real-time, or realtime, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid.
Direct Comparison of the Precision of the New Hologic Horizon Model With the Old Discovery Model.
Whittaker, LaTarsha G; McNamara, Elizabeth A; Vath, Savoun; Shaw, Emily; Malabanan, Alan O; Parker, Robert A; Rosen, Harold N
2017-11-22
Previous publications suggested that the precision of the new Hologic Horizon densitometer might be better than that of the previous Discovery model, but these observations were confounded by not using the same participants and technologists on both densitometers. We sought to study this issue methodically by measuring in vivo precision in both densitometers using the same patients and technologists. Precision studies for the Horizon and Discovery models were done by acquiring spine, hip, and forearm bone mineral density twice on 30 participants. The set of 4 scans on each participant (2 on the Discovery, 2 on the Horizon) was acquired by the same technologist using the same scanning mode. The pairs of data were used to calculate the least significant change according to the International Society for Clinical Densitometry guidelines. The significance of the difference between least significant changes was assessed using a Wilcoxon signed-rank test of the difference between the mean square error of the absolute value of the differences between paired measurements on the Discovery (Δ-Discovery) and the mean square error of the absolute value of the differences between paired measurements on the Horizon (Δ-Horizon). At virtually all anatomic sites, there was a nonsignificant trend for the precision to be better for the Horizon than for the Discovery. As more vertebrae were excluded from analysis, the precision deteriorated on both densitometers. The precision between densitometers was almost identical when reporting only 1 vertebral body. (1) There was a nonsignificant trend for greater precision on the new Hologic Horizon compared with the older Discovery model. (2) The difference in precision of the spine bone mineral density between the Horizon and the Discovery models decreases as fewer vertebrae are included. 
(3) These findings are substantially similar to previously published results which had not controlled as well for confounding from using different subjects and technologists. Copyright © 2017 The International Society for Clinical Densitometry. Published by Elsevier Inc. All rights reserved.
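The least significant change calculation from duplicate scans can be sketched as below. It assumes the standard ISCD approach: the precision error is the root-mean-square standard deviation of the paired measurements, and the least significant change at 95% confidence is 2.77 times that value. The function names and the example values in the usage note are hypothetical and are not data from the study.

```python
import math


def rms_sd(pairs):
    """Root-mean-square standard deviation from duplicate measurements.

    For two measurements the per-subject variance is (a - b)^2 / 2.
    """
    variances = [(a - b) ** 2 / 2.0 for a, b in pairs]
    return math.sqrt(sum(variances) / len(variances))


def least_significant_change(pairs, factor=2.77):
    """LSC at 95% confidence: 2.77 times the RMS-SD precision error."""
    return factor * rms_sd(pairs)
```

For instance, two participants scanned twice with bone mineral density values of (1.00, 1.02) and (0.90, 0.88) g/cm² give an RMS-SD of about 0.014 g/cm² and an LSC of about 0.039 g/cm². Computing this once per densitometer on the same participants gives the two LSC values that the study compares.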
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS.
Yu, Hwanjo; Kim, Taehoon; Oh, Jinoh; Ko, Ilhwan; Kim, Sungchul; Han, Wook-Shin
2010-04-16
Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large number of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and a multi-level relevance feedback in real time on PubMed. RefMed supports a multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user's feedback and efficiently processes the function to return relevant articles in real time.
Enabling multi-level relevance feedback on PubMed by integrating rank learning into DBMS
2010-01-01
Background: Finding relevant articles from PubMed is challenging because it is hard to express the user's specific intention in the given query interface, and a keyword query typically retrieves a large number of results. Researchers have applied machine learning techniques to find relevant articles by ranking the articles according to the learned relevance function. However, the process of learning and ranking is usually done offline without being integrated with the keyword queries, and the users have to provide a large amount of training documents to get a reasonable learning accuracy. This paper proposes a novel multi-level relevance feedback system for PubMed, called RefMed, which supports both ad-hoc keyword queries and multi-level relevance feedback in real time on PubMed. Results: RefMed supports multi-level relevance feedback by using the RankSVM as the learning method, and thus it achieves higher accuracy with less feedback. RefMed "tightly" integrates the RankSVM into RDBMS to support both keyword queries and the multi-level relevance feedback in real time; the tight coupling of the RankSVM and DBMS substantially improves the processing time. An efficient parameter selection method for the RankSVM is also proposed, which tunes the RankSVM parameter without performing validation. Thereby, RefMed achieves a high learning accuracy in real time without performing a validation process. RefMed is accessible at http://dm.postech.ac.kr/refmed. Conclusions: RefMed is the first multi-level relevance feedback system for PubMed, which achieves a high accuracy with less feedback. It effectively learns an accurate relevance function from the user’s feedback and efficiently processes the function to return relevant articles in real time. PMID:20406504
Diagnostic Peptide Discovery: Prioritization of Pathogen Diagnostic Markers Using Multiple Features
Carmona, Santiago J.; Sartor, Paula A.; Leguizamón, María S.; Campetella, Oscar E.; Agüero, Fernán
2012-01-01
The availability of complete pathogen genomes has renewed interest in the development of diagnostics for infectious diseases. Synthetic peptide microarrays provide a rapid, high-throughput platform for immunological testing of potential B-cell epitopes. However, their current capacity prevents the experimental screening of complete “peptidomes”. Therefore, computational approaches for prediction and/or prioritization of diagnostically relevant peptides are required. In this work we describe a computational method to assess a defined set of molecular properties for each potential diagnostic target in a reference genome. Properties such as sub-cellular localization or expression level were evaluated for the whole protein. At a higher resolution (short peptides), we assessed a set of local properties, such as repetitive motifs, disorder (structured vs. natively unstructured regions), trans-membrane spans, genetic polymorphisms (conserved vs. divergent regions), predicted B-cell epitopes, and sequence similarity against human proteins and other potential cross-reacting species (e.g. other pathogens endemic in overlapping geographical locations). A scoring function based on these different features was developed, and used to rank all peptides from a large eukaryotic pathogen proteome. We applied this method to the identification of candidate diagnostic peptides in the protozoan Trypanosoma cruzi, the causative agent of Chagas disease. We measured the performance of the method by analyzing the enrichment of validated antigens in the high-scoring top of the ranking. Based on this measure, our integrative method outperformed alternative prioritizations based on individual properties (such as B-cell epitope predictors alone). Using this method we ranked 10 million 12-mer overlapping peptides derived from the complete T. cruzi proteome.
Experimental screening of 190 high-scoring peptides allowed the identification of 37 novel epitopes with diagnostic potential, while none of the low scoring peptides showed significant reactivity. Many of the metrics employed are dependent on standard bioinformatic tools and data, so the method can be easily extended to other pathogen genomes. PMID:23272069
A Relevancy Algorithm for Curating Earth Science Data Around Phenomenon
NASA Technical Reports Server (NTRS)
Maskey, Manil; Ramachandran, Rahul; Li, Xiang; Weigel, Amanda; Bugbee, Kaylin; Gatlin, Patrick; Miller, J. J.
2017-01-01
Earth science data are being collected for various science needs and applications, processed using different algorithms at multiple resolutions and coverages, and then archived at different archiving centers for distribution and stewardship, causing difficulty in data discovery. Curation, which typically occurs in museums, art galleries, and libraries, is traditionally defined as the process of collecting and organizing information around a common subject matter or a topic of interest. Curating data sets around topics or areas of interest addresses some of the data discovery needs in the field of Earth science, especially for unanticipated users of data. This paper describes a methodology to automate search and selection of data around specific phenomena. Different components of the methodology including the assumptions, the process, and the relevancy ranking algorithm are described. The paper makes two unique contributions to improving data search and discovery capabilities. First, the paper describes a novel methodology developed for automatically curating data around a topic using Earth science metadata records. Second, the methodology has been implemented as a standalone web service that is utilized to augment search and usability of data in a variety of tools.
A relevancy algorithm for curating earth science data around phenomenon
NASA Astrophysics Data System (ADS)
Maskey, Manil; Ramachandran, Rahul; Li, Xiang; Weigel, Amanda; Bugbee, Kaylin; Gatlin, Patrick; Miller, J. J.
2017-09-01
Earth science data are being collected for various science needs and applications, processed using different algorithms at multiple resolutions and coverages, and then archived at different archiving centers for distribution and stewardship causing difficulty in data discovery. Curation, which typically occurs in museums, art galleries, and libraries, is traditionally defined as the process of collecting and organizing information around a common subject matter or a topic of interest. Curating data sets around topics or areas of interest addresses some of the data discovery needs in the field of Earth science, especially for unanticipated users of data. This paper describes a methodology to automate search and selection of data around specific phenomena. Different components of the methodology including the assumptions, the process, and the relevancy ranking algorithm are described. The paper makes two unique contributions to improving data search and discovery capabilities. First, the paper describes a novel methodology developed for automatically curating data around a topic using Earth science metadata records. Second, the methodology has been implemented as a stand-alone web service that is utilized to augment search and usability of data in a variety of tools.
NASA Astrophysics Data System (ADS)
Doerr, Timothy; Alves, Gelio; Yu, Yi-Kuo
2006-03-01
Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions, followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data.
Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges
NASA Astrophysics Data System (ADS)
Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.
2013-12-01
Developing data search tools is a very common, but often confusing, task for most data-intensive scientific projects. These search interfaces need to be continually improved to handle the ever-increasing diversity and volume of data collections. Many aspects determine the type of search tool a project needs to provide to its user community. These include: number of datasets, amount and consistency of discovery metadata, ancillary information such as availability of quality information and provenance, and availability of similar datasets from other distributed sources. The Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving data discovery within these centers. The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to the research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier.
The ARM program also provides a standards-based online metadata editor (OME) for PIs to submit their data to the ARM Data Archive. The USGS CSAS metadata Clearinghouse aggregates metadata records from several USGS projects and other partner organizations. The Clearinghouse allows users to search and discover over 100,000 biological and ecological datasets from a single web portal. The Clearinghouse has also enabled some new data discovery functions such as enhanced geo-spatial searches based on land and ocean classifications, metadata completeness rankings, data linkage via digital object identifiers (DOIs), and semantically enhanced keyword searches. The Clearinghouse is also currently working on enabling a dashboard which allows data providers to look at various statistics such as the number of their records accessed via the Clearinghouse, most popular keywords, a metadata quality report and a DOI creation service. The Clearinghouse also publishes metadata records to broader portals such as NSF DataONE and Data.gov. The author will also present how these capabilities are currently reused by recent and upcoming data centers such as DOE's NGEE-Arctic project. References: [1] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94. [2] Devarakonda, R., Shrestha, B., Palanisamy, G., Hook, L., Killeffer, T., Krassovski, M., ... & Frame, M. (2014, October). OME: Tool for generating and managing metadata to handle BigData. In BigData Conference (pp. 8-10).
ACFIS: a web server for fragment-based drug discovery
Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu
2016-01-01
In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown ‘chemical space’ to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for ‘chemical space’, which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. PMID:27150808
ACFIS: a web server for fragment-based drug discovery.
Hao, Ge-Fei; Jiang, Wen; Ye, Yuan-Nong; Wu, Feng-Xu; Zhu, Xiao-Lei; Guo, Feng-Biao; Yang, Guang-Fu
2016-07-08
In order to foster innovation and improve the effectiveness of drug discovery, there is a considerable interest in exploring unknown 'chemical space' to identify new bioactive compounds with novel and diverse scaffolds. Hence, fragment-based drug discovery (FBDD) was developed rapidly due to its advanced expansive search for 'chemical space', which can lead to a higher hit rate and ligand efficiency (LE). However, computational screening of fragments is always hampered by the promiscuous binding model. In this study, we developed a new web server Auto Core Fragment in silico Screening (ACFIS). It includes three computational modules, PARA_GEN, CORE_GEN and CAND_GEN. ACFIS can generate core fragment structure from the active molecule using fragment deconstruction analysis and perform in silico screening by growing fragments to the junction of core fragment structure. An integrated energy calculation rapidly identifies which fragments fit the binding site of a protein. We constructed a simple interface to enable users to view top-ranking molecules in 2D and the binding mode in 3D for further experimental exploration. This makes the ACFIS a highly valuable tool for drug discovery. The ACFIS web server is free and open to all users at http://chemyang.ccnu.edu.cn/ccb/server/ACFIS/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Thomas, Phillip S.
2017-01-01
We propose a method for solving the vibrational Schrödinger equation with which one can compute spectra for molecules with more than ten atoms. It uses sum-of-product (SOP) basis functions stored in a canonical polyadic tensor format and generated by evaluating matrix-vector products. By doing a sequence of partial optimizations, in each of which the factors in a SOP basis function for a single coordinate are optimized, the rank of the basis functions is reduced as matrix-vector products are computed. This is better than using an alternating least squares method to reduce the rank, as is done in the reduced-rank block power method. Partial optimization is better because it speeds up the calculation by about an order of magnitude and allows one to significantly reduce the memory cost. We demonstrate the effectiveness of the new method by computing vibrational spectra of two molecules, ethylene oxide (C2H4O) and cyclopentadiene (C5H6), with 7 and 11 atoms, respectively. PMID:28571348
Thomas, Phillip S; Carrington, Tucker
2017-05-28
We propose a method for solving the vibrational Schrödinger equation with which one can compute spectra for molecules with more than ten atoms. It uses sum-of-product (SOP) basis functions stored in a canonical polyadic tensor format and generated by evaluating matrix-vector products. By doing a sequence of partial optimizations, in each of which the factors in a SOP basis function for a single coordinate are optimized, the rank of the basis functions is reduced as matrix-vector products are computed. This is better than using an alternating least squares method to reduce the rank, as is done in the reduced-rank block power method. Partial optimization is better because it speeds up the calculation by about an order of magnitude and allows one to significantly reduce the memory cost. We demonstrate the effectiveness of the new method by computing vibrational spectra of two molecules, ethylene oxide (C2H4O) and cyclopentadiene (C5H6), with 7 and 11 atoms, respectively.
Nightingale, Julia Anne; Osmond, Clive
2017-09-01
Outcome data for UK cystic fibrosis centres are publicly available in an annual report, which ranks centres by median FEV1 % predicted. We wished to assess whether there are differences in lung function outcomes between adult centres that might imply differing standards of care. UK Registry data from 4761 subjects at 34 anonymised adult centres were used to calculate mean FEV1 % and rate of change of lung function for 2007-13. These measures were used to rank centres and compare outcomes. There are minor differences between centres for mean FEV1 % for some years of the study and for rate of change of lung function over the study period. However, rankings are critically dependent on the outcome measure chosen and centre variation becomes negligible once patient population characteristics are taken into account. We have demonstrated that the ranking of centres is biased and any apparent difference in respiratory outcomes is unlikely to be related to differing standards of care between centres. Copyright © 2017 European Cystic Fibrosis Society. Published by Elsevier B.V. All rights reserved.
Approximate Computing Techniques for Iterative Graph Algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh
Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large-scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with low impact on accuracy for both algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.
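To make the loop-perforation idea concrete, here is a minimal sketch of power-iteration PageRank in which a fraction of the per-node updates is skipped in each sweep and the stale value is reused. This is an illustrative assumption about how perforation could be applied, not the authors' implementation; the function name `pagerank`, the `perforation` parameter, and the round-robin skip pattern are all hypothetical.

```python
def pagerank(out_links, damping=0.85, iters=100, perforation=0):
    """Pull-style PageRank on {node: [out-neighbours]}.

    Every neighbour must itself be a key of out_links.
    perforation=k (k > 1) skips roughly one node in k per sweep, rotating
    which nodes are skipped so that every node is still refreshed
    regularly. perforation=0 gives the exact iteration.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    in_links = {v: [] for v in nodes}
    for u in nodes:
        for v in out_links[u]:
            in_links[v].append(u)

    rank = {v: 1.0 / n for v in nodes}
    for t in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        new_rank = {}
        for i, v in enumerate(nodes):
            if perforation and (i + t) % perforation == 0:
                new_rank[v] = rank[v]  # perforated: keep the stale value
                continue
            pulled = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1 - damping) / n + damping * (pulled + dangling / n)
        rank = new_rank

    # Skipped updates do not conserve total mass, so renormalise.
    total = sum(rank.values())
    return {v: r / total for v, r in rank.items()}


# Small star graph: node 0 links to the leaves and each leaf links back.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
exact = pagerank(star)
approx = pagerank(star, perforation=3)
```

The skipped nodes save the cost of summing over their in-links, which is where the work of each sweep lies; the price is that some entries lag behind by an iteration or two.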
New insights into old methods for identifying causal rare variants.
Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi
2011-11-29
The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
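The marker-ranking step can be sketched as follows, assuming a one-way ANOVA F-statistic computed per marker with phenotype values grouped by genotype. The abstract does not spell out the exact form of the statistic, so this grouping, the function names, and the toy data are assumptions made for illustration only.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_markers(marker_groups):
    """Return marker names sorted by decreasing F-statistic."""
    scores = {name: f_statistic(groups) for name, groups in marker_groups.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: phenotype values split by genotype class at each marker.
marker_groups = {
    "m1": [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]],
    "m2": [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]],
}
ordering = rank_markers(marker_groups)
```

In the procedure described above, only the top-ranked markers from such an ordering would be passed on to sliced inverse regression.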
The fundamental unit of pain is the cell.
Reichling, David B; Green, Paul G; Levine, Jon D
2013-12-01
The molecular/genetic era has seen the discovery of a staggering number of molecules implicated in pain mechanisms [18,35,61,69,96,133,150,202,224]. This has stimulated pharmaceutical and biotechnology companies to invest billions of dollars to develop drugs that enhance or inhibit the function of many of these molecules. Unfortunately, this effort has provided a remarkably small return on investment. Inevitably, transformative progress in this field will require a better understanding of the functional links among the ever-growing ranks of "pain molecules," as well as their links with an even larger number of molecules with which they interact. Importantly, all of these molecules exist side-by-side, within a functional unit, the cell, and its adjacent matrix of extracellular molecules. To paraphrase a recent editorial in Science magazine [223], although we live in the Golden age of Genetics, the fundamental unit of biology is still arguably the cell, and the cell is the critical structural and functional setting in which the function of pain-related molecules must be understood. This review summarizes our current understanding of the nociceptor as a cell-biological unit that responds to a variety of extracellular inputs with a complex and highly organized interaction of signaling molecules. We also discuss the insights that this approach is providing into peripheral mechanisms of chronic pain and sex dependence in pain. Copyright © 2013 International Association for the Study of Pain. Published by Elsevier B.V. All rights reserved.
Siebenkäs, Alrun; Schumacher, Jens; Roscher, Christiane
2015-03-27
Functional traits are often used as species-specific mean trait values in comparative plant ecology or trait-based predictions of ecosystem processes, assuming that interspecific differences are greater than intraspecific trait variation and that trait-based ranking of species is consistent across environments. Although this assumption is increasingly challenged, little is known about how far the extent of intraspecific trait variation in response to varying environmental conditions depends on the traits considered and on the characteristics of the species studied, which is needed to evaluate the consequences for trait-based species ranking. We studied functional traits of eight perennial grassland species classified into different functional groups (forbs vs. grasses) and varying in their inherent growth stature (tall vs. small) in a common garden experiment with different environments crossing three levels of nutrient availability and three levels of light availability over 4 months of treatment applications. Grasses and forbs differed in almost all above- and belowground traits, while trait differences related to growth stature were generally small. For grasses and forbs alike, the traits showing the strongest responses to resource availability were those associated with allocation and resource uptake. The strength of trait variation in response to varying resource availability differed among functional groups (grasses > forbs) and species of varying growth stature (small-statured > tall-statured species) in many aboveground traits, but only to a lesser extent in belowground traits. These differential responses altered trait-based species ranking in many aboveground traits, such as specific leaf area, tissue nitrogen and carbon concentrations and above-belowground allocation (leaf area ratio and root : shoot ratio) at varying resource supply, while trait-based species ranking was more consistent in belowground traits.
Our study shows that species grouping according to functional traits is valid, but trait-based species ranking depends on environmental conditions, thus limiting the applicability of species-specific mean trait values in ecological studies. Published by Oxford University Press on behalf of the Annals of Botany Company.
Extreme learning machine for ranking: generalization analysis and applications.
Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin
2014-05-01
The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on the combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of hypothesis space. Empirical results on the benchmark datasets show the competitive performance of the ELMRank over the state-of-the-art ranking methods. Copyright © 2014 Elsevier Ltd. All rights reserved.
7 CFR 633.5 - Application procedures.
Code of Federal Regulations, 2011 CFR
2011-01-01
... ranking criteria and limit the approval of requests for agreements in accordance with the ranking scheme... of matching funds, significance of wetland functions and values, and estimated success of protection...
Resolution of ranking hierarchies in directed networks.
Letizia, Elisa; Barucca, Paolo; Lillo, Fabrizio
2018-01-01
Identifying hierarchies and rankings of nodes in directed graphs is fundamental in many applications such as social network analysis, biology, economics, and finance. A recently proposed method identifies the hierarchy by finding the ordered partition of nodes which minimises a score function, termed agony. This function penalises the links violating the hierarchy in a way depending on the strength of the violation. To investigate the resolution of ranking hierarchies we introduce an ensemble of random graphs, the Ranked Stochastic Block Model. We find that agony may fail to identify hierarchies when the structure is not strong enough and the size of the classes is small with respect to the whole network. We analytically characterise the resolution threshold and we show that an iterated version of agony can partly overcome this resolution limit.
Resolution of ranking hierarchies in directed networks
Barucca, Paolo; Lillo, Fabrizio
2018-01-01
Identifying hierarchies and rankings of nodes in directed graphs is fundamental in many applications such as social network analysis, biology, economics, and finance. A recently proposed method identifies the hierarchy by finding the ordered partition of nodes which minimises a score function, termed agony. This function penalises the links violating the hierarchy in a way depending on the strength of the violation. To investigate the resolution of ranking hierarchies we introduce an ensemble of random graphs, the Ranked Stochastic Block Model. We find that agony may fail to identify hierarchies when the structure is not strong enough and the size of the classes is small with respect to the whole network. We analytically characterise the resolution threshold and we show that an iterated version of agony can partly overcome this resolution limit. PMID:29394278
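A short sketch may help fix what the agony score measures. It assumes the commonly used definition in which a directed link u → v is expected to point from a lower rank to a strictly higher one and otherwise costs r(u) − r(v) + 1; the abstract itself only says that the penalty grows with the strength of the violation, so both this formula and the edge-direction convention are assumptions, as are the function names.

```python
from itertools import product


def agony(edges, rank):
    """Total penalty of a ranking: sum over links of max(0, r(u) - r(v) + 1)."""
    return sum(max(0, rank[u] - rank[v] + 1) for u, v in edges)


def min_agony_bruteforce(nodes, edges):
    """Exhaustive search over all rank assignments; only viable for tiny graphs."""
    best_rank, best_score = None, None
    for assignment in product(range(len(nodes)), repeat=len(nodes)):
        rank = dict(zip(nodes, assignment))
        score = agony(edges, rank)
        if best_score is None or score < best_score:
            best_rank, best_score = rank, score
    return best_rank, best_score


# A directed 3-cycle: no ranking can satisfy all three links.
cycle = [("a", "b"), ("b", "c"), ("c", "a")]
chain_score = agony(cycle, {"a": 0, "b": 1, "c": 2})  # only c -> a is penalised
flat_score = agony(cycle, {"a": 0, "b": 0, "c": 0})   # every link costs 1

# A directed chain has a perfect hierarchy.
_, chain_best = min_agony_bruteforce(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

On the 3-cycle both the strict ordering and the flat ranking score 3, which illustrates how a weak hierarchical structure can leave the minimiser unable to distinguish a layered partition from a single class.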
Hollenbeak, Christopher S
2005-10-15
While risk-adjusted outcomes are often used to compare the performance of hospitals and physicians, the most appropriate functional form for the risk adjustment process is not always obvious for continuous outcomes such as costs. Semi-log models are used most often to correct skewness in cost data, but there has been limited research to determine whether the log transformation is sufficient or whether another transformation is more appropriate. This study explores the most appropriate functional form for risk-adjusting the cost of coronary artery bypass graft (CABG) surgery. Data included patients undergoing CABG surgery at four hospitals in the midwest and were fit to a Box-Cox model with random coefficients (BCRC) using Markov chain Monte Carlo methods. Marginal likelihoods and Bayes factors were computed to perform model comparison of alternative model specifications. Rankings of hospital performance were created from the simulation output and the rankings produced by Bayesian estimates were compared to rankings produced by standard models fit using classical methods. Results suggest that, for these data, the most appropriate functional form is not logarithmic, but corresponds to a Box-Cox transformation of -1. Furthermore, Bayes factors overwhelmingly rejected the natural log transformation. However, the hospital ranking induced by the BCRC model was not different from the ranking produced by maximum likelihood estimates of either the linear or semi-log model. Copyright (c) 2005 John Wiley & Sons, Ltd.
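For reference, the Box-Cox family that the comparison above refers to can be written in a few lines. The sketch below uses the standard one-parameter form, with the natural log as the limiting case at λ = 0; the function name `box_cox` and the example costs are hypothetical, and none of the random-coefficient or MCMC machinery of the study is reproduced.

```python
import math


def box_cox(y, lam):
    """Standard Box-Cox transform of a positive value y.

    lam = 1  -> linear (shifted by a constant)
    lam = 0  -> natural log (the semi-log model)
    lam = -1 -> 1 - 1/y, a reciprocal transform
    """
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# Hypothetical costs in dollars: the reciprocal transform compresses the
# right tail far more strongly than the log does.
costs = [20000.0, 40000.0, 200000.0]
log_scale = [box_cox(c, 0) for c in costs]
reciprocal_scale = [box_cox(c, -1) for c in costs]
```

With λ = −1 the transformed values are bounded above by 1, so very expensive cases have much less leverage than under the log transform.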
Multiple graph regularized protein domain ranking.
Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin
2012-11-19
Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
Multiple graph regularized protein domain ranking
2012-01-01
Background Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. PMID:23157331
Predictive teratology: teratogenic risk-hazard identification partnered in the discovery process.
Augustine-Rauch, K A
2008-11-01
Unexpected teratogenicity is ranked as one of the most prevalent causes of toxicity-related attrition of drug candidates. Without proactive assessment, the liability tends to be identified relatively late in drug development, following significant investment in the compound and engagement in preclinical and clinical studies. When unexpected teratogenicity occurs in preclinical development, three principal questions arise: Can clinical trials that include women of childbearing potential be initiated? Will all compounds in this pharmacological class produce the same liability? Could this effect be related to the chemical structure resulting in undesirable off-target adverse effects? The first question is typically addressed at the time of the unexpected finding and involves considering the nature of the teratogenicity, whether or not maternal toxicity could have had a role in onset, human exposure margins and therapeutic indication. The latter two questions can be addressed proactively, earlier in the discovery process, as drug target profiling and lead compound optimization are taking place. Such proactive approaches include thorough assessment of the literature for identification of potential liabilities and follow-up work that can be conducted on the level of target expression and functional characterization using molecular biology and developmental model systems. Developmental model systems can also be applied in the form of in vitro teratogenicity screens, and show potential for effective hazard identification or issue resolution on the level of characterizing teratogenic mechanism. This review discusses approaches that can be applied for proactive assessment of compounds for teratogenic liability.
Weaver, K. E.; Chaovalitwongse, W. A.; Novotny, E. J.; Poliakov, A.; Grabowski, T. G.; Ojemann, J. G.
2013-01-01
Successful resection of cortical tissue engendering seizure activity is efficacious for the treatment of refractory, focal epilepsy. The pre-operative localization of the seizure focus is therefore critical to yielding positive, post-operative outcomes. In a small proportion of focal epilepsy patients presenting with normal MRI, identification of the seizure focus is significantly more challenging. We examined the capacity of resting state functional MRI (rsfMRI) to identify the seizure focus in a group of four non-lesion, focal (NLF) epilepsy individuals. We predicted that computing patterns of local functional connectivity in and around the epileptogenic zone combined with a specific reference to the corresponding region within the contralateral hemisphere would reliably predict the location of the seizure focus. We first averaged voxel-wise regional homogeneity (ReHo) across regions of interest (ROIs) from a standardized, probabilistic atlas for each NLF subject as well as 16 age- and gender-matched controls. To examine contralateral effects, we computed a ratio of the mean pair-wise correlations of all voxels within a ROI with the corresponding contralateral region (IntraRegional Connectivity – IRC). For each subject, ROIs were ranked (from lowest to highest) on ReHo, IRC, and the mean of the two values. At the group level, we observed a significant decrease in the rank of the ROI harboring the seizure focus for the ReHo rankings as well as for the mean rank. At the individual level, the seizure focus ReHo rank was among the lowest-ranked 10% of ROIs for all four NLF epilepsy patients, and the seizure focus IRC rank was among the lowest-ranked 10% for three of the four. However, when the two ranks were combined (averaging across ReHo and IRC ranks and scalars), the seizure focus ROI was either the lowest or second lowest ranked ROI for three out of the four epilepsy subjects. This suggests that rsfMRI may serve as an adjunct pre-surgical tool, facilitating the identification of the seizure focus in focal epilepsy. PMID:23641233
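The rank-combination step in the abstract above can be illustrated with a minimal sketch. The ROI names and values below are invented for illustration, and averaging the two per-measure ranks is only an assumption about how such a combination might be computed; it is not the authors' pipeline.

```python
def rank_ascending(values):
    """Return 1-based ranks (1 = lowest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical per-ROI measurements for one subject (not data from the study).
rois = ["ROI_A", "ROI_B", "ROI_C", "ROI_D"]
reho = [0.82, 0.41, 0.77, 0.69]
irc = [1.10, 0.72, 0.65, 0.98]

reho_rank = rank_ascending(reho)
irc_rank = rank_ascending(irc)
mean_rank = [(a + b) / 2 for a, b in zip(reho_rank, irc_rank)]

# ROIs sorted from lowest to highest combined rank; the first entry would be
# the candidate flagged for closer inspection under this toy scheme.
ranked_rois = sorted(zip(rois, mean_rank), key=lambda pair: pair[1])
```

In this toy input, ROI_B has the lowest combined rank even though it is not the lowest on IRC alone, which is the kind of effect a combined ranking is meant to capture.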
ERIC Educational Resources Information Center
Chung, King; Killion, Mead C.; Christensen, Laurel A.
2007-01-01
Purpose: To determine the rankings of 6 input-output functions for understanding low-level, conversational, and high-level speech in multitalker babble without manipulating volume control for listeners with normal hearing, flat sensorineural hearing loss, and mildly sloping sensorineural hearing loss. Method: Peak clipping, compression limiting,…
Blumstein, Daniel T; Chung, Lawrance K; Smith, Jennifer E
2013-05-22
Play has been defined as apparently functionless behaviour, yet since play is costly, models of adaptive evolution predict that it should have some beneficial function (or functions) that outweigh its costs. We provide strong evidence for a long-standing, but poorly supported hypothesis: that early social play is practice for later dominance relationships. We calculated the relative dominance rank by observing the directional outcome of playful interactions in juvenile and yearling yellow-bellied marmots (Marmota flaviventris) and found that these rank relationships were correlated with later dominance ranks calculated from agonistic interactions; however, the strength of this relationship attenuated over time. While play may have multiple functions, one of them may be to establish later dominance relationships in a minimally costly way.
Mining dark information resources to develop new informatics capabilities to support science
NASA Astrophysics Data System (ADS)
Ramachandran, Rahul; Maskey, Manil; Bugbee, Kaylin
2016-04-01
Dark information resources are digital resources that organizations collect, process, and store for regular business or operational activities but fail to realize their potential for other purposes. The challenge for any organization is to recognize, identify and effectively exploit these dark information stores. Metadata catalogs at different data centers store dark information resources consisting of structured information, free form descriptions of data and browse images. These information resources are never fully exploited beyond a few fields used for search and discovery. For example, the NASA Earth science catalog holds more than 6000 data collections, 127 million records for individual files and 67 million browse images. We believe that the information contained in the metadata catalogs and the browse images can be utilized beyond their original design intent to provide new data discovery and exploration pathways to support science and education communities. In this paper we present two research applications using information stored in the metadata catalog in a completely novel way. The first application is designing a data curation service. The objective of the data curation service is to augment the existing data search capabilities. Given a specific atmospheric phenomenon, the data curation service returns the user a ranked list of relevant data sets. Different fields in the metadata records including textual descriptions are mined. A specialized relevancy ranking algorithm has been developed that uses a "bag of words" to define phenomena along with an ensemble of known approaches such as the Jaccard Coefficient, Cosine Similarity and Zone ranking to rank the data sets. This approach is also extended to map from the data set level to data file variable level. The second application is focused on providing a service where a user can search and discover browse images containing specific phenomena from the vast catalog. This service will aid researchers in uncovering interesting events in the data for case study analysis. The challenge of this second application is to bridge the semantic gap between the low level image pixel values and the semantic concept perceived by a user when he or she sees an image. A deep learning algorithm, specifically the Convolutional Neural Network (CNN), has been trained and tested to identify three types of Earth science phenomena - Hurricanes, Dust, and Smoke/Haze in MODIS imagery. The latest results from both applications will be presented in this paper.
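The bag-of-words relevancy ranking mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming an equally weighted blend of the Jaccard coefficient and cosine similarity; the weighting, the tokenizer, the data set names, and the descriptions are all hypothetical, and zone ranking is omitted. It should not be read as the service's actual algorithm.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps purely alphabetic tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def jaccard(a, b):
    """Jaccard coefficient between the sets of tokens in a and b."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity between term-frequency vectors of a and b."""
    count_a, count_b = Counter(a), Counter(b)
    dot = sum(count_a[w] * count_b[w] for w in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_datasets(phenomenon_words, descriptions):
    """Score each description against the phenomenon's bag of words, best first."""
    scored = []
    for name, text in descriptions.items():
        tokens = tokenize(text)
        # Assumed ensemble: plain average of the two similarity measures.
        score = 0.5 * jaccard(phenomenon_words, tokens) + 0.5 * cosine(phenomenon_words, tokens)
        scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical phenomenon definition and catalog entries.
phenomenon = ["hurricane", "wind", "precipitation"]
catalog = {
    "dataset_a": "sea surface wind and precipitation during hurricane season",
    "dataset_b": "land surface temperature and vegetation index",
    "dataset_c": "ocean wind speed climatology",
}
ranking = rank_datasets(phenomenon, catalog)
```

A real service would presumably weight metadata fields differently (which is what zone ranking addresses) and use a more careful tokenizer; the sketch only shows how two set- and vector-based similarities can be combined into one ordering.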
Functional complexity and ecosystem stability: an experimental approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Van Voris, P.; O'Neill, R.V.; Shugart, H.H.
1978-01-01
The complexity-stability hypothesis was experimentally tested using intact terrestrial microcosms. Functional complexity was defined as the number and significance of component interactions (i.e., population interactions, physical-chemical reactions, biological turnover rates) influenced by nonlinearities, feedbacks, and time delays. It was postulated that functional complexity could be nondestructively measured through analysis of a signal generated from the system. The power spectrum of hourly CO2 efflux from eleven old-field microcosms was analyzed for the number of low-frequency peaks, which was used to rank the functional complexity of each system. Ranking of ecosystem stability was based on the capacity of the system to retain essential nutrients and was measured by net loss of Ca after the system was stressed. Rank correlation supported the hypothesis that increasing ecosystem functional complexity leads to increasing ecosystem stability. The results indicated that complex functional dynamics can serve to stabilize the system. The results also demonstrated that microcosms are useful tools for system-level investigations.
1982-09-01
21 functions. The legal category covers the business, commercial and contract law fields, including patents and royalties, technical data, claims...Course Rankings Question # Subject Area Median Rank 55,56,59 Contract Management Theory 7.0 1 53,54 Contract Law 6.5 2 So Contracting & Acquis. Mgt...respondents ranked Contract Management Theory as the most useful course among all courses in the AFIT CAM curriculum. Graduates ranked Contract Law as
2016-01-01
A mere hyperbolic law, such as the Zipf’s law power function, is often inadequate to describe rank-size relationships. An alternative theoretical distribution is proposed based on theoretical physics arguments starting from the Yule-Simon distribution. A model is proposed that leads to a universal form. A theoretical suggestion for the “best (or optimal) distribution” is provided through an entropy argument. The ranking of areas by the number of cities in various countries and some sport competition rankings serve as the present illustrations. PMID:27812192
PageRank and rank-reversal dependence on the damping factor
NASA Astrophysics Data System (ADS)
Son, S.-W.; Christensen, C.; Grassberger, P.; Paczuski, M.
2012-12-01
PageRank (PR) is an algorithm originally developed by Google to evaluate the importance of web pages. Considering how deeply rooted Google's PR algorithm is to gathering relevant information or to the success of modern businesses, the question of rank stability and choice of the damping factor (a parameter in the algorithm) is clearly important. We investigate PR as a function of the damping factor d on a network obtained from a domain of the World Wide Web, finding that rank reversal happens frequently over a broad range of PR (and of d). We use three different correlation measures, Pearson, Spearman, and Kendall, to study rank reversal as d changes, and we show that the correlation of PR vectors drops rapidly as d changes from its frequently cited value, d0=0.85. Rank reversal is also observed by measuring the Spearman and Kendall rank correlation, which evaluate relative ranks rather than absolute PR. Rank reversal happens not only in directed networks containing rank sinks but also in a single strongly connected component, which by definition does not contain any sinks. We relate rank reversals to rank pockets and bottlenecks in the directed network structure. For the network studied, the relative rank is more stable by our measures around d=0.65 than at d=d0.
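The dependence of the ranking on the damping factor can be explored with a minimal sketch like the one below. It uses a plain power iteration and the Kendall tau-a statistic on a five-node toy graph; the graph, the function names, and the choice to spread sink mass uniformly are assumptions made for illustration and are unrelated to the Web domain analysed in the paper.

```python
def pagerank(links, d=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; links maps each node to its list of out-neighbours."""
    nodes = sorted(links)
    n = len(nodes)
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by sink nodes (no out-links) is redistributed uniformly.
        sink_mass = sum(pr[v] for v in nodes if not links[v])
        new = {v: (1.0 - d) / n + d * sink_mass / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += d * pr[v] / len(links[v])
        delta = sum(abs(new[v] - pr[v]) for v in nodes)
        pr = new
        if delta < tol:
            break
    return pr


def kendall_tau(x, y):
    """Kendall tau-a between two equally long score lists (no tie correction)."""
    n = len(x)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            total += (product > 0) - (product < 0)
    return total / (n * (n - 1) / 2)


# Hypothetical toy graph; every link target also appears as a key.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"], "e": ["a", "d"]}
nodes = sorted(web)
reference = pagerank(web, d=0.85)

# Compare the ordering at each damping factor with the ordering at d0 = 0.85.
for d in (0.5, 0.65, 0.85, 0.95):
    scores = pagerank(web, d=d)
    tau = kendall_tau([reference[v] for v in nodes], [scores[v] for v in nodes])
    print(f"d={d:.2f}  Kendall tau vs d0: {tau:.3f}")
```

A tau below 1 at some d means at least one pair of nodes swapped order relative to d0, that is, a rank reversal. On a graph this small reversals may not occur at all; the sketch only shows how the comparison could be set up.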
Appearance is a function of the face.
Borah, Gregory L; Rankin, Marlene K
2010-03-01
Increasingly, third-party insurers deny coverage to patients with posttraumatic and congenital facial deformities because these are not seen as "functional." Recent facial transplants have demonstrated that severely deformed patients are willing to undergo potentially life-threatening surgery in search of a normal physiognomy. Scant quantitative research exists that objectively documents appearance as a primary "function" of the face. This study was designed to establish a population-based definition of the functions of the human face, rank importance of the face among various anatomical areas, and determine the risk value the average person places on a normal appearance. Voluntary adult subjects (n = 210) in three states aged 18 to 75 years were recruited using a quota sampling technique. Subjects completed study questionnaires of demography and bias using the Gamble Chance of Death Questionnaire and the Rosenberg Self-Esteem Scale. The face ranked as the most important anatomical area for functional reconstruction. Appearance was the fifth most important function of the face, after breathing, sight, speech, and eating. Normal facial appearance was rated as very important for one to be a functioning member of American society (p = 0.01) by 49 percent. One in seven subjects (13 percent) would accept a 30 to 45 percent risk of death to obtain a "normal" face. Normal appearance is a primary function of the face, based on a large, culturally diverse population sample across the lifespan. Normal appearance ranks above smell and expression as a function. Restoration of facial appearance is ranked the most important anatomical area for repair. Normal facial appearance is very important for one to be a functional member of American society.
A Gaussian-based rank approximation for subspace clustering
NASA Astrophysics Data System (ADS)
Xu, Fei; Peng, Chong; Hu, Yunhong; He, Guoping
2018-04-01
Low-rank representation (LRR) has been shown to be successful in seeking low-rank structures of data relationships in a union of subspaces. Generally, LRR and LRR-based variants need to solve nuclear norm-based minimization problems. Beyond the success of such methods, it has been widely noted that the nuclear norm may not be a good rank approximation because it simply adds all singular values of a matrix together, and thus large singular values may dominate the sum. This results in a far from satisfactory rank approximation and may degrade the performance of low-rank models based on the nuclear norm. In this paper, we propose a novel nonconvex rank approximation based on the Gaussian distribution function, which has desirable properties that make it a better rank approximation than the nuclear norm. Then a low-rank model is proposed based on the new rank approximation with application to motion segmentation. Experimental results have shown significant improvements and verified the effectiveness of our method.
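The concern about the nuclear norm can be made concrete with a small numerical sketch. The abstract does not state the exact Gaussian-based function, so the surrogate below, in which each singular value contributes 1 - exp(-s^2 / (2 delta^2)), is a hypothetical stand-in chosen only because it saturates at 1; the singular values and the parameter delta are likewise illustrative.

```python
import math


def numerical_rank(singular_values, tol=1e-8):
    """Number of singular values above a small tolerance."""
    return sum(1 for s in singular_values if s > tol)


def nuclear_norm(singular_values):
    """Sum of singular values; large values contribute proportionally more."""
    return sum(singular_values)


def gaussian_rank_surrogate(singular_values, delta=0.5):
    """Hypothetical Gaussian-shaped surrogate: each value contributes at most 1."""
    return sum(1.0 - math.exp(-(s * s) / (2.0 * delta * delta)) for s in singular_values)


# Illustrative spectrum: two dominant directions plus one near-zero value.
sigma = [10.0, 5.0, 0.01, 0.0]

print("numerical rank    :", numerical_rank(sigma))
print("nuclear norm      :", nuclear_norm(sigma))
print("Gaussian surrogate:", gaussian_rank_surrogate(sigma))
```

For this spectrum the nuclear norm is governed by the magnitudes 10 and 5, whereas the saturating surrogate stays close to 2, the number of significant singular values. Whether the paper's function behaves the same way in detail cannot be determined from the abstract alone.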
Pooled genome wide association detects association upstream of FCRL3 with Graves' disease.
Khong, Jwu Jin; Burdon, Kathryn P; Lu, Yi; Laurie, Kate; Leonardos, Lefta; Baird, Paul N; Sahebjada, Srujana; Walsh, John P; Gajdatsy, Adam; Ebeling, Peter R; Hamblin, Peter Shane; Wong, Rosemary; Forehan, Simon P; Fourlanos, Spiros; Roberts, Anthony P; Doogue, Matthew; Selva, Dinesh; Montgomery, Grant W; Macgregor, Stuart; Craig, Jamie E
2016-11-18
Graves' disease is an autoimmune thyroid disease of complex inheritance. Multiple genetic susceptibility loci are thought to be involved in Graves' disease and it is therefore likely that these can be identified by genome wide association studies. This study aimed to determine if a genome wide association study, using a pooling methodology, could detect genomic loci associated with Graves' disease. Nineteen of the top ranking single nucleotide polymorphisms, including HLA-DQA1 and C6orf10, were clustered within the Major Histocompatibility Complex region on chromosome 6p21, with rs1613056 reaching genome wide significance (p = 5 × 10^-8). Technical validation of top ranking non-Major Histocompatibility Complex single nucleotide polymorphisms with individual genotyping in the discovery cohort revealed four single nucleotide polymorphisms with p ≤ 10^-4. The variant rs17676303 on chromosome 1q23.1, located upstream of FCRL3, showed evidence of association with Graves' disease across the discovery, replication and combined cohorts. A second single nucleotide polymorphism, rs9644119, downstream of DPYSL2 showed some evidence of association, supported by findings in the replication cohort, which warrants further study. The pooled genome wide association study identified a genetic variant upstream of FCRL3 as a susceptibility locus for Graves' disease in addition to those identified in the Major Histocompatibility Complex. A second locus downstream of DPYSL2 is potentially a novel genetic variant in Graves' disease that requires further confirmation.
Equity weights in the allocation of health care: the rank-dependent QALY model.
Bleichrodt, Han; Diecidue, Enrico; Quiggin, John
2004-01-01
This paper introduces the rank-dependent quality-adjusted life-years (QALY) model, a new method to aggregate QALYs in economic evaluations of health care. The rank-dependent QALY model permits the formalization of influential concepts of equity in the allocation of health care, such as the fair innings approach, and it includes as special cases many of the social welfare functions that have been proposed in the literature. An important advantage of the rank-dependent QALY model is that it offers a straightforward procedure to estimate equity weights for QALYs. We characterize the rank-dependent QALY model and argue that its central condition has normative appeal.
LCK rank of locally conformally Kähler manifolds with potential
NASA Astrophysics Data System (ADS)
Ornea, Liviu; Verbitsky, Misha
2016-09-01
An LCK manifold with potential is a quotient of a Kähler manifold X equipped with a positive Kähler potential f, such that the monodromy group acts on X by holomorphic homotheties and multiplies f by a character. The LCK rank is the rank of the image of this character, considered as a function from the monodromy group to real numbers. We prove that an LCK manifold with potential can have any rank between 1 and b1(M) . Moreover, LCK manifolds with proper potential (ones with rank 1) are dense. Two errata to our previous work are given in the last section.
Advances in Using Opensearch for Earth Science Data Discovery and Interoperability
NASA Astrophysics Data System (ADS)
Newman, D. J.; Mitchell, A. E.
2014-12-01
As per www.opensearch.org, OpenSearch is a collection of simple formats for the sharing of search results. A number of organizations (NASA, ESA, CEOS) have begun to adopt this standard as a means of allowing both the discovery of earth science data and the aggregation of results from disparate data archives. OpenSearch has proven to be simpler and more effective at achieving these goals than previous efforts (Catalog Service for the Web, for example). This talk will outline: the basic ideas behind OpenSearch; the ways in which we have extended the basic specification to accommodate the Earth Science use case (two-step searching, relevancy ranking, facets); a case study of the above in action (CWICSmart + IDN OpenSearch + CWIC OpenSearch); the potential for interoperability this simple standard affords; and a discussion of where we can go in the future.
In-silico guided discovery of novel CCR9 antagonists
NASA Astrophysics Data System (ADS)
Zhang, Xin; Cross, Jason B.; Romero, Jan; Heifetz, Alexander; Humphries, Eric; Hall, Katie; Wu, Yuchuan; Stucka, Sabrina; Zhang, Jing; Chandonnet, Haoqun; Lippa, Blaise; Ryan, M. Dominic; Baber, J. Christian
2018-03-01
Antagonism of CCR9 is a promising mechanism for treatment of inflammatory bowel disease, including ulcerative colitis and Crohn's disease. There is limited experimental data on CCR9 and its ligands, complicating efforts to identify new small molecule antagonists. We present here results of a successful virtual screening and rational hit-to-lead campaign that led to the discovery and initial optimization of novel CCR9 antagonists. This work uses a novel data fusion strategy to integrate the output of multiple computational tools, such as 2D similarity search, shape similarity, pharmacophore searching, and molecular docking, as well as the identification and incorporation of privileged chemokine fragments. The application of various ranking strategies, which combined consensus and parallel selection methods to achieve a balance of enrichment and novelty, resulted in 198 virtual screening hits in total, with an overall hit rate of 18%. Several hits were developed into early leads through targeted synthesis and purchase of analogs.
Turk, Samo; Kovac, Andreja; Boniface, Audrey; Bostock, Julieanne M; Chopra, Ian; Blanot, Didier; Gobec, Stanislav
2009-03-01
The ATP-dependent Mur ligases (MurC, MurD, MurE and MurF) successively add L-Ala, D-Glu, meso-A2pm or L-Lys, and D-Ala-D-Ala to the nucleotide precursor UDP-MurNAc, and they represent promising targets for antibacterial drug discovery. We have used the molecular docking programme eHiTS for the virtual screening of 1990 compounds from the National Cancer Institute 'Diversity Set' on MurD and MurF. The 50 top-scoring compounds from screening on each enzyme were selected for experimental biochemical evaluation. Our approach of virtual screening and subsequent in vitro biochemical evaluation of the best ranked compounds has provided four novel MurD inhibitors (best IC50 = 10 µM) and one novel MurF inhibitor (IC50 = 63 µM).
Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong
2011-09-01
One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained on different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score, are synthesized by a sampling method. Finally, another integrated RBF model ranks the structural models according to the features of the sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test results show that our method outperforms any individual scoring function in both best-model selection and overall correlation between the predicted ranking and the actual ranking of structural quality.
Socioecological predictors of immune defences in wild spotted hyenas
Flies, Andrew S.; Mansfield, Linda S.; Flies, Emily J.; Grant, Chris K.; Holekamp, Kay E.
2016-01-01
Summary: Social rank can profoundly affect many aspects of mammalian reproduction and stress physiology, but little is known about how immune function is affected by rank and other socio-ecological factors in free-living animals. In this study we examine the effects of sex, social rank, and reproductive status on immune function in long-lived carnivores that are routinely exposed to a plethora of pathogens, yet rarely show signs of disease. Here we show that two types of immune defenses, complement-mediated bacterial killing capacity (BKC) and total IgM, are positively correlated with social rank in wild hyenas, but that a third type, total IgG, does not vary with rank. Female spotted hyenas, which are socially dominant to males in this species, have higher BKC, and higher IgG and IgM concentrations, than do males. Immune defenses are lower in lactating than pregnant females, suggesting the immune defenses may be energetically costly. Serum cortisol and testosterone concentrations are not reliable predictors of basic immune defenses in wild female spotted hyenas. These results suggest that immune defenses are costly and multiple socioecological variables are important determinants of basic immune defenses among wild hyenas. Effects of these variables should be accounted for when attempting to understand disease ecology and immune function. PMID:27833242
Blumstein, Daniel T.; Chung, Lawrance K.; Smith, Jennifer E.
2013-01-01
Play has been defined as apparently functionless behaviour, yet since play is costly, models of adaptive evolution predict that it should have some beneficial function (or functions) that outweigh its costs. We provide strong evidence for a long-standing, but poorly supported hypothesis: that early social play is practice for later dominance relationships. We calculated the relative dominance rank by observing the directional outcome of playful interactions in juvenile and yearling yellow-bellied marmots (Marmota flaviventris) and found that these rank relationships were correlated with later dominance ranks calculated from agonistic interactions; however, the strength of this relationship attenuated over time. While play may have multiple functions, one of them may be to establish later dominance relationships in a minimally costly way. PMID:23536602
NASA Astrophysics Data System (ADS)
Doerr, Timothy P.; Alves, Gelio; Yu, Yi-Kuo
2005-08-01
Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time using the transfer matrix technique or, equivalently, the dynamic programming approach. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.
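The shortlist-then-rescore strategy described in the abstract above can be sketched on a toy problem. Everything here is an assumption made for illustration: the candidates are permutations of four items, the approximate cost contains only nearest-neighbour terms (standing in for the solvable class), and the full cost adds a hypothetical non-local term. The true rank is found by brute-force enumeration, which is feasible only for tiny instances and is exactly what the paper's scaling-function method is meant to avoid.

```python
import heapq
from itertools import permutations


def approx_cost(path):
    """Nearest-neighbour terms only; a stand-in for the efficiently solvable class."""
    return sum(abs(a - b) for a, b in zip(path, path[1:]))


def full_cost(path):
    """Approximate cost plus a hypothetical non-local term coupling the two ends."""
    return approx_cost(path) + 0.5 * abs(path[0] - path[-1])


def shortlist_then_rescore(candidates, keep):
    """Keep the `keep` best candidates under approx_cost, then pick the best under full_cost."""
    shortlist = heapq.nsmallest(keep, candidates, key=approx_cost)
    best = min(shortlist, key=full_cost)
    return best, shortlist


def true_rank(solution, candidates):
    """1-based rank of `solution` under the full cost, by exhaustive enumeration."""
    own_cost = full_cost(solution)
    return 1 + sum(1 for c in candidates if full_cost(c) < own_cost)


candidates = list(permutations(range(4)))
best, shortlist = shortlist_then_rescore(candidates, keep=5)
rank_of_best = true_rank(best, candidates)
```

In a realistic setting the shortlist would come from dynamic programming rather than from enumerating all candidates, and the quality of `best` would have to be estimated statistically instead of being checked against every possible solution.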
Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.
Hu, Qin; Victor, Jonathan D
2016-09-01
Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety are large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.
Code of Federal Regulations, 2010 CFR
2010-07-01
... paragraphs (a) or (b) of this section, the ranking deputy (or an equivalent official) in such unit who is... directs otherwise. Except as otherwise provided by law, if there is no ranking deputy available, the... designate the ranking deputy (or an equivalent official) in the unit who is available to act as head. If...
RNA interference for functional genomics and improvement of cotton (Gossypium species)
USDA-ARS's Scientific Manuscript database
RNA interference (RNAi) is a powerful new technology for the discovery of genetic sequence functions, and has become a valuable tool for functional genomics of cotton (Gossypium spp.). The rapid adoption of RNAi has replaced previous antisense technology. RNAi has aided in the discovery of function ...
Gonçalves, Joana P.; Aires, Ricardo S.; Francisco, Alexandre P.; Madeira, Sara C.
2012-01-01
Explaining regulatory mechanisms is crucial to understand complex cellular responses leading to system perturbations. Some strategies reverse engineer regulatory interactions from experimental data, while others identify functional regulatory units (modules) under the assumption that biological systems yield a modular organization. Most modular studies focus on network structure and static properties, ignoring that gene regulation is largely driven by stimulus-response behavior. Expression time series are key to gain insight into dynamics, but have been insufficiently explored by current methods, which often (1) apply generic algorithms unsuited for expression analysis over time, due to inability to maintain the chronology of events or incorporate time dependency; (2) ignore local patterns, abundant in most interesting cases of transcriptional activity; (3) neglect physical binding or lack automatic association of regulators, focusing mainly on expression patterns; or (4) limit the discovery to a predefined number of modules. We propose Regulatory Snapshots, an integrative mining approach to identify regulatory modules over time by combining transcriptional control with response, while overcoming the above challenges. Temporal biclustering is first used to reveal transcriptional modules composed of genes showing coherent expression profiles over time. Personalized ranking is then applied to prioritize prominent regulators targeting the modules at each time point using a network of documented regulatory associations and the expression data. Custom graphics are finally depicted to expose the regulatory activity in a module at consecutive time points (snapshots). Regulatory Snapshots successfully unraveled modules underlying yeast response to heat shock and human epithelial-to-mesenchymal transition, based on regulations documented in the YEASTRACT and JASPAR databases, respectively, and available expression data. Regulatory players involved in functionally enriched processes related to these biological events were identified. Ranking scores further suggested ability to discern the primary role of a gene (target or regulator). Prototype is available at: http://kdbio.inesc-id.pt/software/regulatorysnapshots. PMID:22563474
Bayesian Inference of Natural Rankings in Incomplete Competition Networks
Park, Juyong; Yook, Soon-Hyung
2014-01-01
Competition between a complex system's constituents and a corresponding reward mechanism based on it have profound influence on the functioning, stability, and evolution of the system. But determining the dominance hierarchy or ranking among the constituent parts from the strongest to the weakest – essential in determining reward and penalty – is frequently an ambiguous task due to the incomplete (partially filled) nature of competition networks. Here we introduce the “Natural Ranking,” an unambiguous ranking method applicable to a round robin tournament, and formulate an analytical model based on the Bayesian formula for inferring the expected mean and error of the natural ranking of nodes from an incomplete network. We investigate its potential and uses in resolving important issues of ranking by applying it to real-world competition networks. PMID:25163528
Bayesian Inference of Natural Rankings in Incomplete Competition Networks
NASA Astrophysics Data System (ADS)
Park, Juyong; Yook, Soon-Hyung
2014-08-01
Competition between a complex system's constituents and a corresponding reward mechanism based on it have profound influence on the functioning, stability, and evolution of the system. But determining the dominance hierarchy or ranking among the constituent parts from the strongest to the weakest - essential in determining reward and penalty - is frequently an ambiguous task due to the incomplete (partially filled) nature of competition networks. Here we introduce the “Natural Ranking,” an unambiguous ranking method applicable to a round robin tournament, and formulate an analytical model based on the Bayesian formula for inferring the expected mean and error of the natural ranking of nodes from an incomplete network. We investigate its potential and uses in resolving important issues of ranking by applying it to real-world competition networks.
Maryland's high cancer mortality rate: a review of contributing demographic factors.
Freedman, D M
1999-01-01
For many years, Maryland has ranked among the top states in cancer mortality. This study analyzed mortality data from the National Center for Health Statistics (CDC-Wonder) to help explain Maryland's cancer rate and rank. Age-adjusted rates are based on deaths per 100,000 population from 1991 through 1995. Rates and ranks overall, and stratified by age, are calculated for total cancer mortality, as well as for four major sites: lung, breast, prostate, and colorectal. Because states differ in their racial/gender mix, race/gender rates among states are also compared. Although Maryland ranks seventh in overall cancer mortality, its rates and rank by race and gender subpopulation are less high. For those under 75, white men ranked 26th, black men ranked 20th, and black and white women ranked 12th and 10th, respectively. Maryland's overall rank, as with any state, is a function of the rates of its racial and gender subpopulations and the relative size of these groups in the state. Many of the disparities between Maryland's overall high cancer rank and its lower rank by subpopulation also characterize the major cancer sites. Although a stratified presentation of cancer rates and ranks may be more favorable to Maryland, it should not be used to downplay the attention cancer mortality in Maryland deserves.
Amini, Ata; Shrimpton, Paul J; Muggleton, Stephen H; Sternberg, Michael J E
2007-12-01
Despite the increased recent use of protein-ligand and protein-protein docking in the drug discovery process due to the increases in computational power, the difficulty of accurately ranking the binding affinities of a series of ligands or a series of proteins docked to a protein receptor remains largely unsolved. This problem is of major concern in lead optimization procedures and has led to the development of scoring functions tailored to rank the binding affinities of a series of ligands to a specific system. However, such methods can take a long time to develop and their transferability to other systems remains open to question. Here we demonstrate that given a suitable amount of background information a new approach using support vector inductive logic programming (SVILP) can be used to produce system-specific scoring functions. Inductive logic programming (ILP) learns logic-based rules for a given dataset that can be used to describe properties of each member of the set in a qualitative manner. By combining ILP with support vector machine regression, a quantitative set of rules can be obtained. SVILP has previously been used in a biological context to examine datasets containing a series of singular molecular structures and properties. Here we describe the use of SVILP to produce binding affinity predictions of a series of ligands to a particular protein. We also for the first time examine the applicability of SVILP techniques to datasets consisting of protein-ligand complexes. Our results show that SVILP performs comparably with other state-of-the-art methods on five protein-ligand systems as judged by similar cross-validated squares of their correlation coefficients. A McNemar test comparing SVILP to CoMFA and CoMSIA across the five systems indicates our method to be significantly better on one occasion. 
The ability to graphically display and understand the SVILP-produced rules is demonstrated and this feature of ILP can be used to derive hypotheses for future ligand design in lead optimization procedures. The approach can readily be extended to evaluate the binding affinities of a series of protein-protein complexes. (c) 2007 Wiley-Liss, Inc.
Entitymetrics: Measuring the Impact of Entities
Ding, Ying; Song, Min; Han, Jia; Yu, Qi; Yan, Erjia; Lin, Lili; Chambers, Tamy
2013-01-01
This paper proposes entitymetrics to measure the impact of knowledge units. Entitymetrics highlight the importance of entities embedded in scientific literature for further knowledge discovery. In this paper, we use Metformin, a drug for diabetes, as an example to form an entity-entity citation network based on literature related to Metformin. We then calculate the network features and compare the centrality ranks of biological entities with results from Comparative Toxicogenomics Database (CTD). The comparison demonstrates the usefulness of entitymetrics to detect most of the outstanding interactions manually curated in CTD. PMID:24009660
Rank-preserving regression: a more robust rank regression model against outliers.
Chen, Tian; Kowalski, Jeanne; Chen, Rui; Wu, Pan; Zhang, Hui; Feng, Changyong; Tu, Xin M
2016-08-30
Mean-based semi-parametric regression models such as the popular generalized estimating equations are widely used to improve robustness of inference over parametric models. Unfortunately, such models are quite sensitive to outlying observations. The Wilcoxon-score-based rank regression (RR) provides more robust estimates over generalized estimating equations against outliers. However, the RR and its extensions do not sufficiently address missing data arising in longitudinal studies. In this paper, we propose a new approach to address outliers under a different framework based on the functional response models. This functional-response-model-based alternative not only addresses limitations of the RR and its extensions for longitudinal data, but, with its rank-preserving property, even provides more robust estimates than these alternatives. The proposed approach is illustrated with both real and simulated data. Copyright © 2016 John Wiley & Sons, Ltd.
Chie, Wei-Chu; Blazeby, Jane M; Hsiao, Chin-Fu; Chiu, Herng-Chia; Poon, Ronnie T; Mikoshiba, Naoko; Al-Kadhim, Gillian; Heaton, Nigel; Calara, Jozer; Collins, Peter; Caddick, Katharine; Costantini, Anna; Vilgrain, Valerie
2017-10-01
The aim of this study is to explore the possible effects of clinical and cultural characteristics of hepatocellular carcinoma on patients' health-related quality of life (HRQoL). Patients with hepatocellular carcinoma from Asian and European countries completed the EORTC QLQ-C30 and the EORTC QLQ-HCC18. Comparisons were made using Student's t-test and the Wilcoxon rank-sum test, with a false discovery method to correct for multiple comparisons. Multiway analysis of variance and model selection were used to assess the effects of clinical characteristics and geographic areas. Two hundred and twenty-seven patients with hepatocellular carcinoma completed questionnaires. After adjusting for demographic and clinical characteristics, Asian patients still had significantly better HRQoL scores in emotional functioning and insomnia (QLQ-C30) and in sexual interest (QLQ-HCC18). We also found an interaction in physical functioning (QLQ-C30) and fatigue (QLQ-HCC18) between geographic region and marital status: married European patients had worse HRQoL scores than single Asian patients. Both clinical characteristics and geographic areas affected the HRQoL in patients with hepatocellular carcinoma. Cultural differences and clinical differences in the pattern of disease due to active surveillance of Asian countries may explain the results. © 2016 John Wiley & Sons Australia, Ltd.
An Efficient Rank Based Approach for Closest String and Closest Substring
2012-01-01
This paper aims to present a new genetic approach that uses rank distance for solving two known NP-hard problems, and to compare rank distance with other distance measures for strings. The two NP-hard problems we are trying to solve are closest string and closest substring. For each problem we build a genetic algorithm and we describe the genetic operations involved. Both genetic algorithms use a fitness function based on rank distance. We compare our algorithms with other genetic algorithms that use different distance measures, such as Hamming distance or Levenshtein distance, on real DNA sequences. Our experiments show that the genetic algorithms based on rank distance have the best results. PMID:22675483
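To make the comparison of distance measures above concrete, the following is a minimal sketch of rank distance next to Hamming distance. It assumes the definition commonly given in the rank-distance literature: each character is indexed by its occurrence number, positions are counted from 1 from the left, shared indexed characters contribute the absolute difference of their positions, and unshared ones contribute their full position. The abstract does not spell out the definition, so the function names and the position convention here are assumptions, and the genetic algorithm itself is not reproduced.

```python
from collections import defaultdict


def annotate(s):
    """Map each indexed character, e.g. ('A', 2) for the second 'A', to its 1-based position."""
    seen = defaultdict(int)
    positions = {}
    for pos, ch in enumerate(s, start=1):
        seen[ch] += 1
        positions[(ch, seen[ch])] = pos
    return positions


def rank_distance(u, v):
    """Rank distance under the assumed definition (positions counted from 1)."""
    pu, pv = annotate(u), annotate(v)
    total = 0
    for key in pu.keys() | pv.keys():
        # An indexed character missing from one string contributes
        # its full position in the other string.
        total += abs(pu.get(key, 0) - pv.get(key, 0))
    return total


def hamming_distance(u, v):
    """Number of mismatching positions; defined only for equal-length strings."""
    if len(u) != len(v):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(a != b for a, b in zip(u, v))
```

Unlike Hamming distance, this measure is defined for strings of different lengths and grows with how far a symbol has moved, which is one plausible reason it might serve well as a fitness function.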
Exponential Family Functional data analysis via a low-rank model.
Li, Gen; Huang, Jianhua Z; Shen, Haipeng
2018-05-08
In many applications, non-Gaussian data such as binary or count are observed over a continuous domain and there exists a smooth underlying structure for describing such data. We develop a new functional data method to deal with this kind of data when the data are regularly spaced on the continuous domain. Our method, referred to as Exponential Family Functional Principal Component Analysis (EFPCA), assumes the data are generated from an exponential family distribution, and the matrix of the canonical parameters has a low-rank structure. The proposed method flexibly accommodates not only the standard one-way functional data, but also two-way (or bivariate) functional data. In addition, we introduce a new cross validation method for estimating the latent rank of a generalized data matrix. We demonstrate the efficacy of the proposed methods using a comprehensive simulation study. The proposed method is also applied to a real application of the UK mortality study, where data are binomially distributed and two-way functional across age groups and calendar years. The results offer novel insights into the underlying mortality pattern. © 2018, The International Biometric Society.
Lachin, John M
2011-11-10
The power of a chi-square test, and thus the required sample size, are a function of the noncentrality parameter that can be obtained as the limiting expectation of the test statistic under an alternative hypothesis specification. Herein, we apply this principle to derive simple expressions for two tests that are commonly applied to discrete ordinal data. The Wilcoxon rank sum test for the equality of distributions in two groups is algebraically equivalent to the Mann-Whitney test. The Kruskal-Wallis test applies to multiple groups. These tests are equivalent to a Cochran-Mantel-Haenszel mean score test using rank scores for a set of C-discrete categories. Although various authors have assessed the power function of the Wilcoxon and Mann-Whitney tests, herein it is shown that the power of these tests with discrete observations, that is, with tied ranks, is readily provided by the power function of the corresponding Cochran-Mantel-Haenszel mean scores test for two and R > 2 groups. These expressions yield results virtually identical to those derived previously for rank scores and also apply to other score functions. The Cochran-Armitage test for trend assesses whether there is a monotonically increasing or decreasing trend in the proportions with a positive outcome or response over the C-ordered categories of an ordinal independent variable, for example, dose. Herein, it is shown that the power of the test is a function of the slope of the response probabilities over the ordinal scores assigned to the groups that yields simple expressions for the power of the test. Copyright © 2011 John Wiley & Sons, Ltd.
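The algebraic equivalence of the Wilcoxon rank sum and Mann-Whitney statistics stated above can be checked numerically, including the tied case that arises with discrete ordinal data. The sketch below uses midranks for ties and counts each tied pair as one half; the helper names and the small ordinal samples are illustrative assumptions, not data from the paper. With n1 observations in the first group, the identity is W = U + n1(n1 + 1)/2.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(x, y):
    """Sum of the pooled-sample midranks belonging to the first group."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])


def mann_whitney_u(x, y):
    """Count pairs with x > y, scoring each tied pair as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


# Illustrative ordinal scores (made up for this sketch).
group_1 = [1, 2, 2, 3]
group_2 = [2, 3, 3, 4]
n1 = len(group_1)
w = wilcoxon_rank_sum(group_1, group_2)
u = mann_whitney_u(group_1, group_2)
```

Because the two statistics differ only by the constant n1(n1 + 1)/2, any test based on one has the same power as the test based on the other.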
Männel, Barbara; Jaiteh, Mariama; Zeifman, Alexey; Randakova, Alena; Möller, Dorothee; Hübner, Harald; Gmeiner, Peter; Carlsson, Jens
2017-10-20
Functionally selective ligands stabilize conformations of G protein-coupled receptors (GPCRs) that induce a preference for signaling via a subset of the intracellular pathways activated by the endogenous agonists. The possibility to fine-tune the functional activity of a receptor provides opportunities to develop drugs that selectively signal via pathways associated with a therapeutic effect and avoid those causing side effects. Animal studies have indicated that ligands displaying functional selectivity at the D2 dopamine receptor (D2R) could be safer and more efficacious drugs against neuropsychiatric diseases. In this work, computational design of functionally selective D2R ligands was explored using structure-based virtual screening. Molecular docking of known functionally selective ligands to a D2R homology model indicated that such compounds were anchored by interactions with the orthosteric site and extended into a common secondary pocket. A tailored virtual library with close to 13 000 compounds bearing 2,3-dichlorophenylpiperazine, a privileged orthosteric scaffold, connected to diverse chemical moieties via a linker was docked to the D2R model. Eighteen top-ranked compounds that occupied both the orthosteric and allosteric site were synthesized, leading to the discovery of 16 partial agonists. A majority of the ligands had comparable maximum effects in the G protein and β-arrestin recruitment assays, but a subset displayed preference for a single pathway. In particular, compound 4 stimulated β-arrestin recruitment (EC50 = 320 nM, Emax = 16%) but had no detectable G protein signaling. The use of structure-based screening and virtual libraries to discover GPCR ligands with tailored functional properties will be discussed.
Damm-Ganamet, Kelly L; Bembenek, Scott D; Venable, Jennifer W; Castro, Glenda G; Mangelschots, Lieve; Peeters, Daniëlle C G; Mcallister, Heather M; Edwards, James P; Disepio, Daniel; Mirzadegan, Taraneh
2016-05-12
Here, we report a high-throughput virtual screening (HTVS) study using phosphoinositide 3-kinase (both PI3Kγ and PI3Kδ). Our initial HTVS results of the Janssen corporate database identified small focused libraries with hit rates at 50% inhibition showing a 50-fold increase over those from a HTS (high-throughput screen). Further, applying constraints based on "chemically intuitive" hydrogen bonds and/or positional requirements resulted in a substantial improvement in the hit rates (versus no constraints) and reduced docking time. While we find that docking scoring functions are not capable of providing a reliable relative ranking of a set of compounds, a prioritization of groups of compounds (e.g., low, medium, and high) does emerge, which allows for the chemistry efforts to be quickly focused on the most viable candidates. Thus, this illustrates that it is not always necessary to have a high correlation between a computational score and the experimental data to impact the drug discovery process.
Objective criteria ranking framework for renewable energy policy decisions in Nigeria
NASA Astrophysics Data System (ADS)
K, Nwofor O.; N, Dike V.
2016-08-01
We present a framework that seeks to improve the objectivity of renewable energy policy decisions in Nigeria. It consists of expert ranking of resource abundance, resource efficiency and resource environmental comfort in the choice of renewable energy options for large scale power generation. The rankings are converted to a more objective function called Resource Appraisal Function (RAF) using dependence operators derived from logical relationships amongst the various criteria. The preferred option is that with the highest average RAF coupled with the least RAF variance. The method can be extended to more options, more criteria, and more opinions and can be adapted for similar decisions in education, environment and health sectors.
NASA Astrophysics Data System (ADS)
Rai, Prashant; Sargsyan, Khachik; Najm, Habib; Hermes, Matthew R.; Hirata, So
2017-09-01
A new method is proposed for a fast evaluation of high-dimensional integrals of potential energy surfaces (PES) that arise in many areas of quantum dynamics. It decomposes a PES into a canonical low-rank tensor format, reducing its integral into a relatively short sum of products of low-dimensional integrals. The decomposition is achieved by the alternating least squares (ALS) algorithm, requiring only a small number of single-point energy evaluations. Therefore, it eradicates a force-constant evaluation as the hotspot of many quantum dynamics simulations and also possibly lifts the curse of dimensionality. This general method is applied to the anharmonic vibrational zero-point and transition energy calculations of molecules using the second-order diagrammatic vibrational many-body Green's function (XVH2) theory with a harmonic-approximation reference. In this application, high dimensional PES and Green's functions are both subjected to a low-rank decomposition. Evaluating the molecular integrals over a low-rank PES and Green's functions as sums of low-dimensional integrals using the Gauss-Hermite quadrature, this canonical-tensor-decomposition-based XVH2 (CT-XVH2) achieves an accuracy of 0.1 cm^-1 or higher and nearly an order of magnitude speedup as compared with the original algorithm using force constants for water and formaldehyde.
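The central point above, that a function in canonical low-rank form integrates as a short sum of products of one-dimensional integrals, can be illustrated with a toy two-dimensional case. The sketch below is an assumption-laden illustration only: the rank-2 test function, the midpoint rule on [0, 1] (standing in for the Gauss-Hermite quadrature named in the abstract), and all function names are hypothetical, and no ALS fitting is performed because the factors are given directly.

```python
import math


def midpoint_nodes(n):
    """Nodes and the common weight of the n-point midpoint rule on [0, 1]."""
    h = 1.0 / n
    return [(i + 0.5) * h for i in range(n)], h


# Hypothetical rank-2 function: f(x, y) = x * y**2 + sin(x) * exp(y),
# stored as a list of (a_r, b_r) factor pairs with f = sum_r a_r(x) * b_r(y).
FACTORS = [
    (lambda x: x, lambda y: y * y),
    (math.sin, math.exp),
]


def f(x, y):
    return sum(a(x) * b(y) for a, b in FACTORS)


def integrate_full_grid(n):
    """Direct 2-D quadrature: n * n evaluations of f."""
    nodes, h = midpoint_nodes(n)
    return sum(f(x, y) for x in nodes for y in nodes) * h * h


def integrate_low_rank(n):
    """Sum over ranks of products of 1-D quadratures: 2 * rank * n evaluations."""
    nodes, h = midpoint_nodes(n)
    total = 0.0
    for a, b in FACTORS:
        total += (sum(a(x) for x in nodes) * h) * (sum(b(y) for y in nodes) * h)
    return total


# Closed-form value of the integral over the unit square, for comparison.
EXACT = 1.0 / 6.0 + (1.0 - math.cos(1.0)) * (math.e - 1.0)
```

In d dimensions the direct grid needs n^d evaluations while the factored form needs on the order of d times the rank times n, which is the sense in which a low-rank representation may ease the curse of dimensionality.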
Krämer, Andreas; Shah, Sohela; Rebres, Robert Anthony; Tang, Susan; Richards, Daniel Rene
2017-08-11
Next-generation sequencing is widely used to identify disease-causing variants in patients with rare genetic disorders. Identifying those variants from whole-genome or exome data can be both scientifically challenging and time consuming. A significant amount of time is spent on variant annotation and interpretation. Fully or partly automated solutions are therefore needed to streamline and scale this process. We describe Phenotype Driven Ranking (PDR), an algorithm integrated into Ingenuity Variant Analysis, that uses observed patient phenotypes to prioritize diseases and genes in order to expedite causal-variant discovery. Our method is based on a network of phenotype-disease-gene relationships derived from the QIAGEN Knowledge Base, which allows for efficient computational association of phenotypes to implicated diseases, and also enables scoring and ranking. We have demonstrated the utility and performance of PDR by applying it to a number of clinical rare-disease cases, where the true causal gene was known beforehand. It is also shown that PDR compares favorably to a representative alternative tool.
OCEAN: Optimized Cross rEActivity estimatioN.
Czodrowski, Paul; Bolick, Wolf-Guido
2016-10-24
The prediction of molecular targets is highly beneficial during the drug discovery process, be it for off-target elucidation or deconvolution of phenotypic screens. Here, we present OCEAN, a target prediction tool exclusively utilizing publicly available ChEMBL data. OCEAN uses a heuristics approach based on a validation set containing almost 1000 drug ↔ target relationships. New ChEMBL data (ChEMBL20 as well as ChEMBL21) released after the validation was used for a prospective OCEAN performance check. The success rates of OCEAN in correctly predicting the targets within the TOP10 ranks are 77% for recently marketed drugs, 62% for all new ChEMBL20 compounds, and 51% for all new ChEMBL21 compounds. OCEAN is also capable of identifying polypharmacological compounds; the success rate for molecules simultaneously hitting at least two targets is 64% to be correctly predicted within the TOP10 ranks. The source code of OCEAN can be found at http://www.github.com/rdkit/OCEAN.
Peña, Alejandro; Del Carratore, Francesco; Cummings, Matthew; Takano, Eriko; Breitling, Rainer
2017-12-18
The rapid increase of publicly available microbial genome sequences has highlighted the presence of hundreds of thousands of biosynthetic gene clusters (BGCs) encoding valuable secondary metabolites. The experimental characterization of new BGCs is extremely laborious and struggles to keep pace with the in silico identification of potential BGCs. Therefore, the prioritisation of promising candidates among computationally predicted BGCs represents a pressing need. Here, we propose an output ordering and prioritisation system (OOPS) which helps sort identified BGCs by a wide variety of custom-weighted biological and biochemical criteria in a flexible and user-friendly interface. OOPS facilitates a judicious prioritisation of BGCs using G+C content, coding sequence length, gene number, cluster self-similarity and codon bias parameters, as well as enabling the user to rank BGCs based upon BGC type, novelty, and taxonomic distribution. Effective prioritisation of BGCs will help to reduce experimental attrition rates and improve the breadth of bioactive metabolites characterized.
e-GRASP: an integrated evolutionary and GRASP resource for exploring disease associations.
Karim, Sajjad; NourEldin, Hend Fakhri; Abusamra, Heba; Salem, Nada; Alhathli, Elham; Dudley, Joel; Sanderford, Max; Scheinfeldt, Laura B; Chaudhary, Adeel G; Al-Qahtani, Mohammed H; Kumar, Sudhir
2016-10-17
Genome-wide association studies (GWAS) have become a mainstay of biological research concerned with discovering genetic variation linked to phenotypic traits and diseases. Both discrete and continuous traits can be analyzed in GWAS to discover associations between single nucleotide polymorphisms (SNPs) and traits of interest. Associations are typically determined by estimating the significance of the statistical relationship between genetic loci and the given trait. However, the prioritization of bona fide, reproducible genetic associations from GWAS results remains a central challenge in identifying genomic loci underlying common complex diseases. Evolutionary-aware meta-analysis of the growing GWAS literature is one way to address this challenge and to advance from association to causation in the discovery of genotype-phenotype relationships. We have created an evolutionary GWAS resource to enable in-depth query and exploration of published GWAS results. This resource uses the publicly available GWAS results annotated in the GRASP2 database. The GRASP2 database includes results from 2082 studies, 177 broad phenotype categories, and ~8.87 million SNP-phenotype associations. For each SNP in e-GRASP, we present information from the GRASP2 database for convenience as well as evolutionary information (e.g., rate and timespan). Users can, therefore, identify not only SNPs with highly significant phenotype-association P-values, but also SNPs that are highly replicated and/or occur at evolutionarily conserved sites that are likely to be functionally important. Additionally, we provide an evolutionary-adjusted SNP association ranking (E-rank) that uses cross-species evolutionary conservation scores and population allele frequencies to transform P-values in an effort to enhance the discovery of SNPs with a greater probability of biologically meaningful disease associations. 
By adding an evolutionary dimension to the GWAS results available in the GRASP2 database, our e-GRASP resource will enable a more effective exploration of SNPs not only by the statistical significance of trait associations, but also by the number of studies in which associations have been replicated, and the evolutionary context of the associated mutations. Therefore, e-GRASP will be a valuable resource for aiding researchers in the identification of bona fide, reproducible genetic associations from GWAS results. This resource is freely available at http://www.mypeg.info/egrasp .
Thermolysis of phenethyl phenyl ether: A model of ether linkages in low rank coal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Britt, P.F.; Buchanan, A.C. III; Malcolm, E.A.
Currently, an area of interest and frustration for coal chemists has been the direct liquefaction of low rank coal. Although low rank coals are more reactive than bituminous coals, they are more difficult to liquefy and offer lower liquefaction yields under conditions optimized for bituminous coals. Solomon, Serio, and co-workers have shown that, in the pyrolysis and liquefaction of low rank coals, a low temperature cross-linking reaction associated with oxygen functional groups occurs before tar evolution. A variety of pretreatments (demineralization, alkylation, and ion-exchange) have been shown to reduce these retrogressive reactions and increase tar yields, but the actual chemical reactions responsible for these processes have not been defined. In order to gain insight into the thermochemical reactions leading to cross-linking in low rank coal, we have undertaken a study of the pyrolysis of oxygen containing coal model compounds. Solid state NMR studies suggest that the alkyl aryl ether linkage may be present in modest amounts in low rank coal. Therefore, in this paper, we will investigate the thermolysis of phenethyl phenyl ether (PPE) as a model of 0-aryl ether linkages found in low rank coal, lignites, and lignin, an evolutionary precursor of coal. Our results have uncovered a new reaction channel that can account for 25% of the products formed. The impact of reaction conditions, including restricted mass transport, on this new reaction pathway and the role of oxygen functional groups in cross-linking reactions will be investigated.
Salomon, Joshua A
2003-01-01
Background: In survey studies on health-state valuations, ordinal ranking exercises often are used as precursors to other elicitation methods such as the time trade-off (TTO) or standard gamble, but the ranking data have not been used in deriving cardinal valuations. This study reconsiders the role of ordinal ranks in valuing health and introduces a new approach to estimate interval-scaled valuations based on aggregate ranking data. Methods: Analyses were undertaken on data from a previously published general population survey study in the United Kingdom that included rankings and TTO values for hypothetical states described using the EQ-5D classification system. The EQ-5D includes five domains (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) with three possible levels on each. Rank data were analysed using a random utility model, operationalized through conditional logit regression. In the statistical model, probabilities of observed rankings were related to the latent utilities of different health states, modeled as a linear function of EQ-5D domain scores, as in previously reported EQ-5D valuation functions. Predicted valuations based on the conditional logit model were compared to observed TTO values for the 42 states in the study and to predictions based on a model estimated directly from the TTO values. Models were evaluated using the intraclass correlation coefficient (ICC) between predictions and mean observations, and the root mean squared error of predictions at the individual level. Results: Agreement between predicted valuations from the rank model and observed TTO values was very high, with an ICC of 0.97, only marginally lower than for predictions based on the model estimated directly from TTO values (ICC = 0.99). Individual-level errors were also comparable in the two models, with root mean squared errors of 0.503 and 0.496 for the rank-based and TTO-based predictions, respectively. 
Conclusions: Modeling health-state valuations based on ordinal ranks can provide results that are similar to those obtained from more widely analyzed valuation techniques such as the TTO. The information content in aggregate ranking data is not currently exploited to full advantage. The possibility of estimating cardinal valuations from ordinal ranks could also simplify future data collection dramatically and facilitate wider empirical study of health-state valuations in diverse settings and population groups. PMID:14687419
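The link between latent utilities and the probability of an observed ranking, described in the methods above, can be sketched with the rank-ordered (sometimes called exploded) logit form, in which a ranking is treated as a sequence of conditional logit choices from the states not yet ranked. The abstract does not give the exact specification, so this form, the function names, the coefficients, and the example utilities are all assumptions made for illustration; no model is fitted here.

```python
import math
from itertools import permutations


def linear_utility(levels, coefficients):
    """Latent utility as a linear function of domain scores (hypothetical coefficients)."""
    return sum(c * x for c, x in zip(coefficients, levels))


def ranking_probability(ranking, utility):
    """Probability of `ranking` (best state first) under a rank-ordered logit model.

    At each step the top remaining state is chosen with conditional logit
    probability exp(u_top) / sum of exp(u_j) over the states still unranked.
    """
    prob = 1.0
    remaining = list(ranking)
    while remaining:
        denom = sum(math.exp(utility[s]) for s in remaining)
        prob *= math.exp(utility[remaining[0]]) / denom
        remaining.pop(0)
    return prob


# Hypothetical latent utilities for three health states.
utilities = {"A": 1.0, "B": 0.2, "C": -0.5}
probabilities = {
    order: ranking_probability(order, utilities)
    for order in permutations(utilities)
}
most_likely = max(probabilities, key=probabilities.get)
```

In an estimation setting, the log of this probability summed over respondents would serve as the likelihood to be maximized over the coefficients.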
Brenner, Darren R.; Amos, Christopher I.; Brhane, Yonathan; Timofeeva, Maria N.; Caporaso, Neil; Wang, Yufei; Christiani, David C.; Bickeböller, Heike; Yang, Ping; Albanes, Demetrius; Stevens, Victoria L.; Gapstur, Susan; McKay, James; Boffetta, Paolo; Zaridze, David; Szeszenia-Dabrowska, Neonilia; Lissowska, Jolanta; Rudnai, Peter; Fabianova, Eleonora; Mates, Dana; Bencko, Vladimir; Foretova, Lenka; Janout, Vladimir; Krokan, Hans E.; Skorpen, Frank; Gabrielsen, Maiken E.; Vatten, Lars; Njølstad, Inger; Chen, Chu; Goodman, Gary; Lathrop, Mark; Vooder, Tõnu; Välk, Kristjan; Nelis, Mari; Metspalu, Andres; Broderick, Peter; Eisen, Timothy; Wu, Xifeng; Zhang, Di; Chen, Wei; Spitz, Margaret R.; Wei, Yongyue; Su, Li; Xie, Dong; She, Jun; Matsuo, Keitaro; Matsuda, Fumihiko; Ito, Hidemi; Risch, Angela; Heinrich, Joachim; Rosenberger, Albert; Muley, Thomas; Dienemann, Hendrik; Field, John K.; Raji, Olaide; Chen, Ying; Gosney, John; Liloglou, Triantafillos; Davies, Michael P.A.; Marcus, Michael; McLaughlin, John; Orlow, Irene; Han, Younghun; Li, Yafang; Zong, Xuchen; Johansson, Mattias; Liu, Geoffrey; Tworoger, Shelley S.; Le Marchand, Loic; Henderson, Brian E.; Wilkens, Lynne R.; Dai, Juncheng; Shen, Hongbing; Houlston, Richard S.; Landi, Maria T.; Brennan, Paul; Hung, Rayjean J.
2015-01-01
Large-scale genome-wide association studies (GWAS) have likely uncovered all common variants at the GWAS significance level. Additional variants within the suggestive range (0.0001 > P > 5×10^-8) are, however, still of interest for identifying causal associations. This analysis aimed to apply novel variant prioritization approaches to identify additional lung cancer variants that may not reach the GWAS level. Effects were combined across studies with a total of 33456 controls and 6756 adenocarcinoma (AC; 13 studies), 5061 squamous cell carcinoma (SCC; 12 studies) and 2216 small cell lung cancer cases (9 studies). Based on prior information such as variant physical properties and functional significance, we applied stratified false discovery rates, hierarchical modeling and Bayesian false discovery probabilities for variant prioritization. We conducted a fine mapping analysis as validation of our methods by examining top-ranking novel variants in six independent populations with a total of 3128 cases and 2966 controls. Three novel loci in the suggestive range were identified based on our Bayesian framework analyses: KCNIP4 at 4p15.2 (rs6448050, P = 4.6×10^-7) and MTMR2 at 11q21 (rs10501831, P = 3.1×10^-6) with SCC, as well as GAREM at 18q12.1 (rs11662168, P = 3.4×10^-7) with AC. Use of our prioritization methods validated two of the top three loci associated with SCC (P = 1.05×10^-4 for KCNIP4, represented by rs9799795) and AC (P = 2.16×10^-4 for GAREM, represented by rs3786309) in the independent fine mapping populations. This study highlights the utility of using prior functional data for sequence variants in prioritization analyses to search for robust signals in the suggestive range. PMID:26363033
Pérez, Germán M; Salomón, Luis A; Montero-Cabrera, Luis A; de la Vega, José M García; Mascini, Marcello
2016-05-01
A novel heuristic using an iterative select-and-purge strategy is proposed. It combines statistical techniques for sampling and classification by rigid molecular docking through an inverse virtual screening scheme. This approach aims at the de novo discovery of short peptides that may act as docking receptors for small target molecules when there are no data available about known association complexes between them. The algorithm performs an unbiased stochastic exploration of the sample space, acting as a binary classifier when analyzing the entire peptide population. It uses a novel and effective criterion for weighting the likelihood of a given peptide to form an association complex with a particular ligand molecule based on amino acid sequences. The exploratory analysis relies on chemical information about peptide composition, sequence patterns, and association free energies (docking scores) in order to converge to those peptides forming the association complexes with higher affinities. Statistical estimations support these results, providing an association probability and improving prediction accuracy even in cases where only a fraction of all possible combinations are sampled. The false positive/false negative ratio was also improved with this method. A simple rigid-body docking approach together with the proper information about amino acid sequences was used. The methodology was applied in a retrospective docking study to all 8000 possible tripeptide combinations using the 20 natural amino acids, screened against a training set of 77 different ligands with diverse functional groups. Afterward, all tripeptides were screened against a test set of 82 ligands, also containing different functional groups. Results show that our integrated methodology is capable of finding a representative group of the top-scoring tripeptides. 
The associated probability of identifying the best receptor or a group of the top-ranked receptors is more than double and about 10 times higher, respectively, when compared to classical random sampling methods.
Brenner, Darren R; Amos, Christopher I; Brhane, Yonathan; Timofeeva, Maria N; Caporaso, Neil; Wang, Yufei; Christiani, David C; Bickeböller, Heike; Yang, Ping; Albanes, Demetrius; Stevens, Victoria L; Gapstur, Susan; McKay, James; Boffetta, Paolo; Zaridze, David; Szeszenia-Dabrowska, Neonilia; Lissowska, Jolanta; Rudnai, Peter; Fabianova, Eleonora; Mates, Dana; Bencko, Vladimir; Foretova, Lenka; Janout, Vladimir; Krokan, Hans E; Skorpen, Frank; Gabrielsen, Maiken E; Vatten, Lars; Njølstad, Inger; Chen, Chu; Goodman, Gary; Lathrop, Mark; Vooder, Tõnu; Välk, Kristjan; Nelis, Mari; Metspalu, Andres; Broderick, Peter; Eisen, Timothy; Wu, Xifeng; Zhang, Di; Chen, Wei; Spitz, Margaret R; Wei, Yongyue; Su, Li; Xie, Dong; She, Jun; Matsuo, Keitaro; Matsuda, Fumihiko; Ito, Hidemi; Risch, Angela; Heinrich, Joachim; Rosenberger, Albert; Muley, Thomas; Dienemann, Hendrik; Field, John K; Raji, Olaide; Chen, Ying; Gosney, John; Liloglou, Triantafillos; Davies, Michael P A; Marcus, Michael; McLaughlin, John; Orlow, Irene; Han, Younghun; Li, Yafang; Zong, Xuchen; Johansson, Mattias; Liu, Geoffrey; Tworoger, Shelley S; Le Marchand, Loic; Henderson, Brian E; Wilkens, Lynne R; Dai, Juncheng; Shen, Hongbing; Houlston, Richard S; Landi, Maria T; Brennan, Paul; Hung, Rayjean J
2015-11-01
Large-scale genome-wide association studies (GWAS) have likely uncovered all common variants at the GWAS significance level. Additional variants within the suggestive range (0.0001 > P > 5×10⁻⁸) are, however, still of interest for identifying causal associations. This analysis aimed to apply novel variant prioritization approaches to identify additional lung cancer variants that may not reach the GWAS level. Effects were combined across studies with a total of 33456 controls and 6756 adenocarcinoma (AC; 13 studies), 5061 squamous cell carcinoma (SCC; 12 studies) and 2216 small cell lung cancer cases (9 studies). Based on prior information such as variant physical properties and functional significance, we applied stratified false discovery rates, hierarchical modeling and Bayesian false discovery probabilities for variant prioritization. We conducted a fine mapping analysis as validation of our methods by examining top-ranking novel variants in six independent populations with a total of 3128 cases and 2966 controls. Three novel loci in the suggestive range were identified based on our Bayesian framework analyses: KCNIP4 at 4p15.2 (rs6448050, P = 4.6×10⁻⁷) and MTMR2 at 11q21 (rs10501831, P = 3.1×10⁻⁶) with SCC, as well as GAREM at 18q12.1 (rs11662168, P = 3.4×10⁻⁷) with AC. Use of our prioritization methods validated two of the top three loci associated with SCC (P = 1.05×10⁻⁴ for KCNIP4, represented by rs9799795) and AC (P = 2.16×10⁻⁴ for GAREM, represented by rs3786309) in the independent fine mapping populations. This study highlights the utility of using prior functional data for sequence variants in prioritization analyses to search for robust signals in the suggestive range. © The Author 2015. Published by Oxford University Press.
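The stratified false discovery rate step named in this abstract can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg q-values computed separately within each prior-defined stratum; the function names, the stratum labels, and the toy P values are hypothetical, and the study's hierarchical modeling and Bayesian false discovery probabilities are not reproduced.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg q-values for a list of P values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def stratified_qvalues(pvals, strata):
    """Apply BH separately inside each stratum (assumed prior grouping)."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    q = [None] * len(pvals)
    for members in groups.values():
        local_q = bh_qvalues([pvals[i] for i in members])
        for idx, value in zip(members, local_q):
            q[idx] = value
    return q


# Hypothetical toy input: two variants with functional annotation, two without.
toy_p = [0.01, 0.04, 0.03, 0.02]
toy_strata = ["functional", "other", "other", "functional"]
toy_q = stratified_qvalues(toy_p, toy_strata)
```

Because each stratum is corrected against its own number of tests, a small stratum enriched for true signals pays a lighter multiplicity penalty than it would in a pooled correction, which is the motivation for using prior annotation in this way.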
Gene expression allelic imbalance in ovine brown adipose tissue impacts energy homeostasis
Ghazanfar, Shila; Vuocolo, Tony; Morrison, Janna L.; Nicholas, Lisa M.; McMillen, Isabella C.; Yang, Jean Y. H.; Buckley, Michael J.
2017-01-01
Heritable trait variation within a population of organisms is largely governed by DNA variations that impact gene transcription and protein function. Identifying genetic variants that affect complex functional traits is a primary aim of population genetics studies, especially in the context of human disease and agricultural production traits. The identification of alleles directly altering mRNA expression and thereby biological function is challenging due to difficulty in isolating direct effects of cis-acting genetic variations from indirect trans-acting genetic effects. Allele specific gene expression or allelic imbalance in gene expression (AI) occurring at heterozygous loci provides an opportunity to identify genes directly impacted by cis-acting genetic variants as indirect trans-acting effects equally impact the expression of both alleles. However, the identification of genes showing AI in the context of the expression of all genes remains a challenge due to a variety of technical and statistical issues. The current study focuses on the discovery of genes showing AI using single nucleotide polymorphisms as allelic reporters. By developing a computational and statistical process that addressed multiple analytical challenges, we ranked 5,809 genes for evidence of AI using RNA-Seq data derived from brown adipose tissue samples from a cohort of late gestation fetal lambs and then identified a conservative subgroup of 1,293 genes. Thus, AI was extensive, representing approximately 25% of the tested genes. Genes associated with AI were enriched for multiple Gene Ontology (GO) terms relating to lipid metabolism, mitochondrial function and the extracellular matrix. These functions suggest that cis-acting genetic variations causing AI in the population are preferentially impacting genes involved in energy homeostasis and tissue remodelling. These functions may contribute to production traits likely to be under genetic selection in the population. PMID:28665992
Bennett, Robert M; Russell, Jon; Cappelleri, Joseph C; Bushmakin, Andrew G; Zlateva, Gergana; Sadosky, Alesia
2010-06-28
The purpose of this study was to determine whether some of the clinical features of fibromyalgia (FM) that patients would like to see improved aggregate into definable clusters. Seven hundred and eighty-eight patients with clinically confirmed FM and baseline pain ≥40 mm on a 100 mm visual analogue scale ranked 5 FM clinical features that the subjects would most like to see improved after treatment (one for each priority quintile) from a list of 20 developed during focus groups. For each subject, clinical features were transformed into vectors with rankings assigned values 1-5 (lowest to highest ranking). Logistic analysis was used to create a distance matrix and hierarchical cluster analysis was applied to identify cluster structure. The frequency of cluster selection was determined, and cluster importance was ranked using cluster scores derived from rankings of the clinical features. Multidimensional scaling was used to visualize and conceptualize cluster relationships. Six clinical features clusters were identified and named based on their key characteristics. In order of selection frequency, the clusters were Pain (90%; 4 clinical features), Fatigue (89%; 4 clinical features), Domestic (42%; 4 clinical features), Impairment (29%; 3 functions), Affective (21%; 3 clinical features), and Social (9%; 2 functional). The "Pain Cluster" was ranked of greatest importance by 54% of subjects, followed by Fatigue, which was given the highest ranking by 28% of subjects. Multidimensional scaling mapped these clusters to two dimensions: Status (bounded by Physical and Emotional domains), and Setting (bounded by Individual and Group interactions). Common clinical features of FM could be grouped into 6 clusters (Pain, Fatigue, Domestic, Impairment, Affective, and Social) based on patient perception of relevance to treatment. 
Furthermore, these 6 clusters could be charted in the 2 dimensions of Status and Setting, thus providing a unique perspective for interpretation of FM symptomatology.
Walsh, Neville G.; Cantrill, David J.; Holmes, Gareth D.; Murphy, Daniel J.
2017-01-01
In Australia, Poaceae tribe Poeae are represented by 19 genera and 99 species, including economically and environmentally important native and introduced pasture grasses [e.g. Poa (Tussock-grasses) and Lolium (Ryegrasses)]. We used this tribe, which are well characterised in regards to morphological diversity and evolutionary relationships, to test the efficacy of DNA barcoding methods. A reference library was generated that included 93.9% of species in Australia (408 individuals, x̄ = 3.7 individuals per species). Molecular data were generated for official plant barcoding markers (rbcL, matK) and the nuclear ribosomal internal transcribed spacer (ITS) region. We investigated accuracy of specimen identifications using distance- (nearest neighbour, best-close match, and threshold identification) and tree-based (maximum likelihood, Bayesian inference) methods and applied species discovery methods (automatic barcode gap discovery, Poisson tree processes) based on molecular data to assess congruence with recognised species. Across all methods, success rate for specimen identification of genera was high (87.5–99.5%) and of species was low (25.6–44.6%). Distance- and tree-based methods were equally ineffective in providing accurate identifications for specimens to species rank (26.1–44.6% and 25.6–31.3%, respectively). The ITS marker achieved the highest success rate for specimen identification at both generic and species ranks across the majority of methods. For distance-based analyses the best-close match method provided the greatest accuracy for identification of individuals with a high percentage of “correct” (97.6%) and a low percentage of “incorrect” (0.3%) generic identifications, based on the ITS marker. For tribe Poeae, and likely for other grass lineages, sequence data in the standard DNA barcode markers are not variable enough for accurate identification of specimens to species rank. 
For recently diverged grass species similar challenges are encountered in the application of genetic and morphological data to species delimitations, with taxonomic signal limited by extensive infra-specific variation and shared polymorphisms among species in both data types. PMID:29084279
Birch, Joanne L; Walsh, Neville G; Cantrill, David J; Holmes, Gareth D; Murphy, Daniel J
2017-01-01
In Australia, Poaceae tribe Poeae are represented by 19 genera and 99 species, including economically and environmentally important native and introduced pasture grasses [e.g. Poa (Tussock-grasses) and Lolium (Ryegrasses)]. We used this tribe, which are well characterised in regards to morphological diversity and evolutionary relationships, to test the efficacy of DNA barcoding methods. A reference library was generated that included 93.9% of species in Australia (408 individuals, x̄ = 3.7 individuals per species). Molecular data were generated for official plant barcoding markers (rbcL, matK) and the nuclear ribosomal internal transcribed spacer (ITS) region. We investigated accuracy of specimen identifications using distance- (nearest neighbour, best-close match, and threshold identification) and tree-based (maximum likelihood, Bayesian inference) methods and applied species discovery methods (automatic barcode gap discovery, Poisson tree processes) based on molecular data to assess congruence with recognised species. Across all methods, success rate for specimen identification of genera was high (87.5-99.5%) and of species was low (25.6-44.6%). Distance- and tree-based methods were equally ineffective in providing accurate identifications for specimens to species rank (26.1-44.6% and 25.6-31.3%, respectively). The ITS marker achieved the highest success rate for specimen identification at both generic and species ranks across the majority of methods. For distance-based analyses the best-close match method provided the greatest accuracy for identification of individuals with a high percentage of "correct" (97.6%) and a low percentage of "incorrect" (0.3%) generic identifications, based on the ITS marker. For tribe Poeae, and likely for other grass lineages, sequence data in the standard DNA barcode markers are not variable enough for accurate identification of specimens to species rank. 
For recently diverged grass species similar challenges are encountered in the application of genetic and morphological data to species delimitations, with taxonomic signal limited by extensive infra-specific variation and shared polymorphisms among species in both data types.
Time-Aware Service Ranking Prediction in the Internet of Things Environment
Huang, Yuze; Huang, Jiwei; Cheng, Bo; He, Shuqing; Chen, Junliang
2017-01-01
With the rapid development of the Internet of things (IoT), building IoT systems with high quality of service (QoS) has become an urgent requirement in both academia and industry. During the procedures of building IoT systems, QoS-aware service selection is an important concern, which requires the ranking of a set of functionally similar services according to their QoS values. In reality, however, it is quite expensive and even impractical to evaluate all geographically-dispersed IoT services at a single client to obtain such a ranking. Nevertheless, distributed measurement and ranking aggregation have to deal with the high dynamics of QoS values and the inconsistency of partial rankings. To address these challenges, we propose a time-aware service ranking prediction approach named TSRPred for obtaining the global ranking from the collection of partial rankings. Specifically, a pairwise comparison model is constructed to describe the relationships between different services, where the partial rankings are obtained by time series forecasting on QoS values. The comparisons of IoT services are formulated by random walks, and thus, the global ranking can be obtained by sorting the steady-state probabilities of the underlying Markov chain. Finally, the efficacy of TSRPred is validated by simulation experiments based on large-scale real-world datasets. PMID:28448451
Time-Aware Service Ranking Prediction in the Internet of Things Environment.
Huang, Yuze; Huang, Jiwei; Cheng, Bo; He, Shuqing; Chen, Junliang
2017-04-27
With the rapid development of the Internet of things (IoT), building IoT systems with high quality of service (QoS) has become an urgent requirement in both academia and industry. During the procedures of building IoT systems, QoS-aware service selection is an important concern, which requires the ranking of a set of functionally similar services according to their QoS values. In reality, however, it is quite expensive and even impractical to evaluate all geographically-dispersed IoT services at a single client to obtain such a ranking. Nevertheless, distributed measurement and ranking aggregation have to deal with the high dynamics of QoS values and the inconsistency of partial rankings. To address these challenges, we propose a time-aware service ranking prediction approach named TSRPred for obtaining the global ranking from the collection of partial rankings. Specifically, a pairwise comparison model is constructed to describe the relationships between different services, where the partial rankings are obtained by time series forecasting on QoS values. The comparisons of IoT services are formulated by random walks, and thus, the global ranking can be obtained by sorting the steady-state probabilities of the underlying Markov chain. Finally, the efficacy of TSRPred is validated by simulation experiments based on large-scale real-world datasets.
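The final step described in this abstract, sorting the steady-state probabilities of a Markov chain built from pairwise comparisons, can be illustrated with a minimal sketch. This is not TSRPred itself: it is a generic pairwise-comparison random walk (similar in spirit to Rank Centrality), the time series forecasting of QoS values is omitted, and the names `stationary_ranking` and `wins` and the toy counts are assumptions made for illustration.

```python
def stationary_ranking(wins, n_iter=1000, tol=1e-12):
    """Rank items by the stationary distribution of a random walk.

    wins[i][j] is the number of partial rankings in which item j was
    placed above item i (hypothetical input). The walk moves from i
    toward items that beat it, so better items collect more mass.
    """
    n = len(wins)
    # Row-stochastic transition matrix; leftover mass stays on the diagonal.
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                total = wins[i][j] + wins[j][i]
                if total > 0:
                    P[i][j] = wins[i][j] / total / n
        P[i][i] = 1.0 - sum(P[i])
    # Power iteration on the left eigenvector: pi = pi P.
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, pi)) < tol
        pi = new
        if converged:
            break
    order = sorted(range(n), key=lambda k: -pi[k])
    return order, pi


# Hypothetical counts for three services; service 0 wins most comparisons.
toy_wins = [
    [0, 1, 1],
    [3, 0, 1],
    [3, 3, 0],
]
toy_order, toy_pi = stationary_ranking(toy_wins)
```

The self-loops keep the chain aperiodic, so power iteration converges whenever the comparison graph is connected; inconsistent partial rankings simply shift probability mass rather than breaking the procedure.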
Lua, Rhonald C; Wilson, Stephen J; Konecki, Daniel M; Wilkins, Angela D; Venner, Eric; Morgan, Daniel H; Lichtarge, Olivier
2016-01-04
The structure and function of proteins underlie most aspects of biology and their mutational perturbations often cause disease. To identify the molecular determinants of function as well as targets for drugs, it is central to characterize the important residues and how they cluster to form functional sites. The Evolutionary Trace (ET) achieves this by ranking the functional and structural importance of the protein sequence positions. ET uses evolutionary distances to estimate functional distances and correlates genotype variations with those in the fitness phenotype. Thus, ET ranks are worse for sequence positions that vary among evolutionarily closer homologs but better for positions that vary mostly among distant homologs. This approach identifies functional determinants, predicts function, guides the mutational redesign of functional and allosteric specificity, and interprets the action of coding sequence variations in proteins, people and populations. Now, the UET database offers pre-computed ET analyses for the protein structure databank, and on-the-fly analysis of any protein sequence. A web interface retrieves ET rankings of sequence positions and maps results to a structure to identify functionally important regions. This UET database integrates several ways of viewing the results on the protein sequence or structure and can be found at http://mammoth.bcm.tmc.edu/uet/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Rethinking the Discovery Function of Proof within the Context of Proofs and Refutations
ERIC Educational Resources Information Center
Komatsu, Kotaro; Tsujiyama, Yosuke; Sakamaki, Aruta
2014-01-01
Proof and proving are important components of school mathematics and have multiple functions in mathematical practice. Among these functions of proof, this paper focuses on the discovery function that refers to invention of a new statement or conjecture by reflecting on or utilizing a constructed proof. Based on two cases in which eighth and ninth…
Research & market strategy: how choice of drug discovery approach can affect market position.
Sams-Dodd, Frank
2007-04-01
In principle, drug discovery approaches can be grouped into target- and function-based, with the respective aims of developing either a target-selective drug or a drug that produces a specific biological effect irrespective of its mode of action. Most analyses of drug discovery approaches focus on productivity, whereas the strategic implications of the choice of drug discovery approach on market position and ability to maintain market exclusivity are rarely considered. However, a comparison of approaches from the perspective of market position indicates that the functional approach is superior for the development of novel, innovative treatments.
A solution to Schroder's equation in several variables
Bridges, Robert A.
2016-03-04
For this paper, let φ be an analytic self-map of the n-ball, having 0 as the attracting fixed point and having full rank near 0. We consider the generalized Schroder's equation, F ∘ φ = φ'(0)^k F with k a positive integer, and prove there is always a solution F with linearly independent component functions, but that such an F cannot have full rank except possibly when k = 1. Furthermore, when k = 1 (Schroder's equation), necessary and sufficient conditions on φ are given to ensure F has full rank near 0 without the added assumption of diagonalizability as needed in the 2003 Cowen/MacCluer paper. In response to Enoch's 2007 paper, it is proven that any formal power series solution indeed represents an analytic function on the whole unit ball. Finally, how exactly resonance can lead to an obstruction of a full rank solution is discussed as well as some consequences of having solutions to Schroder's equation.
Robust subspace clustering via joint weighted Schatten-p norm and Lq norm minimization
NASA Astrophysics Data System (ADS)
Zhang, Tao; Tang, Zhenmin; Liu, Qing
2017-05-01
Low-rank representation (LRR) has been successfully applied to subspace clustering. However, the nuclear norm in the standard LRR is not optimal for approximating the rank function in many real-world applications. Meanwhile, the L21 norm in LRR also fails to characterize various noises properly. To address the above issues, we propose an improved LRR method, which achieves low rank property via the new formulation with weighted Schatten-p norm and Lq norm (WSPQ). Specifically, the nuclear norm is generalized to be the Schatten-p norm and different weights are assigned to the singular values, and thus it can approximate the rank function more accurately. In addition, Lq norm is further incorporated into WSPQ to model different noises and improve the robustness. An efficient algorithm based on the inexact augmented Lagrange multiplier method is designed for the formulated problem. Extensive experiments on face clustering and motion segmentation clearly demonstrate the superiority of the proposed WSPQ over several state-of-the-art methods.
Rank Order Entropy: why one metric is not enough
McLellan, Margaret R.; Ryan, M. Dominic; Breneman, Curt M.
2011-01-01
The use of Quantitative Structure-Activity Relationship models to address problems in drug discovery has a mixed history, generally resulting from the mis-application of QSAR models that were either poorly constructed or used outside of their domains of applicability. This situation has motivated the development of a variety of model performance metrics (r², PRESS r², F-tests, etc.) designed to increase user confidence in the validity of QSAR predictions. In a typical workflow scenario, QSAR models are created and validated on training sets of molecules using metrics such as Leave-One-Out or many-fold cross-validation methods that attempt to assess their internal consistency. However, few current validation methods are designed to directly address the stability of QSAR predictions in response to changes in the information content of the training set. Since the main purpose of QSAR is to quickly and accurately estimate a property of interest for an untested set of molecules, it makes sense to have a means at hand to correctly set user expectations of model performance. In fact, the numerical value of a molecular prediction is often less important to the end user than knowing the rank order of that set of molecules according to their predicted endpoint values. Consequently, a means for characterizing the stability of predicted rank order is an important component of predictive QSAR. Unfortunately, none of the many validation metrics currently available directly measure the stability of rank order prediction, making the development of an additional metric that can quantify model stability a high priority. To address this need, this work examines the stabilities of QSAR rank order models created from representative data sets, descriptor sets, and modeling methods that were then assessed using Kendall Tau as a rank order metric, upon which the Shannon Entropy was evaluated as a means of quantifying rank-order stability. 
Random removal of data from the training set, also known as Data Truncation Analysis (DTA), was used as a means for systematically reducing the information content of each training set while examining both rank order performance and rank order stability in the face of training set data loss. The premise for DTA ROE model evaluation is that the response of a model to incremental loss of training information will be indicative of the quality and sufficiency of its training set, learning method, and descriptor types to cover a particular domain of applicability. This process is termed a “rank order entropy” evaluation, or ROE. By analogy with information theory, an unstable rank order model displays a high level of implicit entropy, while a QSAR rank order model which remains nearly unchanged during training set reductions would show low entropy. In this work, the ROE metric was applied to 71 data sets of different sizes, and was found to reveal more information about the behavior of the models than traditional metrics alone. Stable, or consistently performing models, did not necessarily predict rank order well. Models that performed well in rank order did not necessarily perform well in traditional metrics. In the end, it was shown that ROE metrics suggested that some QSAR models that are typically used should be discarded. ROE evaluation helps to discern which combinations of data set, descriptor set, and modeling methods lead to usable models in prioritization schemes, and provides confidence in the use of a particular model within a specific domain of applicability. PMID:21875058
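The pairing of Kendall Tau with Shannon entropy described in this abstract can be illustrated with a minimal sketch. The binning scheme, the use of tau-a without tie correction, and the toy tau values are assumptions made for illustration; the abstract does not give the exact ROE definition, so this is not the authors' metric.

```python
import math
from collections import Counter


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def shannon_entropy(values, n_bins=10, lo=-1.0, hi=1.0):
    """Shannon entropy in bits of values placed into equal-width bins."""
    width = (hi - lo) / n_bins
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical tau values from repeated training-set truncation runs.
stable_taus = [0.82, 0.85, 0.81, 0.84]    # rank order barely moves
unstable_taus = [0.9, 0.1, -0.3, 0.5]     # rank order changes run to run
stable_entropy = shannon_entropy(stable_taus)
unstable_entropy = shannon_entropy(unstable_taus)
```

In this toy setup a model whose tau values stay in one bin scores zero entropy regardless of how high tau is, which mirrors the abstract's point that stable models do not necessarily predict rank order well.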
Spencer, Amy V; Cox, Angela; Lin, Wei-Yu; Easton, Douglas F; Michailidou, Kyriaki; Walters, Kevin
2016-04-01
There is a large amount of functional genetic data available, which can be used to inform fine-mapping association studies (in diseases with well-characterised disease pathways). Single nucleotide polymorphism (SNP) prioritization via Bayes factors is attractive because prior information can inform the effect size or the prior probability of causal association. This approach requires the specification of the effect size. If the information needed to estimate a priori the probability density for the effect sizes for causal SNPs in a genomic region isn't consistent or isn't available, then specifying a prior variance for the effect sizes is challenging. We propose both an empirical method to estimate this prior variance, and a coherent approach to using SNP-level functional data, to inform the prior probability of causal association. Through simulation we show that when ranking SNPs by our empirical Bayes factor in a fine-mapping study, the causal SNP rank is generally as high or higher than the rank using Bayes factors with other plausible values of the prior variance. Importantly, we also show that assigning SNP-specific prior probabilities of association based on expert prior functional knowledge of the disease mechanism can lead to improved causal SNPs ranks compared to ranking with identical prior probabilities of association. We demonstrate the use of our methods by applying the methods to the fine mapping of the CASP8 region of chromosome 2 using genotype data from the Collaborative Oncological Gene-Environment Study (COGS) Consortium. The data we analysed included approximately 46,000 breast cancer case and 43,000 healthy control samples. © 2016 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Kusumawati, Rosita; Subekti, Retno
2017-04-01
The fuzzy bi-objective linear programming (FBOLP) model is a bi-objective linear programming model over the set of fuzzy numbers, in which the coefficients of the equations are fuzzy numbers. This model is proposed to solve the portfolio selection problem, generating an asset portfolio with the lowest risk and the highest expected return. The FBOLP model with normal fuzzy numbers for the risk and expected return of stocks is transformed into a linear programming (LP) model using the magnitude ranking function.
Rai, Prashant; Sargsyan, Khachik; Najm, Habib; ...
2017-03-07
Here, a new method is proposed for a fast evaluation of high-dimensional integrals of potential energy surfaces (PES) that arise in many areas of quantum dynamics. It decomposes a PES into a canonical low-rank tensor format, reducing its integral into a relatively short sum of products of low-dimensional integrals. The decomposition is achieved by the alternating least squares (ALS) algorithm, requiring only a small number of single-point energy evaluations. Therefore, it eradicates a force-constant evaluation as the hotspot of many quantum dynamics simulations and also possibly lifts the curse of dimensionality. This general method is applied to the anharmonic vibrational zero-point and transition energy calculations of molecules using the second-order diagrammatic vibrational many-body Green's function (XVH2) theory with a harmonic-approximation reference. In this application, high dimensional PES and Green's functions are both subjected to a low-rank decomposition. Evaluating the molecular integrals over a low-rank PES and Green's functions as sums of low-dimensional integrals using the Gauss–Hermite quadrature, this canonical-tensor-decomposition-based XVH2 (CT-XVH2) achieves an accuracy of 0.1 cm⁻¹ or higher and nearly an order of magnitude speedup as compared with the original algorithm using force constants for water and formaldehyde.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rai, Prashant; Sargsyan, Khachik; Najm, Habib
Here, a new method is proposed for a fast evaluation of high-dimensional integrals of potential energy surfaces (PES) that arise in many areas of quantum dynamics. It decomposes a PES into a canonical low-rank tensor format, reducing its integral into a relatively short sum of products of low-dimensional integrals. The decomposition is achieved by the alternating least squares (ALS) algorithm, requiring only a small number of single-point energy evaluations. Therefore, it eradicates a force-constant evaluation as the hotspot of many quantum dynamics simulations and also possibly lifts the curse of dimensionality. This general method is applied to the anharmonic vibrational zero-point and transition energy calculations of molecules using the second-order diagrammatic vibrational many-body Green's function (XVH2) theory with a harmonic-approximation reference. In this application, high dimensional PES and Green's functions are both subjected to a low-rank decomposition. Evaluating the molecular integrals over a low-rank PES and Green's functions as sums of low-dimensional integrals using the Gauss–Hermite quadrature, this canonical-tensor-decomposition-based XVH2 (CT-XVH2) achieves an accuracy of 0.1 cm⁻¹ or higher and nearly an order of magnitude speedup as compared with the original algorithm using force constants for water and formaldehyde.
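The central idea of this abstract, that a canonical low-rank function integrates as a short sum of products of one-dimensional integrals, can be illustrated with a minimal sketch. It assumes the integrand is already given in separated form (the ALS fitting step is not shown), uses a hard-coded 3-point Gauss-Hermite rule for the weight exp(-x²), and all names and the toy surface are hypothetical rather than taken from CT-XVH2.

```python
import math

# 3-point Gauss-Hermite rule for the weight exp(-x^2);
# exact for polynomial factors up to degree 5.
NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
WEIGHTS = (
    math.sqrt(math.pi) / 6,
    2 * math.sqrt(math.pi) / 3,
    math.sqrt(math.pi) / 6,
)


def gh_integral_1d(f):
    """Approximate the integral of exp(-x^2) * f(x) over the real line."""
    return sum(w * f(x) for x, w in zip(NODES, WEIGHTS))


def lowrank_integral(terms):
    """Integrate sum_r c_r * prod_d f_{r,d}(x_d) against exp(-|x|^2).

    terms is a list of (coefficient, [f_1, ..., f_D]) pairs, one per
    rank-one component (an assumed input format).
    """
    total = 0.0
    for coeff, factors in terms:
        product = coeff
        for f in factors:
            product *= gh_integral_1d(f)
        total += product
    return total


# Toy rank-2 "surface" in two dimensions: V(x, y) = x^2 * y^2 + 2 * y^2.
toy_terms = [
    (1.0, [lambda x: x * x, lambda y: y * y]),
    (2.0, [lambda x: 1.0, lambda y: y * y]),
]
toy_value = lowrank_integral(toy_terms)
```

With rank R, dimension D, and n quadrature nodes per dimension, this costs on the order of R·D·n function evaluations instead of the n^D needed by a full product grid, which is the sense in which a separated format can sidestep the curse of dimensionality.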
2015-01-01
Molecular docking is a powerful tool used in drug discovery and structural biology for predicting the structures of ligand–receptor complexes. However, the accuracy of docking calculations can be limited by factors such as the neglect of protein reorganization in the scoring function; as a result, ligand screening can produce a high rate of false positive hits. Although absolute binding free energy methods still have difficulty in accurately rank-ordering binders, we believe that they can be fruitfully employed to distinguish binders from nonbinders and reduce the false positive rate. Here we study a set of ligands that dock favorably to a newly discovered, potentially allosteric site on the flap of HIV-1 protease. Fragment binding to this site stabilizes a closed form of protease, which could be exploited for the design of allosteric inhibitors. Twenty-three top-ranked protein–ligand complexes from AutoDock were subject to the free energy screening using two methods, the recently developed binding energy analysis method (BEDAM) and the standard double decoupling method (DDM). Free energy calculations correctly identified most of the false positives (≥83%) and recovered all the confirmed binders. The results show a gap averaging ≥3.7 kcal/mol, separating the binders and the false positives. We present a formula that decomposes the binding free energy into contributions from the receptor conformational macrostates, which provides insights into the roles of different binding modes. Our binding free energy component analysis further suggests that improving the treatment for the desolvation penalty associated with the unfulfilled polar groups could reduce the rate of false positive hits in docking. The current study demonstrates that the combination of docking with free energy methods can be very useful for more accurate ligand screening against valuable drug targets. PMID:25189630
NASA Astrophysics Data System (ADS)
Wilson, B. D.; McGibbney, L. J.; Mattmann, C. A.; Ramirez, P.; Joyce, M.; Whitehall, K. D.
2015-12-01
Quantifying scientific relevancy is of increasing importance to NASA and the research community. Scientific relevancy may be defined by mapping the impacts of a particular NASA mission, instrument, and/or retrieved variables to disciplines such as climate predictions, natural hazards detection and mitigation processes, education, and scientific discoveries. Related to relevancy is the ability to expose data with similar attributes. This in turn depends upon our ability to extract latent, implicit document features from scientific data and resources and make them explicit, accessible and useable for search activities amongst others. This paper presents MemexGATE, a server-side application, command line interface and computing environment for running large scale metadata extraction, general architecture text engineering, document classification and indexing tasks over document resources such as social media streams, scientific literature archives, legal documentation, etc. This work builds on existing experiences using MemexGATE (funded, developed and validated through the DARPA Memex Program, PI Mattmann) for extracting and leveraging latent content features from document resources within the Materials Research domain. We extend the software functionality to the domain of scientific literature, with emphasis on the expansion of gazetteer lists, named entity rules, and natural language construct labeling (e.g. synonym, antonym, hyponym, etc.) efforts to enable extraction of latent content features from data hosted by a wide variety of scientific literature vendors (AGU Meeting Abstract Database, Springer, Wiley Online, Elsevier, etc.) hosting earth science literature.
Such literature makes both implicit and explicit references to NASA datasets and relationships between such concepts stored across EOSDIS DAACs; hence we envisage that a significant part of this effort will also include development and understanding of relevancy signals which can ultimately be utilized for improved search and relevancy ranking across scientific literature.
Lab-on-a-chip platform for high throughput drug discovery with DNA-encoded chemical libraries
NASA Astrophysics Data System (ADS)
Grünzner, S.; Reddavide, F. V.; Steinfelder, C.; Cui, M.; Busek, M.; Klotzbach, U.; Zhang, Y.; Sonntag, F.
2017-02-01
The fast development of DNA-encoded chemical libraries (DECL) in the past 10 years has received great attention from pharmaceutical industries. It applies the selection approach for small molecular drug discovery. Because of the limited choices of DNA-compatible chemical reactions, most DNA-encoded chemical libraries have a narrow structural diversity and low synthetic yield. There is also a poor correlation between the ranking of compounds resulting from analyzing the sequencing data and the affinity measured through biochemical assays. By combining DECL with dynamical chemical library, the resulting DNA-encoded dynamic library (EDCCL) explores the thermodynamic equilibrium of reversible reactions as well as the advantages of DNA encoded compounds for manipulation/detection, thus leading to an enhanced signal-to-noise ratio of the selection process and higher library quality. However, the library dynamics are caused by the weak interactions between the DNA strands, which also result in relatively low affinity of the bidentate interaction, as compared to a stable DNA duplex. To take advantage of both stably assembled dual-pharmacophore libraries and EDCCLs, we extended the concept of EDCCLs to heat-induced EDCCLs (hi-EDCCLs), in which the heat-induced recombination process of stable DNA duplexes and affinity capture are carried out separately. To replace the extremely laborious and repetitive manual process, a fully automated device will facilitate the use of DECL in drug discovery. Herein we describe a novel lab-on-a-chip platform for high throughput drug discovery with hi-EDCCL. A microfluidic system with integrated actuation was designed which is able to provide a continuous sample circulation by reducing the volume to a minimum. It consists of a cooled and a heated chamber for constant circulation.
The system is capable of generating stable temperatures above 75 °C in the heated chamber to melt the double strands of the DNA and below 15 °C in the cooled chamber to reanneal the reshuffled library. In the binding chamber (the cooled chamber) specific retaining structures are integrated. These hold back beads functionalized with the target protein, while the chamber is continuously flushed with library molecules. Afterwards the whole system can be flushed with buffer to wash out nonspecifically bound molecules. Finally the protein-loaded beads with attached molecules can be eluted for further investigation.
García, J B; Tormo, José R
2003-06-01
A new tool, HPLC Studio, was developed for the comparison of high-performance liquid chromatography (HPLC) chromatograms from microbial extracts. The new utility makes it possible to create a virtual chromatogram by mixing up to 20 individual chromatograms. The virtual chromatogram is the first step in establishing a ranking of the microbial fermentation conditions based on either the area or diversity of HPLC peaks. The utility was used to maximize the diversity of secondary metabolites tested from a microorganism and therefore increase the chances of finding new lead compounds in a drug discovery program.
kpLogo: positional k-mer analysis reveals hidden specificity in biological sequences
2017-01-01
Motifs of only 1–4 letters can play important roles when present at key locations within macromolecules. Because existing motif-discovery tools typically miss these position-specific short motifs, we developed kpLogo, a probability-based logo tool for integrated detection and visualization of position-specific ultra-short motifs from a set of aligned sequences. kpLogo also overcomes the limitations of conventional motif-visualization tools in handling positional interdependencies and utilizing ranked or weighted sequences increasingly available from high-throughput assays. kpLogo can be found at http://kplogo.wi.mit.edu/. PMID:28460012
Morgan, A J; Parker, S
2007-03-01
Edward Jenner's discovery of vaccination must rank as one of the most important medical advances of all time and is a prominent example of the power of rational enquiry being brought to bear during the Age of Enlightenment in 18th century Europe. In the modern era many millions of lives are saved each year by vaccines that work essentially on the same principles that were established by Edward Jenner more than 200 years ago. His country home in Berkeley, Gloucestershire, is where he carried out his work and where he spent most of his life. The building is now a museum in which the life and times of Jenner are commemorated including not only the discovery of smallpox vaccination but also his other important scientific contributions to natural history and medicine. The trustees of the Edward Jenner museum are committed to promoting the museum as a real and "virtual" educational centre that is both entertaining and informative.
Apprenticeships, Collaboration and Scientific Discovery in Academic Field Studies
NASA Astrophysics Data System (ADS)
Madden, Derek Scott; Grayson, Diane J.; Madden, Erinn H.; Milewski, Antoni V.; Snyder, Cathy Ann
2012-11-01
Teachers may use apprenticeships and collaboration as instructional strategies that help students to make authentic scientific discoveries as they work as amateur researchers in academic field studies. This concept was examined with 643 students, ages 14-72, who became proficient at field research through cognitive apprenticeships with the Smithsonian Institute, School for Field Studies and Earthwatch. Control student teams worked from single research goals and sets of methods, while experimental teams varied goals, methods, and collaborative activities in Kenya, Costa Rica, Panama, and Ecuador. Results from studies indicate that students who conducted local pilot studies, collaborative symposia, and ongoing, long-term fieldwork generated significantly more data than did control groups. Research reports of the experimental groups were ranked highest by experts, and contributed the most data to international science journals. Data and anecdotal information in this report indicate that structured collaboration in local long-term studies using apprenticeships may increase the potential for students' academic field studies contribution of new information to science.
Morgan, A J; Parker, S
2007-01-01
Edward Jenner's discovery of vaccination must rank as one of the most important medical advances of all time and is a prominent example of the power of rational enquiry being brought to bear during the Age of Enlightenment in 18th century Europe. In the modern era many millions of lives are saved each year by vaccines that work essentially on the same principles that were established by Edward Jenner more than 200 years ago. His country home in Berkeley, Gloucestershire, is where he carried out his work and where he spent most of his life. The building is now a museum in which the life and times of Jenner are commemorated including not only the discovery of smallpox vaccination but also his other important scientific contributions to natural history and medicine. The trustees of the Edward Jenner museum are committed to promoting the museum as a real and “virtual” educational centre that is both entertaining and informative. PMID:17302886
An Electromagnetically-Controlled Precision Orbital Tracking Vehicle (POTV)
1992-12-01
assume that C > B > A. Then 0 1(t) is purely sinusoidal. tk2 (t) is also sinusoidal because the forcing function z(t) is sinusoidal. 03 (t) is more...an unpredictable manner. The problem arises from the rank deficiency of the G input matrix as shown below. Remember we have shown already that its...rank can never exceed five because rows two, four, and six are linearly dependent. The rank deficiency arises from the "translational part" of the input
Dudley, Joel T.; Chen, Rong; Sanderford, Maxwell; Butte, Atul J.; Kumar, Sudhir
2012-01-01
Genome-wide disease association studies contrast genetic variation between disease cohorts and healthy populations to discover single nucleotide polymorphisms (SNPs) and other genetic markers revealing underlying genetic architectures of human diseases. Despite scores of efforts over the past decade, many reproducible genetic variants that explain substantial proportions of the heritable risk of common human diseases remain undiscovered. We have conducted a multispecies genomic analysis of 5,831 putative human risk variants for more than 230 disease phenotypes reported in 2,021 studies. We find that the current approaches show a propensity for discovering disease-associated SNPs (dSNPs) at conserved genomic positions because the effect size (odds ratio) and allelic P value of genetic association of an SNP relates strongly to the evolutionary conservation of their genomic position. We propose a new measure for ranking SNPs that integrates evolutionary conservation scores and the P value (E-rank). Using published data from a large case-control study, we demonstrate that E-rank method prioritizes SNPs with a greater likelihood of bona fide and reproducible genetic disease associations, many of which may explain greater proportions of genetic variance. Therefore, long-term evolutionary histories of genomic positions offer key practical utility in reassessing data from existing disease association studies, and in the design and analysis of future studies aimed at revealing the genetic basis of common human diseases. PMID:22389448
Geuijen, Cecilia A W; Clijsters-van der Horst, Marieke; Cox, Freek; Rood, Pauline M L; Throsby, Mark; Jongeneelen, Mandy A C; Backus, Harold H J; van Deventer, Els; Kruisbeek, Ada M; Goudsmit, Jaap; de Kruif, John
2005-07-01
Application of antibody phage display to the identification of cell surface antigens with restricted expression patterns is often complicated by the inability to demonstrate specific binding to a certain cell type. The specificity of an antibody can only be properly assessed when the antibody is of sufficiently high affinity to detect low-density antigens on cell surfaces. Therefore, a robust and simple assay for the prediction of relative antibody affinities was developed and compared to data obtained using surface plasmon resonance (SPR) technology. A panel of eight anti-CD46 antibody fragments with different affinities was selected from phage display libraries and reformatted into complete human IgG1 molecules. SPR was used to determine K(D) values for these antibodies. The association and dissociation of the antibodies for binding to CD46 expressed on cell surfaces were analysed using FACS-based assays. We show that ranking of the antibodies based on FACS data correlates well with ranking based on K(D) values as measured by SPR and can therefore be used to discriminate between high- and low-affinity antibodies. Finally, we show that a low-affinity antibody may only detect high expression levels of a surface marker while failing to detect lower expression levels of this molecule, which may lead to a false interpretation of antibody specificity.
NASA Astrophysics Data System (ADS)
Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.
2016-12-01
Big oceanographic data have been produced, archived and made available online, but finding the right data for scientific research and application development is still a significant challenge. A long-standing problem in data discovery is how to find the interrelationships between keywords and data, as well as the intrarelationships of the two individually. Most previous research attempted to solve this problem by building domain-specific ontology either manually or through automatic machine learning techniques. The former is costly, labor intensive and hard to keep up-to-date, while the latter is prone to noise and may be difficult for humans to understand. Large-scale user behavior data modelling represents a largely untapped, unique, and valuable source for discovering semantic relationships among domain-specific vocabulary. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, user behaviors, and existing ontology. The objective is to improve discovery accuracy of oceanographic data and reduce the time scientists need to discover, download and reformat data for their projects. Experiments and a search example show that the proposed search engine helps both scientists and general users search with better ranking results, recommendation, and ontology navigation.
Imbalanced target prediction with pattern discovery on clinical data repositories.
Chan, Tak-Ming; Li, Yuxi; Chiau, Choo-Chiap; Zhu, Jane; Jiang, Jie; Huo, Yong
2017-04-20
Clinical data repositories (CDR) have great potential to improve outcome prediction and risk modeling. However, most clinical studies require careful study design, dedicated data collection efforts, and sophisticated modeling techniques before a hypothesis can be tested. We aim to bridge this gap, so that clinical domain users can perform first-hand prediction on existing repository data without complicated handling, and obtain insightful patterns of imbalanced targets for a formal study before it is conducted. We specifically target interpretability for domain users, so that the model can be conveniently explained and applied in clinical practice. We propose an interpretable pattern model which is noise (missing) tolerant for practice data. To address the challenge of imbalanced targets of interest in clinical research, e.g., deaths less than a few percent, the geometric mean of sensitivity and specificity (G-mean) optimization criterion is employed, with which a simple but effective heuristic algorithm is developed. We compared pattern discovery to clinically interpretable methods on two retrospective clinical datasets. They contain 14.9% deaths in 1 year in the thoracic dataset and 9.1% deaths in the cardiac dataset, respectively. In spite of the imbalance challenge shown on other methods, pattern discovery consistently shows competitive cross-validated prediction performance. Compared to logistic regression, Naïve Bayes, and decision tree, pattern discovery achieves statistically significant (p-values < 0.01, Wilcoxon signed rank test) favorable averaged testing G-means and F1-scores (harmonic mean of precision and sensitivity). Without requiring sophisticated technical processing of data and tweaking, the prediction performance of pattern discovery is consistently comparable to the best achievable performance. Pattern discovery has been demonstrated to be robust and valuable for target prediction on existing clinical data repositories with imbalance and noise. 
The prediction results and interpretable patterns can provide insights in an agile and inexpensive way for the potential formal studies.
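The G-mean criterion mentioned in the abstract is the geometric mean of sensitivity and specificity. The following Python sketch shows why it suits imbalanced targets; the function name `g_mean` and the toy labels are illustrative assumptions, not data or code from the study.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels (1 = event)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy imbalanced sample: 2 events among 10 cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10                      # 80% accurate, but misses every event
some_detection = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # finds one event, two false alarms

score_trivial = g_mean(y_true, always_negative)
score_useful = g_mean(y_true, some_detection)
```

A predictor that never flags the rare event scores zero under G-mean despite its high accuracy, whereas one that trades a few false alarms for some detections scores higher.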
Spaceborne power systems preference analyses. Volume 1: Summary
NASA Technical Reports Server (NTRS)
Smith, J. H.; Feinberg, A.; Miles, R. F., Jr.
1985-01-01
Sixteen alternative spaceborne nuclear power system concepts were ranked using multiattribute decision analysis to identify promising concepts for further technology development. The four groups interviewed were: safety, systems definition and design, technology assessment, and mission analysis. The ranking results were consistent from group to group and for different utility function models for individuals.
Ligand-based receptor tyrosine kinase partial agonists: New paradigm for cancer drug discovery?
Riese, David J
2011-02-01
INTRODUCTION: Receptor tyrosine kinases (RTKs) are validated targets for oncology drug discovery and several RTK antagonists have been approved for the treatment of human malignancies. Nonetheless, the discovery and development of RTK antagonists has lagged behind the discovery and development of agents that target G-protein coupled receptors. In part, this is because it has been difficult to discover analogs of naturally-occurring RTK agonists that function as antagonists. AREAS COVERED: Here we describe ligands of ErbB receptors that function as partial agonists for these receptors, thereby enabling these ligands to antagonize the activity of full agonists for these receptors. We provide insights into the mechanisms by which these ligands function as antagonists. We discuss how information concerning these mechanisms can be translated into screens for novel small molecule- and antibody-based antagonists of ErbB receptors and how such antagonists hold great potential as targeted cancer chemotherapeutics. EXPERT OPINION: While there have been a number of important key findings into this field, the identification of the structural basis of ligand functional specificity is still of the greatest importance. While it is true that, with some notable exceptions, peptide hormones and growth factors have not proven to be good platforms for oncology drug discovery; addressing the fundamental issues of antagonistic partial agonists for receptor tyrosine kinases has the potential to steer oncology drug discovery in new directions. Mechanism based approaches are now emerging to enable the discovery of RTK partial agonists that may antagonize both agonist-dependent and -independent RTK signaling and may hold tremendous promise as targeted cancer chemotherapeutics.
ERIC Educational Resources Information Center
Brown, Herbert C.
1974-01-01
The role of discovery in the advance of the science of chemistry and the factors that are currently operating to handicap that function are considered. Examples are drawn from the author's work with boranes. The thesis that exploratory research and discovery should be encouraged is stressed. (DT)
How to infer relative fitness from a sample of genomic sequences.
Dayarian, Adel; Shraiman, Boris I
2014-07-01
Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
2011-01-01
Background Since the classic Hopkins and Groom druggable genome review in 2002, there have been a number of publications updating both the hypothetical and successful human drug target statistics. However, listings of research targets that define the area between these two extremes are sparse because of the challenges of collating published information at the necessary scale. We have addressed this by interrogating databases, populated by expert curation, of bioactivity data extracted from patents and journal papers over the last 30 years. Results From a subset of just over 27,000 documents we have extracted a set of compound-to-target relationships for biochemical in vitro binding-type assay data for 1,736 human proteins and 1,654 gene identifiers. These are linked to 1,671,951 compound records derived from 823,179 unique chemical structures. The distribution showed a compounds-per-target average of 964 with a maximum of 42,869 (Factor Xa). The list includes non-targets, failed targets and cross-screening targets. The top-278 most actively pursued targets cover 90% of the compounds. We further investigated target ranking by determining the number of molecular frameworks and scaffolds. These were compared to the compound counts as alternative measures of chemical diversity on a per-target basis. Conclusions The compounds-per-protein listing generated in this work (provided as a supplementary file) represents the major proportion of the human drug target landscape defined by published data. We supplemented the simple ranking by the number of compounds assayed with additional rankings by molecular topology. These showed significant differences and provide complementary assessments of chemical tractability. PMID:21569515
Zhang, ZhiZhuo; Chang, Cheng Wei; Hugo, Willy; Cheung, Edwin; Sung, Wing-Kin
2013-03-01
Although de novo motifs can be discovered through mining over-represented sequence patterns, this approach misses some real motifs and generates many false positives. To improve accuracy, one solution is to consider some additional binding features (i.e., position preference and sequence rank preference). This information is usually required from the user. This article presents a de novo motif discovery algorithm called SEME (sampling with expectation maximization for motif elicitation), which uses a pure probabilistic mixture model to model the motif's binding features and uses expectation maximization (EM) algorithms to simultaneously learn the sequence motif, position, and sequence rank preferences without asking for any prior knowledge from the user. SEME is both efficient and accurate thanks to two important techniques: the variable motif length extension and importance sampling. Using 75 large-scale synthetic datasets, 32 metazoan compendium benchmark datasets, and 164 chromatin immunoprecipitation sequencing (ChIP-Seq) libraries, we demonstrated the superior performance of SEME over existing programs in finding transcription factor (TF) binding sites. SEME is further applied to a more difficult problem of finding the co-regulated TF (coTF) motifs in 15 ChIP-Seq libraries. It identified significantly more correct coTF motifs and, at the same time, predicted coTF motifs with better matching to the known motifs. Finally, we show that the learned position and sequence rank preferences of each coTF reveal potential interaction mechanisms between the primary TF and the coTF within these sites. Some of these findings were further validated by the ChIP-Seq experiments of the coTFs. The application is available online.
Sex-ratio biasing towards daughters among lower-ranking co-wives in Rwanda.
Pollet, Thomas V; Fawcett, Tim W; Buunk, Abraham P; Nettle, Daniel
2009-12-23
There is considerable debate as to whether human females bias the sex ratio of their offspring as a function of their own condition. We apply the Trivers-Willard prediction (that mothers in poor condition will overproduce daughters) to a novel measure of condition, namely wife rank within a polygynous marriage. Using a large-scale sample of over 95 000 Rwandan mothers, we show that lower-ranking polygynous wives do indeed have significantly more daughters than higher-ranking polygynous wives and monogamously married women. This effect remains when controlling for potential confounds such as maternal age. We discuss these results in reference to previous work on sex-ratio adjustment in humans.
Miller, J; Fuller, M; Vinod, S; Suchowerska, N; Holloway, L
2009-06-01
A clinician's discrimination between radiation therapy treatment plans is traditionally a subjective process, based on experience and existing protocols. A more objective and quantitative approach to distinguish between treatment plans is to use radiobiological or dosimetric objective functions, based on radiobiological or dosimetric models. The efficacy of models is not well understood, nor is the correlation of the rank of plans resulting from the use of models compared to the traditional subjective approach. One such radiobiological model is the Normal Tissue Complication Probability (NTCP). Dosimetric models or indicators are more accepted in clinical practice. In this study, three radiobiological models (Lyman NTCP, critical volume NTCP and relative seriality NTCP) and three dosimetric models (Mean Lung Dose (MLD) and the lung volumes irradiated at 10 Gy (V10) and 20 Gy (V20)) were used to rank a series of treatment plans using harm to normal (lung) tissue as the objective criterion. None of the models considered in this study showed consistent correlation with the radiation oncologists' plan ranking. If radiobiological or dosimetric models are to be used in objective functions for lung treatments, based on this study it is recommended that the Lyman NTCP model be used because it will provide most consistency with traditional clinician ranking.
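The dosimetric indicators named in the abstract can be computed directly from a dose distribution. The Python sketch below assumes a hypothetical array of lung voxel doses in Gy with equal voxel volumes, and interprets V10 and V20 as the fractions of lung volume receiving at least 10 Gy and 20 Gy; the function name and the toy values are illustrative only.

```python
import numpy as np

def lung_dose_indicators(voxel_doses_gy):
    """Return mean lung dose and V10/V20 fractions for equal-volume voxels."""
    doses = np.asarray(voxel_doses_gy, dtype=float)
    return {
        "MLD": float(doses.mean()),            # mean lung dose, Gy
        "V10": float(np.mean(doses >= 10.0)),  # fraction of volume at >= 10 Gy
        "V20": float(np.mean(doses >= 20.0)),  # fraction of volume at >= 20 Gy
    }

# Two toy plans; a lower indicator value means less dose to normal lung.
plans = {
    "plan_a": [5, 12, 25, 30, 8],
    "plan_b": [4, 9, 22, 18, 7],
}
indicators = {name: lung_dose_indicators(d) for name, d in plans.items()}
ranking_by_mld = sorted(plans, key=lambda name: indicators[name]["MLD"])
```

Ranking by a different indicator (for example V20 instead of MLD) can reorder plans, which is one reason model-based rankings may disagree with each other and with clinicians.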
Wang, Ya-Xuan; Gao, Ying-Lian; Liu, Jin-Xing; Kong, Xiang-Zhen; Li, Hai-Jun
2017-09-01
Identifying differentially expressed genes from the thousands of genes is a challenging task. Robust principal component analysis (RPCA) is an efficient method in the identification of differentially expressed genes. RPCA method uses nuclear norm to approximate the rank function. However, theoretical studies showed that the nuclear norm minimizes all singular values, so it may not be the best solution to approximate the rank function. The truncated nuclear norm is defined as the sum of some smaller singular values, which may achieve a better approximation of the rank function than nuclear norm. In this paper, a novel method is proposed by replacing nuclear norm of RPCA with the truncated nuclear norm, which is named robust principal component analysis regularized by truncated nuclear norm (TRPCA). The method decomposes the observation matrix of genomic data into a low-rank matrix and a sparse matrix. Because the significant genes can be considered as sparse signals, the differentially expressed genes are viewed as the sparse perturbation signals. Thus, the differentially expressed genes can be identified according to the sparse matrix. The experimental results on The Cancer Genome Atlas data illustrate that the TRPCA method outperforms other state-of-the-art methods in the identification of differentially expressed genes.
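The difference between the nuclear norm and the truncated nuclear norm described above can be shown with a short NumPy sketch. The function names and the truncation parameter `r` (the number of largest singular values left out of the sum) are assumptions for illustration; this is not the TRPCA solver itself.

```python
import numpy as np

def nuclear_norm(matrix):
    """Sum of all singular values."""
    return float(np.linalg.svd(matrix, compute_uv=False).sum())

def truncated_nuclear_norm(matrix, r):
    """Sum of the singular values that remain after dropping the r largest."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)  # descending order
    return float(singular_values[r:].sum())

diag = np.diag([5.0, 3.0, 1.0])
rank_one = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])

full = nuclear_norm(diag)                    # 5 + 3 + 1
tail = truncated_nuclear_norm(diag, r=1)     # 3 + 1
rank_one_tail = truncated_nuclear_norm(rank_one, r=1)
```

For an exactly rank-`r` matrix the truncated norm is zero regardless of how large the leading singular values are, which is why it tracks the rank function more closely than the full nuclear norm.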
Machine Learning Estimation of Atom Condensed Fukui Functions.
Zhang, Qingyou; Zheng, Fangfang; Zhao, Tanfeng; Qu, Xiaohui; Aires-de-Sousa, João
2016-02-01
To enable the fast estimation of atom condensed Fukui functions, machine learning algorithms were trained with databases of DFT pre-calculated values for ca. 23,000 atoms in organic molecules. The problem was approached as the ranking of atom types with the Bradley-Terry (BT) model, and as the regression of the Fukui function. Random Forests (RF) were trained to predict the condensed Fukui function, to rank atoms in a molecule, and to classify atoms as high/low Fukui function. Atomic descriptors were based on counts of atom types in spheres around the kernel atom. The BT coefficients assigned to atom types enabled the identification (93-94% accuracy) of the atom with the highest Fukui function in pairs of atoms in the same molecule with differences ≥0.1. In whole molecules, the atom with the top Fukui function could be recognized in ca. 50% of the cases and, on average, about 3 of the top 4 atoms could be recognized in a shortlist of 4. Regression RF yielded predictions for test sets with R² = 0.68-0.69, improving the ability of BT coefficients to rank atoms in a molecule. Atom classification (as high/low Fukui function) was obtained with RF with sensitivity of 55-61% and specificity of 94-95%. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
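In the standard Bradley-Terry model, each item carries a positive coefficient and the probability that item i outranks item j is β_i / (β_i + β_j). The Python sketch below shows how such coefficients yield pairwise predictions and a ranking; the atom-type labels and coefficient values are invented for illustration and do not come from the study.

```python
def bt_win_probability(beta_i, beta_j):
    """Bradley-Terry probability that item i outranks item j."""
    return beta_i / (beta_i + beta_j)

# Hypothetical coefficients for three made-up atom types.
bt_coefficients = {"type_A": 3.0, "type_B": 1.0, "type_C": 0.5}

p_a_over_b = bt_win_probability(bt_coefficients["type_A"], bt_coefficients["type_B"])

# Ranking atoms in a molecule reduces to sorting by coefficient.
atoms_in_molecule = ["type_C", "type_A", "type_B"]
ranked = sorted(atoms_in_molecule, key=bt_coefficients.get, reverse=True)
```

Because one scalar per atom type determines every pairwise comparison, the model gives a consistent ordering but cannot capture context beyond what the atom-type definition encodes.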
Proctor, Enola K.; Hasche, Leslie; Morrow-Howell, Nancy; Shumway, Martha; Snell, Grace
2009-01-01
Objective Depression often co-occurs with other conditions that may pose competing demands to depression care, particularly in later life. This study examined older adults’ perceptions of depression among co-occurring social, medical, and functional problems and compared the priority of depression with that of other problems. Methods The study’s purposeful sample comprised 49 adults age 60 or older with a history of depression and in publicly funded community long-term care. Four-part, mixed-methods interviews sought to capture participants’ perceptions of life problems as well as the priority they placed on depression. Methods included standardized depression screening, semistructured qualitative interviews, listing of problems, and qualitative and quantitative analysis of problem rankings. Results Most participants identified health, functional, and psychosocial problems co-occurring with depressive symptoms. Depression was ranked low among the co-occurring conditions; 6% ranked depression as the most important of their problems, whereas 45% ranked it last. Relative rank scores for problems were remarkably similar, with the notable exception of depression, which was ranked lowest of all problems. Participants did not see depression as a high priority compared with co-occurring problems, particularly psychosocial ones. Conclusions Effective and durable improvements to mental health care must be shaped by an understanding of client perceptions and priorities. Motivational interviewing, health education, and assessment of treatment priorities may be necessary in helping older adults value and accept depression care. Nonspecialty settings of care may effectively link depression treatment to other services, thereby increasing receptivity to mental health services. PMID:18511588
Minimizing the semantic gap in biomedical content-based image retrieval
NASA Astrophysics Data System (ADS)
Guan, Haiying; Antani, Sameer; Long, L. Rodney; Thoma, George R.
2010-03-01
A major challenge in biomedical Content-Based Image Retrieval (CBIR) is to achieve meaningful mappings that minimize the semantic gap between the high-level biomedical semantic concepts and the low-level visual features in images. This paper presents a comprehensive learning-based scheme toward meeting this challenge and improving retrieval quality. The article presents two algorithms: a learning-based feature selection and fusion algorithm and the Ranking Support Vector Machine (Ranking SVM) algorithm. The feature selection algorithm aims to select 'good' features and fuse them using different similarity measurements to provide a better representation of the high-level concepts with the low-level image features. Ranking SVM is applied to learn the retrieval rank function and associate the selected low-level features with query concepts, given the ground-truth ranking of the training samples. The proposed scheme addresses four major issues in CBIR to improve the retrieval accuracy: image feature extraction, selection and fusion, similarity measurements, the association of the low-level features with high-level concepts, and the generation of the rank function to support high-level semantic image retrieval. It models the relationship between semantic concepts and image features, and enables retrieval at the semantic level. We apply it to the problem of vertebra shape retrieval from a digitized spine x-ray image set collected by the second National Health and Nutrition Examination Survey (NHANES II). The experimental results show an improvement of up to 41.92% in the mean average precision (MAP) over conventional image similarity computation methods.
2014-01-01
Background Measures of similarity for chemical molecules have been developed since the dawn of chemoinformatics. Molecular similarity has been measured by a variety of methods including molecular descriptor based similarity, common molecular fragments, graph matching and 3D methods such as shape matching. Similarity measures are widespread in practice and have proven to be useful in drug discovery. Because of our interest in electrostatics and high throughput ligand-based virtual screening, we sought to exploit the information contained in atomic coordinates and partial charges of a molecule. Results A new molecular descriptor based on partial charges is proposed. It uses the autocorrelation function and linear binning to encode all atoms of a molecule into two rotation-translation invariant vectors. Combined with a scoring function, the descriptor allows a database of compounds to be rank-ordered against a query molecule. The proposed implementation is called ACPC (AutoCorrelation of Partial Charges) and is released as open source. Extensive retrospective ligand-based virtual screening experiments were performed, and the results were compared with those of other methods, in order to validate the method and associated protocol. Conclusions While it is a simple method, it performed remarkably well in experiments. At an average speed of 1649 molecules per second, it reached an average median area under the curve of 0.81 on 40 different targets, thereby validating the proposed protocol and implementation. PMID:24887178
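A rough sketch of a charge autocorrelation with linear binning follows. It builds a single vector from products of partial charges over all atom pairs, splitting each contribution between the two nearest distance bins. The abstract does not say how the two vectors of ACPC are formed, so that split is not reproduced here. The function name, bin width, and bin count are illustrative assumptions, not values from the ACPC implementation.

```python
import math


def charge_autocorrelation(coords, charges, bin_width=1.0, n_bins=5):
    """Autocorrelation of partial charges over interatomic distances.

    Each atom pair contributes q_i * q_j, shared linearly between the two
    bins whose centres (k * bin_width) bracket the pair distance.
    Only distances are used, so the result does not change under
    rotation or translation of the molecule.
    """
    hist = [0.0] * n_bins
    n_atoms = len(coords)
    for i in range(n_atoms):
        for j in range(i + 1, n_atoms):
            distance = math.dist(coords[i], coords[j])
            position = distance / bin_width
            lower = int(position)
            fraction = position - lower  # share given to the upper bin
            value = charges[i] * charges[j]
            if lower < n_bins:
                hist[lower] += value * (1.0 - fraction)
            if lower + 1 < n_bins:
                hist[lower + 1] += value * fraction
    return hist


# Hypothetical three-atom example (coordinates in arbitrary length units).
example_coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.0, 0.0)]
example_charges = [0.4, -0.3, -0.1]
example_vector = charge_autocorrelation(example_coords, example_charges)
```

Two molecules could then be compared by any vector similarity applied to their descriptors; the scoring function used by ACPC itself is not specified in the abstract.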
Pharmacophore-Based Similarity Scoring for DOCK
2015-01-01
Pharmacophore modeling incorporates geometric and chemical features of known inhibitors and/or targeted binding sites to rationally identify and design new drug leads. In this study, we have encoded a three-dimensional pharmacophore matching similarity (FMS) scoring function into the structure-based design program DOCK. Validation and characterization of the method are presented through pose reproduction, crossdocking, and enrichment studies. When used alone, FMS scoring dramatically improves pose reproduction success to 93.5% (∼20% increase) and reduces sampling failures to 3.7% (∼6% drop) compared to the standard energy score (SGE) across 1043 protein–ligand complexes. The combined FMS+SGE function further improves success to 98.3%. Crossdocking experiments using FMS and FMS+SGE scoring, for six diverse protein families, similarly showed improvements in success, provided proper pharmacophore references are employed. For enrichment, incorporating pharmacophores during sampling and scoring, in most cases, also yield improved outcomes when docking and rank-ordering libraries of known actives and decoys to 15 systems. Retrospective analyses of virtual screenings to three clinical drug targets (EGFR, IGF-1R, and HIVgp41) using X-ray structures of known inhibitors as pharmacophore references are also reported, including a customized FMS scoring protocol to bias on selected regions in the reference. Overall, the results and fundamental insights gained from this study should benefit the docking community in general, particularly researchers using the new FMS method to guide computational drug discovery with DOCK. PMID:25229837
Robust Visual Tracking via Online Discriminative and Low-Rank Dictionary Learning.
Zhou, Tao; Liu, Fanghui; Bhaskar, Harish; Yang, Jie
2017-09-12
In this paper, we propose a novel and robust tracking framework based on online discriminative and low-rank dictionary learning. The primary aim of this paper is to obtain compact and low-rank dictionaries that can provide good discriminative representations of both target and background. We accomplish this by exploiting the recovery ability of low-rank matrices. That is if we assume that the data from the same class are linearly correlated, then the corresponding basis vectors learned from the training set of each class shall render the dictionary to become approximately low-rank. The proposed dictionary learning technique incorporates a reconstruction error that improves the reliability of classification. Also, a multiconstraint objective function is designed to enable active learning of a discriminative and robust dictionary. Further, an optimal solution is obtained by iteratively computing the dictionary, coefficients, and by simultaneously learning the classifier parameters. Finally, a simple yet effective likelihood function is implemented to estimate the optimal state of the target during tracking. Moreover, to make the dictionary adaptive to the variations of the target and background during tracking, an online update criterion is employed while learning the new dictionary. Experimental results on a publicly available benchmark dataset have demonstrated that the proposed tracking algorithm performs better than other state-of-the-art trackers.
Efficiently Selecting the Best Web Services
NASA Astrophysics Data System (ADS)
Goncalves, Marlene; Vidal, Maria-Esther; Regalado, Alfredo; Yacoubi Ayadi, Nadia
Emerging technologies and linking data initiatives have motivated the publication of a large number of datasets, and provide the basis for publishing Web services and tools to manage the available data. This wealth of resources opens a world of possibilities to satisfy user requests. However, Web services may have similar functionality and assess different performance; therefore, it is required to identify among the Web services that satisfy a user request, the ones with the best quality. In this paper we propose a hybrid approach that combines reasoning tasks with ranking techniques to aim at the selection of the Web services that best implement a user request. Web service functionalities are described in terms of input and output attributes annotated with existing ontologies, non-functionality is represented as Quality of Services (QoS) parameters, and user requests correspond to conjunctive queries whose sub-goals impose restrictions on the functionality and quality of the services to be selected. The ontology annotations are used in different reasoning tasks to infer service implicit properties and to augment the size of the service search space. Furthermore, QoS parameters are considered by a ranking metric to classify the services according to how well they meet a user non-functional condition. We assume that all the QoS parameters of the non-functional condition are equally important, and apply the Top-k Skyline approach to select the k services that best meet this condition. Our proposal relies on a two-fold solution which fires a deductive-based engine that performs different reasoning tasks to discover the services that satisfy the requested functionality, and an efficient implementation of the Top-k Skyline approach to compute the top-k services that meet the majority of the QoS constraints. 
Our Top-k Skyline solution exploits the properties of the Skyline Frequency metric and identifies the top-k services by just analyzing a subset of the services that meet the non-functional condition. We report on the effects of the proposed reasoning tasks, the quality of the top-k services selected by the ranking metric, and the performance of the proposed ranking techniques. Our results suggest that the number of services can be augmented by up to two orders of magnitude. In addition, our ranking techniques are able to identify services that have the best values in at least half of the QoS parameters, while the performance is improved.
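To make the ranking metric concrete, the sketch below computes a skyline frequency by brute force: for every non-empty subset of QoS parameters it counts the services that no other service dominates, then keeps the k services with the highest counts. It assumes that higher values are better for every parameter, and it is exhaustive rather than the efficient subset-based algorithm the abstract describes. The service names and QoS values are hypothetical.

```python
from itertools import combinations


def dominates(a, b, dims):
    """True if a is at least as good as b on all dims and better on one."""
    return all(a[d] >= b[d] for d in dims) and any(a[d] > b[d] for d in dims)


def skyline_frequency(services):
    """Count, per service, the parameter subsets in which it is undominated."""
    n_dims = len(next(iter(services.values())))
    frequency = {name: 0 for name in services}
    for size in range(1, n_dims + 1):
        for dims in combinations(range(n_dims), size):
            for name, qos in services.items():
                dominated = any(
                    dominates(other_qos, qos, dims)
                    for other, other_qos in services.items()
                    if other != name
                )
                if not dominated:
                    frequency[name] += 1
    return frequency


def top_k_services(services, k):
    """Return the k services with the highest skyline frequency."""
    frequency = skyline_frequency(services)
    return sorted(frequency, key=lambda name: (-frequency[name], name))[:k]


# Hypothetical services with three QoS scores each (higher is better).
example_services = {"a": (9, 9, 1), "b": (5, 5, 5), "c": (1, 2, 9), "d": (4, 4, 4)}
example_top = top_k_services(example_services, 2)
```

The brute-force version enumerates 2^d - 1 subsets for d parameters, which is why an approach that inspects only part of the candidate set matters in practice.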
NASA Astrophysics Data System (ADS)
Fenner, Trevor; Kaufmann, Eric; Levene, Mark; Loizou, George
Human dynamics and sociophysics suggest statistical models that may explain and provide us with better insight into social phenomena. Contextual and selection effects tend to produce extreme values in the tails of rank-ordered distributions of both census data and district-level election outcomes. Models that account for this nonlinearity generally outperform linear models. Fitting nonlinear functions based on rank-ordering census and election data therefore improves the fit of aggregate voting models. This may help improve ecological inference, as well as election forecasting in majoritarian systems. We propose a generative multiplicative decrease model that gives rise to a rank-order distribution and facilitates the analysis of the recent UK EU referendum results. We supply empirical evidence that the beta-like survival function, which can be generated directly from our model, is a close fit to the referendum results, and also may have predictive value when covariate data are available.
Postprocessing of docked protein-ligand complexes using implicit solvation models.
Lindström, Anton; Edvinsson, Lotta; Johansson, Andreas; Andersson, C David; Andersson, Ida E; Raubacher, Florian; Linusson, Anna
2011-02-28
Molecular docking plays an important role in drug discovery as a tool for the structure-based design of small organic ligands for macromolecules. Possible applications of docking are identification of the bioactive conformation of a protein-ligand complex and the ranking of different ligands with respect to their strength of binding to a particular target. We have investigated the effect of implicit water on the postprocessing of binding poses generated by molecular docking using MM-PB/GB-SA (molecular mechanics Poisson-Boltzmann and generalized Born surface area) methodology. The investigation was divided into three parts: geometry optimization, pose selection, and estimation of the relative binding energies of docked protein-ligand complexes. Appropriate geometry optimization afforded more accurate binding poses for 20% of the complexes investigated. The time required for this step was greatly reduced by minimizing the energy of the binding site using GB solvation models rather than minimizing the entire complex using the PB model. By optimizing the geometries of docking poses using the GB(HCT+SA) model then calculating their free energies of binding using the PB implicit solvent model, binding poses similar to those observed in crystal structures were obtained. Rescoring of these poses according to their calculated binding energies resulted in improved correlations with experimental binding data. These correlations could be further improved by applying the postprocessing to several of the most highly ranked poses rather than focusing exclusively on the top-scored pose. The postprocessing protocol was successfully applied to the analysis of a set of Factor Xa inhibitors and a set of glycopeptide ligands for the class II major histocompatibility complex (MHC) A(q) protein. These results indicate that the protocol for the postprocessing of docked protein-ligand complexes developed in this paper may be generally useful for structure-based design in drug discovery.
SU-F-R-24: Identifying Prognostic Imaging Biomarkers in Early Stage Lung Cancer Using Radiomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zeng, X; Wu, J; Cui, Y
2016-06-15
Purpose: Patients diagnosed with early stage lung cancer have favorable outcomes when treated with surgery or stereotactic radiotherapy. However, a significant proportion (∼20%) of patients will develop metastatic disease and eventually die of the disease. The purpose of this work is to identify quantitative imaging biomarkers from CT for predicting overall survival in early stage lung cancer. Methods: In this institutional review board-approved HIPAA-compliant retrospective study, we analyzed the diagnostic CT scans of 110 patients with early stage lung cancer. Data from 70 patients were used for training/discovery purposes, while those of the remaining 40 patients were used for independent validation. We extracted 191 radiomic features, including statistical, histogram, morphological, and texture features. A Cox proportional hazards regression model, coupled with the least absolute shrinkage and selection operator (LASSO), was used to predict overall survival based on the radiomic features. Results: The optimal prognostic model included three image features from the Laws' feature and wavelet texture. In the discovery cohort, this model achieved a concordance index or CI=0.67, and it separated the low-risk from high-risk groups in predicting overall survival (hazard ratio=2.72, log-rank p=0.007). In the independent validation cohort, this radiomic signature achieved a CI=0.62, and significantly stratified the low-risk and high-risk groups in terms of overall survival (hazard ratio=2.20, log-rank p=0.042). Conclusion: We identified CT imaging characteristics associated with overall survival in early stage lung cancer. If prospectively validated, this could potentially help identify high-risk patients who might benefit from adjuvant systemic therapy.
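The concordance index reported above can be illustrated with a small sketch of Harrell's C for right-censored data. It assumes that a higher risk score should correspond to a shorter survival time, and the follow-up times, event flags, and scores are invented for illustration. Whether the study used exactly this estimator is not stated in the abstract.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had an observed event
    strictly before subject j's follow-up time. The pair is concordant
    when the subject who failed earlier has the higher risk score;
    tied scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n_subjects = len(times)
    for i in range(n_subjects):
        for j in range(n_subjects):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event flags (1 = death, 0 = censored), scores.
example_times = [2, 4, 6, 8]
example_events = [1, 1, 0, 1]
example_risks = [0.9, 0.7, 0.8, 0.1]
example_c = concordance_index(example_times, example_events, example_risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.67 and 0.62 in context.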
Genomes2Drugs: Identifies Target Proteins and Lead Drugs from Proteome Data
Toomey, David; Hoppe, Heinrich C.; Brennan, Marian P.; Nolan, Kevin B.; Chubb, Anthony J.
2009-01-01
Background Genome sequencing and bioinformatics have provided the full hypothetical proteome of many pathogenic organisms. Advances in microarray and mass spectrometry have also yielded large output datasets of possible target proteins/genes. However, the challenge remains to identify new targets for drug discovery from this wealth of information. Further analysis includes bioinformatics and/or molecular biology tools to validate the findings. This is time consuming and expensive, and could fail to yield novel drugs if protein purification and crystallography is impossible. To pre-empt this, a researcher may want to rapidly filter the output datasets for proteins that show good homology to proteins that have already been structurally characterised or proteins that are already targets for known drugs. Critically, those researchers developing novel antibiotics need to select out the proteins that show close homology to any human proteins, as future inhibitors are likely to cross-react with the host protein, causing off-target toxicity effects later in clinical trials. Methodology/Principal Findings To solve many of these issues, we have developed a free online resource called Genomes2Drugs which ranks sequences to identify proteins that are (i) homologous to previously crystallized proteins or (ii) targets of known drugs, but are (iii) not homologous to human proteins. When tested using the Plasmodium falciparum malarial genome the program correctly enriched the ranked list of proteins with known drug target proteins. Conclusions/Significance Genomes2Drugs rapidly identifies proteins that are likely to succeed in drug discovery pipelines. This free online resource helps in the identification of potential drug targets. Importantly, the program further highlights proteins that are likely to be inhibited by FDA-approved drugs. These drugs can then be rapidly moved into Phase IV clinical studies under ‘change-of-application’ patents. PMID:19593435
MicroRNA expression in benign breast tissue and risk of subsequent invasive breast cancer.
Rohan, Thomas; Ye, Kenny; Wang, Yihong; Glass, Andrew G; Ginsberg, Mindy; Loudig, Olivier
2018-01-01
MicroRNAs are endogenous, small non-coding RNAs that control gene expression by directing their target mRNAs for degradation and/or posttranscriptional repression. Abnormal expression of microRNAs is thought to contribute to the development and progression of cancer. A history of benign breast disease (BBD) is associated with increased risk of subsequent breast cancer. However, no large-scale study has examined the association between microRNA expression in BBD tissue and risk of subsequent invasive breast cancer (IBC). We conducted discovery and validation case-control studies nested in a cohort of 15,395 women diagnosed with BBD in a large health plan between 1971 and 2006 and followed to mid-2015. Cases were women with BBD who developed subsequent IBC; controls were matched 1:1 to cases on age, age at diagnosis of BBD, and duration of plan membership. The discovery stage (316 case-control pairs) entailed use of the Illumina MicroRNA Expression Profiling Assay (in duplicate) to identify breast cancer-associated microRNAs. MicroRNAs identified at this stage were ranked by the strength of the correlation between Illumina array and quantitative PCR results for 15 case-control pairs. The top ranked 14 microRNAs entered the validation stage (165 case-control pairs) which was conducted using quantitative PCR (in triplicate). In both stages, linear regression was used to evaluate the association between the mean expression level of each microRNA (response variable) and case-control status (independent variable); paired t-tests were also used in the validation stage. None of the 14 validation stage microRNAs was associated with breast cancer risk. The results of this study suggest that microRNA expression in benign breast tissue does not influence the risk of subsequent IBC.
MicroRNA expression in benign breast tissue and risk of subsequent invasive breast cancer
Ye, Kenny; Wang, Yihong; Ginsberg, Mindy; Loudig, Olivier
2018-01-01
MicroRNAs are endogenous, small non-coding RNAs that control gene expression by directing their target mRNAs for degradation and/or posttranscriptional repression. Abnormal expression of microRNAs is thought to contribute to the development and progression of cancer. A history of benign breast disease (BBD) is associated with increased risk of subsequent breast cancer. However, no large-scale study has examined the association between microRNA expression in BBD tissue and risk of subsequent invasive breast cancer (IBC). We conducted discovery and validation case-control studies nested in a cohort of 15,395 women diagnosed with BBD in a large health plan between 1971 and 2006 and followed to mid-2015. Cases were women with BBD who developed subsequent IBC; controls were matched 1:1 to cases on age, age at diagnosis of BBD, and duration of plan membership. The discovery stage (316 case-control pairs) entailed use of the Illumina MicroRNA Expression Profiling Assay (in duplicate) to identify breast cancer-associated microRNAs. MicroRNAs identified at this stage were ranked by the strength of the correlation between Illumina array and quantitative PCR results for 15 case-control pairs. The top ranked 14 microRNAs entered the validation stage (165 case-control pairs) which was conducted using quantitative PCR (in triplicate). In both stages, linear regression was used to evaluate the association between the mean expression level of each microRNA (response variable) and case-control status (independent variable); paired t-tests were also used in the validation stage. None of the 14 validation stage microRNAs was associated with breast cancer risk. The results of this study suggest that microRNA expression in benign breast tissue does not influence the risk of subsequent IBC. PMID:29432432
Web service discovery among large service pools utilising semantic similarity and clustering
NASA Astrophysics Data System (ADS)
Chen, Fuzan; Li, Minqiang; Wu, Harris; Xie, Lingli
2017-03-01
With the rapid development of electronic business, Web services have attracted much attention in recent years. Enterprises can combine individual Web services to provide new value-added services. An emerging challenge is the timely discovery of close matches to service requests among large service pools. In this study, we first define a new semantic similarity measure combining functional similarity and process similarity. We then present a service discovery mechanism that utilises the new semantic similarity measure for service matching. All the published Web services are pre-grouped into functional clusters prior to the matching process. For a user's service request, the discovery mechanism first identifies matching services clusters and then identifies the best matching Web services within these matching clusters. Experimental results show that the proposed semantic discovery mechanism performs better than a conventional lexical similarity-based mechanism.
Dubuc, Constance; Coyne, Sean P.; Maestripieri, Dario
2013-01-01
The adaptive function of male masturbation is still poorly understood, despite its high prevalence in humans and other animals. In non-human primates, male masturbation is most frequent among anthropoid monkeys and apes living in multimale-multifemale groups with a promiscuous mating system. In these species, male masturbation may be a non-functional by-product of high sexual arousal or be adaptive by providing advantages in terms of sperm competition or by decreasing the risk of sexually transmitted infections. We investigated the possible functional significance of male masturbation using behavioral data collected on 21 free-ranging male rhesus macaques (Macaca mulatta) at the peak of the mating season. We found some evidence that masturbation is linked to low mating opportunities: regardless of rank, males were most likely to be observed masturbating on days in which they were not observed mating, and lower-ranking males mated less and tended to masturbate more frequently than higher-ranking males. These results echo the findings obtained for two other species of macaques, but contrast those obtained in red colobus monkeys (Procolobus badius) and Cape ground squirrels (Xerus inauris). Interestingly, however, male masturbation events ended with ejaculation in only 15% of the observed masturbation time, suggesting that new hypotheses are needed to explain masturbation in this species. More studies are needed to establish whether male masturbation is adaptive and whether it serves similar or different functions in different sexually promiscuous species. PMID:24187414
Dubuc, Constance; Coyne, Sean P; Maestripieri, Dario
2013-11-01
The adaptive function of male masturbation is still poorly understood, despite its high prevalence in humans and other animals. In non-human primates, male masturbation is most frequent among anthropoid monkeys and apes living in multimale-multifemale groups with a promiscuous mating system. In these species, male masturbation may be a non-functional by-product of high sexual arousal or be adaptive by providing advantages in terms of sperm competition or by decreasing the risk of sexually transmitted infections. We investigated the possible functional significance of male masturbation using behavioral data collected on 21 free-ranging male rhesus macaques (Macaca mulatta) at the peak of the mating season. We found some evidence that masturbation is linked to low mating opportunities: regardless of rank, males were most likely to be observed masturbating on days in which they were not observed mating, and lower-ranking males mated less and tended to masturbate more frequently than higher-ranking males. These results echo the findings obtained for two other species of macaques, but contrast with those obtained in red colobus monkeys (Procolobus badius) and Cape ground squirrels (Xerus inauris). Interestingly, however, male masturbation events ended with ejaculation in only 15% of the observed masturbation time, suggesting that new hypotheses are needed to explain masturbation in this species. More studies are needed to establish whether male masturbation is adaptive and whether it serves similar or different functions in different sexually promiscuous species.
García-Algarra, Javier; Pastor, Juan Manuel; Iriondo, José María
2017-01-01
Background Network analysis has become a relevant approach to analyze cascading species extinctions resulting from perturbations on mutualistic interactions as a result of environmental change. In this context, it is essential to be able to point out key species, whose stability would prevent cascading extinctions, and the consequent loss of ecosystem function. In this study, we aim to explain how the k-core decomposition sheds light on the robustness of bipartite mutualistic networks. Methods We defined three k-magnitudes based on the k-core decomposition: k-radius, k-degree, and k-risk. The first one, k-radius, quantifies the distance from a node to the innermost shell of the partner guild, while k-degree provides a measure of centrality in the k-shell based decomposition. k-risk is a way to measure the vulnerability of a network to the loss of a particular species. Using these magnitudes we analyzed 89 mutualistic networks involving plant pollinators or seed dispersers. Two static extinction procedures were implemented in which k-degree and k-risk were compared against other commonly used ranking indexes, such as MusRank, explained in detail in Material and Methods. Results When extinctions take place in both guilds, k-risk is the best ranking index if the goal is to identify the key species to preserve the giant component. When species are removed only in the primary class and cascading extinctions are measured in the secondary class, the most effective ranking index to identify the key species to preserve the giant component is k-degree. However, MusRank index was more effective when the goal is to identify the key species to preserve the greatest species richness in the second class. Discussion The k-core decomposition offers a new topological view of the structure of mutualistic networks. 
The new k-radius, k-degree and k-risk magnitudes take advantage of its properties and provide new insight into the structure of mutualistic networks. The k-risk and k-degree ranking indexes are especially effective approaches to identify key species to preserve when conservation practitioners focus on the preservation of ecosystem functionality over species richness. PMID:28533969
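The k-core decomposition underlying these magnitudes can be sketched with a simple peeling procedure that assigns each node its shell index. The k-radius, k-degree, and k-risk magnitudes are not implemented here because the abstract does not give their formulas. The small plant-pollinator network and the function name `core_numbers` are hypothetical.

```python
def core_numbers(adjacency):
    """Assign each node its k-shell index by repeatedly peeling low-degree nodes.

    adjacency maps a node to the set of its neighbours (undirected graph).
    """
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        peel = [node for node in remaining if degree[node] <= k]
        if not peel:
            k += 1  # nothing left to remove at this level, move to next shell
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via another path
            remaining.discard(node)
            core[node] = k
            for neighbour in adjacency[node]:
                if neighbour in remaining:
                    degree[neighbour] -= 1
                    if degree[neighbour] <= k:
                        peel.append(neighbour)
    return core


# Hypothetical bipartite network: plants P1-P3 and pollinators A1-A3.
example_network = {
    "P1": {"A1", "A2", "A3"},
    "P2": {"A1", "A2"},
    "P3": {"A1"},
    "A1": {"P1", "P2", "P3"},
    "A2": {"P1", "P2"},
    "A3": {"P1"},
}
example_shells = core_numbers(example_network)
```

In this toy network the specialists P3 and A3 fall in the outer shell, while the generalists form the innermost core that the k-magnitudes are measured against.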
García-Algarra, Javier; Pastor, Juan Manuel; Iriondo, José María; Galeano, Javier
2017-01-01
Network analysis has become a relevant approach to analyze cascading species extinctions resulting from perturbations on mutualistic interactions as a result of environmental change. In this context, it is essential to be able to point out key species, whose stability would prevent cascading extinctions, and the consequent loss of ecosystem function. In this study, we aim to explain how the k-core decomposition sheds light on the robustness of bipartite mutualistic networks. We defined three k-magnitudes based on the k-core decomposition: k-radius, k-degree, and k-risk. The first one, k-radius, quantifies the distance from a node to the innermost shell of the partner guild, while k-degree provides a measure of centrality in the k-shell based decomposition. k-risk is a way to measure the vulnerability of a network to the loss of a particular species. Using these magnitudes we analyzed 89 mutualistic networks involving plant pollinators or seed dispersers. Two static extinction procedures were implemented in which k-degree and k-risk were compared against other commonly used ranking indexes, such as MusRank, explained in detail in Material and Methods. When extinctions take place in both guilds, k-risk is the best ranking index if the goal is to identify the key species to preserve the giant component. When species are removed only in the primary class and cascading extinctions are measured in the secondary class, the most effective ranking index to identify the key species to preserve the giant component is k-degree. However, MusRank index was more effective when the goal is to identify the key species to preserve the greatest species richness in the second class. The k-core decomposition offers a new topological view of the structure of mutualistic networks. The new k-radius, k-degree and k-risk magnitudes take advantage of its properties and provide new insight into the structure of mutualistic networks. 
The k-risk and k-degree ranking indexes are especially effective approaches to identify key species to preserve when conservation practitioners focus on the preservation of ecosystem functionality over species richness.
ERIC Educational Resources Information Center
Tien, Flora F.; Blackburn, Robert T.
1996-01-01
A study explored the relationship between the traditional system of college faculty rank and faculty research productivity from the perspectives of behavioral reinforcement theory and selection function. Six hypotheses were generated and tested, using data from a 1989 national faculty survey. Results failed to support completely either the…
22 CFR 11.11 - Mid-level Foreign Service officer career candidate appointments.
Code of Federal Regulations, 2014 CFR
2014-04-01
... cannot reasonably be met from within the ranks of the career service, including by special training of... rank-order register for the class and functional specialty for which the candidate has been found... aptitude for learning them. A candidate may be appointed without first having passed an examination in a...
22 CFR 11.11 - Mid-level Foreign Service officer career candidate appointments.
Code of Federal Regulations, 2012 CFR
2012-04-01
... cannot reasonably be met from within the ranks of the career service, including by special training of... rank-order register for the class and functional specialty for which the candidate has been found... aptitude for learning them. A candidate may be appointed without first having passed an examination in a...
22 CFR 11.11 - Mid-level Foreign Service officer career candidate appointments.
Code of Federal Regulations, 2011 CFR
2011-04-01
... cannot reasonably be met from within the ranks of the career service, including by special training of... rank-order register for the class and functional specialty for which the candidate has been found... aptitude for learning them. A candidate may be appointed without first having passed an examination in a...
Siochi, R
2012-06-01
To develop a quality initiative discovery framework using process improvement techniques, software tools and operating principles. Process deviations are entered into a radiotherapy incident reporting database. Supervisors use an in-house Event Analysis System (EASy) to discuss incidents with staff. Major incidents are analyzed with an in-house Fault Tree Analysis (FTA). A meta-analysis is performed using association, text mining, key word clustering, and differential frequency analysis. A key operating principle encourages the creation of forcing functions via rapid application development. 504 events have been logged this past year. The results for the key word analysis indicate that the root cause for the top ranked key words was miscommunication. This was also the root cause found from association analysis, where 24% of the time that an event involved a physician it also involved a nurse. Differential frequency analysis revealed that sharp peaks at week 27 were followed by 3 major incidents, two of which were dose related. The peak was largely due to the front desk which caused distractions in other areas. The analysis led to many PI projects but there is still a major systematic issue with the use of forms. The solution we identified is to implement Smart Forms to perform error checking and interlocking. Our first initiative replaced our daily QA checklist with a form that uses custom validation routines, preventing therapists from proceeding with treatments until out-of-tolerance conditions are corrected. PITSTOP has increased the number of quality initiatives in our department, and we have discovered or confirmed common underlying causes of a variety of seemingly unrelated errors. It has motivated the replacement of all forms with smart forms. © 2012 American Association of Physicists in Medicine.
Zafar, Atif; Ahmad, Sabahuddin; Rizvi, Asim; Ahmad, Masood
2015-01-01
Schistosomiasis is a major endemic disease known for excessive mortality and morbidity in developing countries. Because praziquantel is the only drug available for its treatment, the risk of drug resistance emphasizes the need to discover new drugs for this disease. Cathepsin SmCL1 is the critical target for drug design due to its essential role in the digestion of host proteins for growth and development of Schistosoma mansoni. Inhibiting the function of SmCL1 could control the wide spread of infections caused by S. mansoni in humans. With this objective, a homology modeling approach was used to obtain theoretical three-dimensional (3D) structure of SmCL1. In order to find the potential inhibitors of SmCL1, a plethora of in silico techniques were employed to screen non-peptide inhibitors against SmCL1 via structure-based drug discovery protocol. Receiver operating characteristic (ROC) curve analysis and molecular dynamics (MD) simulation were performed on the results of docked protein-ligand complexes to identify top ranking molecules against the modelled 3D structure of SmCL1. MD simulation results suggest the phytochemical Simalikalactone-D as a potential lead against SmCL1, whose pharmacophore model may be useful for future screening of potential drug molecules. To conclude, this is the first report to discuss the virtual screening of non-peptide inhibitors against SmCL1 of S. mansoni, with significant therapeutic potential. Results presented herein provide a valuable contribution to identify the significant leads and further derivatize them to suitable drug candidates for antischistosomal therapy. PMID:25933436
The evolution of tribospheny and the antiquity of mammalian clades.
Woodburne, Michael O; Rich, Thomas H; Springer, Mark S
2003-08-01
The evolution of tribosphenic molars is a key innovation in the history of Mammalia. Tribospheny allows for both shearing and grinding occlusal functions. Marsupials and placentals are advanced tribosphenic mammals (i.e., Theria) that show additional modifications of the tribosphenic dentition including loss of the distal metacristid and development of double-rank postvallum/prevallid shear. The recent discovery of Eomaia [Nature 416 (2002) 816], regarded as the oldest eutherian mammal, implies that the marsupial-placental split is at least 125 million years old. The conventional scenario for the evolution of tribosphenic and therian mammals hypothesizes that each group evolved once, in the northern hemisphere, and is based on a predominantly Laurasian fossil record. With the recent discovery of the oldest tribosphenic mammal (Ambondro) from the Mesozoic of Gondwana, Flynn et al. [Nature 401 (1999) 57] suggested that tribospheny evolved in Gondwana rather than in Laurasia. Luo et al. [Nature 409 (2001) 53; Acta Palaeontol. Pol. 47 (2002) 1] argued for independent origins of tribospheny in northern (Boreosphenida) and southern (Australosphenida) hemisphere clades, with the latter including Ambondro, ausktribosphenids, and monotremes. Here, we present cladistic evidence for a single origin of tribosphenic molars. Further, Ambondro may be a stem eutherian, making the split between marsupials and placentals at least 167 m.y. old. To test this hypothesis, we used the relaxed molecular clock approach of Thorne/Kishino with amino acid data sets for BRCA1 [J. Mammal. Evol. 8 (2001) 239] and the IGF2 receptor [Mammal. Genome 12 (2001) 513]. Point estimates for the marsupial-placental split were 182-190 million years based on BRCA1 and 185-187 million years based on the IGF2 receptor. These estimates are fully compatible with the results of our cladistic analyses.
Penrod, Nadia M; Moore, Jason H
2014-02-05
The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. We use this approach to prioritize genes as drug target candidates in a set of ER⁺ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER⁺ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use.
2014-01-01
Background The demand for novel molecularly targeted drugs will continue to rise as we move forward toward the goal of personalizing cancer treatment to the molecular signature of individual tumors. However, the identification of targets and combinations of targets that can be safely and effectively modulated is one of the greatest challenges facing the drug discovery process. A promising approach is to use biological networks to prioritize targets based on their relative positions to one another, a property that affects their ability to maintain network integrity and propagate information-flow. Here, we introduce influence networks and demonstrate how they can be used to generate influence scores as a network-based metric to rank genes as potential drug targets. Results We use this approach to prioritize genes as drug target candidates in a set of ER⁺ breast tumor samples collected during the course of neoadjuvant treatment with the aromatase inhibitor letrozole. We show that influential genes, those with high influence scores, tend to be essential and include a higher proportion of essential genes than those prioritized based on their position (i.e. hubs or bottlenecks) within the same network. Additionally, we show that influential genes represent novel biologically relevant drug targets for the treatment of ER⁺ breast cancers. Moreover, we demonstrate that gene influence differs between untreated tumors and residual tumors that have adapted to drug treatment. In this way, influence scores capture the context-dependent functions of genes and present the opportunity to design combination treatment strategies that take advantage of the tumor adaptation process. Conclusions Influence networks efficiently find essential genes as promising drug targets and combinations of targets to inform the development of molecularly targeted drugs and their use. PMID:24495353
Automated Docking Screens: A Feasibility Study
2009-01-01
Molecular docking is the most practical approach to leverage protein structure for ligand discovery, but the technique retains important liabilities that make it challenging to deploy on a large scale. We have therefore created an expert system, DOCK Blaster, to investigate the feasibility of full automation. The method requires a PDB code, sometimes with a ligand structure, and from that alone can launch a full screen of large libraries. A critical feature is self-assessment, which estimates the anticipated reliability of the automated screening results using pose fidelity and enrichment. Against common benchmarks, DOCK Blaster recapitulates the crystal ligand pose within 2 Å rmsd 50−60% of the time; inferior to an expert, but respectable. Half the time the ligand also ranked among the top 5% of 100 physically matched decoys chosen on the fly. Further tests were undertaken culminating in a study of 7755 eligible PDB structures. In 1398 cases, the redocked ligand ranked in the top 5% of 100 property-matched decoys while also posing within 2 Å rmsd, suggesting that unsupervised prospective docking is viable. DOCK Blaster is available at http://blaster.docking.org. PMID:19719084
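The enrichment criterion described above (a ligand ranking in the top 5% of 100 decoys) can be illustrated with a minimal sketch. The function name `ranks_in_top_fraction` is hypothetical, and the convention that lower scores are better is an assumption; neither is taken from DOCK Blaster itself.

```python
def ranks_in_top_fraction(ligand_score, decoy_scores, fraction=0.05):
    """Return True if the ligand ranks within the top `fraction` of the decoy set.

    Assumption (not from the abstract): lower docking scores are better.
    """
    # Count decoys that score strictly better than the ligand.
    better = sum(1 for s in decoy_scores if s < ligand_score)
    rank = better + 1  # 1-based rank of the ligand among the decoys
    # Number of top positions that count as "enriched" (at least one).
    cutoff = max(1, int(fraction * len(decoy_scores)))
    return rank <= cutoff


# Illustrative decoy scores 0.0 .. 99.0 (made-up values).
decoys = [float(i) for i in range(100)]
print(ranks_in_top_fraction(3.5, decoys))  # ligand beaten by only 4 decoys
print(ranks_in_top_fraction(4.5, decoys))  # ligand beaten by 5 decoys
```

With 100 decoys and a 5% cutoff, the ligand passes only if at most four decoys score better than it.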
Automated docking screens: a feasibility study.
Irwin, John J; Shoichet, Brian K; Mysinger, Michael M; Huang, Niu; Colizzi, Francesco; Wassam, Pascal; Cao, Yiqun
2009-09-24
Molecular docking is the most practical approach to leverage protein structure for ligand discovery, but the technique retains important liabilities that make it challenging to deploy on a large scale. We have therefore created an expert system, DOCK Blaster, to investigate the feasibility of full automation. The method requires a PDB code, sometimes with a ligand structure, and from that alone can launch a full screen of large libraries. A critical feature is self-assessment, which estimates the anticipated reliability of the automated screening results using pose fidelity and enrichment. Against common benchmarks, DOCK Blaster recapitulates the crystal ligand pose within 2 Å rmsd 50-60% of the time; inferior to an expert, but respectable. Half the time the ligand also ranked among the top 5% of 100 physically matched decoys chosen on the fly. Further tests were undertaken culminating in a study of 7755 eligible PDB structures. In 1398 cases, the redocked ligand ranked in the top 5% of 100 property-matched decoys while also posing within 2 Å rmsd, suggesting that unsupervised prospective docking is viable. DOCK Blaster is available at http://blaster.docking.org.
An ensemble rank learning approach for gene prioritization.
Lee, Po-Feng; Soo, Von-Wun
2013-01-01
Several different computational approaches have been developed to solve the gene prioritization problem. We intend to use the ensemble boosting learning techniques to combine variant computational approaches for gene prioritization in order to improve the overall performance. In particular we add a heuristic weighting function to the Rankboost algorithm according to: 1) the absolute ranks generated by the adopted methods for a certain gene, and 2) the ranking relationship between all gene-pairs from each prioritization result. We select 13 known prostate cancer genes in OMIM database as training set and protein coding gene data in HGNC database as test set. We adopt the leave-one-out strategy for the ensemble rank boosting learning. The experimental results show that our ensemble learning approach outperforms the four gene-prioritization methods in ToppGene suite in the ranking results of the 13 known genes in terms of mean average precision, ROC and AUC measures.
Juang, K W; Lee, D Y; Ellsworth, T R
2001-01-01
The spatial distribution of a pollutant in contaminated soils is usually highly skewed. As a result, the sample variogram often differs considerably from its regional counterpart and the geostatistical interpolation is hindered. In this study, rank-order geostatistics with standardized rank transformation was used for the spatial interpolation of pollutants with a highly skewed distribution in contaminated soils when commonly used nonlinear methods, such as logarithmic and normal-scored transformations, are not suitable. A real data set of soil Cd concentrations with great variation and high skewness in a contaminated site of Taiwan was used for illustration. The spatial dependence of ranks transformed from Cd concentrations was identified and kriging estimation was readily performed in the standardized-rank space. The estimated standardized rank was back-transformed into the concentration space using the middle point model within a standardized-rank interval of the empirical distribution function (EDF). The spatial distribution of Cd concentrations was then obtained. The probability of Cd concentration being higher than a given cutoff value also can be estimated by using the estimated distribution of standardized ranks. The contour maps of Cd concentrations and the probabilities of Cd concentrations being higher than the cutoff value can be simultaneously used for delineation of hazardous areas of contaminated soils.
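The rank transformation and back-transformation described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the standardized rank is assumed to be rank/n (with average ranks for ties), and the "middle point" back-transform is assumed to return the midpoint of the two sorted sample values that bracket the estimated standardized rank. The kriging step itself is omitted.

```python
def standardized_ranks(values):
    """Map each value to rank/n, using average ranks for ties (assumed convention)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank / n
        i = j + 1
    return ranks


def back_transform(u, values):
    """Map an estimated standardized rank u back to the data scale.

    Assumed middle-point rule: if u falls in [k/n, (k+1)/n), return the
    midpoint of the k-th and (k+1)-th smallest sample values.
    """
    s = sorted(values)
    n = len(s)
    if u <= 1 / n:
        return s[0]
    if u >= 1:
        return s[-1]
    k = int(u * n)  # 1 <= k <= n-1 here
    return (s[k - 1] + s[k]) / 2


# Made-up skewed sample (e.g. concentrations in arbitrary units).
sample = [3.0, 1.0, 2.0, 10.0]
print(standardized_ranks(sample))
print(back_transform(0.6, sample))
```

Because the interpolation is carried out on ranks, a single extreme value such as 10.0 has no more leverage than any other observation, which is the motivation given in the abstract for skewed data.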
The Alliance Hypothesis for Human Friendship
DeScioli, Peter; Kurzban, Robert
2009-01-01
Background Exploration of the cognitive systems underlying human friendship will be advanced by identifying the evolved functions these systems perform. Here we propose that human friendship is caused, in part, by cognitive mechanisms designed to assemble support groups for potential conflicts. We use game theory to identify computations about friends that can increase performance in multi-agent conflicts. This analysis suggests that people would benefit from: 1) ranking friends, 2) hiding friend-ranking, and 3) ranking friends according to their own position in partners' rankings. These possible tactics motivate the hypotheses that people possess egocentric and allocentric representations of the social world, that people are motivated to conceal this information, and that egocentric friend-ranking is determined by allocentric representations of partners' friend-rankings (more than others' traits). Methodology/Principal Findings We report results from three studies that confirm predictions derived from the alliance hypothesis. Our main empirical finding, replicated in three studies, was that people's rankings of their ten closest friends were predicted by their own perceived rank among their partners' other friends. This relationship remained strong after controlling for a variety of factors such as perceived similarity, familiarity, and benefits. Conclusions/Significance Our results suggest that the alliance hypothesis merits further attention as a candidate explanation for human friendship. PMID:19492066
Docking screens: right for the right reasons?
Kolb, Peter; Irwin, John J
2009-01-01
Whereas docking screens have emerged as the most practical way to use protein structure for ligand discovery, an inconsistent track record raises questions about how well docking actually works. In its favor, a growing number of publications report the successful discovery of new ligands, often supported by experimental affinity data and controls for artifacts. Few reports, however, actually test the underlying structural hypotheses that docking makes. To be successful and not just lucky, prospective docking must not only rank a true ligand among the top scoring compounds, it must also correctly orient the ligand so the score it receives is biophysically sound. If the correct binding pose is not predicted, a skeptic might well infer that the discovery was serendipitous. Surveying over 15 years of the docking literature, we were surprised to discover how rarely sufficient evidence is presented to establish whether docking actually worked for the right reasons. The paucity of experimental tests of theoretically predicted poses undermines confidence in a technique that has otherwise become widely accepted. Of course, solving a crystal structure is not always possible, and even when it is, it can be a lot of work, and is not readily accessible to all groups. Even when a structure can be determined, investigators may prefer to gloss over an erroneous structural prediction to better focus on their discovery. Still, the absence of a direct test of theory by experiment is a loss for method developers seeking to understand and improve docking methods. We hope this review will motivate investigators to solve structures and compare them with their predictions whenever possible, to advance the field.
Bayesian Inference of High-Dimensional Dynamical Ocean Models
NASA Astrophysics Data System (ADS)
Lin, J.; Lermusiaux, P. F. J.; Lolla, S. V. T.; Gupta, A.; Haley, P. J., Jr.
2015-12-01
This presentation addresses a holistic set of challenges in high-dimension ocean Bayesian nonlinear estimation: i) predict the probability distribution functions (pdfs) of large nonlinear dynamical systems using stochastic partial differential equations (PDEs); ii) assimilate data using Bayes' law with these pdfs; iii) predict the future data that optimally reduce uncertainties; and (iv) rank the known and learn the new model formulations themselves. Overall, we allow the joint inference of the state, equations, geometry, boundary conditions and initial conditions of dynamical models. Examples are provided for time-dependent fluid and ocean flows, including cavity, double-gyre and Strait flows with jets and eddies. The Bayesian model inference, based on limited observations, is illustrated first by the estimation of obstacle shapes and positions in fluid flows. Next, the Bayesian inference of biogeochemical reaction equations and of their states and parameters is presented, illustrating how PDE-based machine learning can rigorously guide the selection and discovery of complex ecosystem models. Finally, the inference of multiscale bottom gravity current dynamics is illustrated, motivated in part by classic overflows and dense water formation sites and their relevance to climate monitoring and dynamics. This is joint work with our MSEAS group at MIT.
Xu, Youjun; Wang, Shiwei; Hu, Qiwan; Gao, Shuaishi; Ma, Xiaomin; Zhang, Weilin; Shen, Yihang; Chen, Fangjin; Lai, Luhua; Pei, Jianfeng
2018-05-10
CavityPlus is a web server that offers protein cavity detection and various functional analyses. Using protein three-dimensional structural information as the input, CavityPlus applies CAVITY to detect potential binding sites on the surface of a given protein structure and rank them based on ligandability and druggability scores. These potential binding sites can be further analysed using three submodules, CavPharmer, CorrSite, and CovCys. CavPharmer uses a receptor-based pharmacophore modelling program, Pocket, to automatically extract pharmacophore features within cavities. CorrSite identifies potential allosteric ligand-binding sites based on motion correlation analyses between cavities. CovCys automatically detects druggable cysteine residues, which is especially useful to identify novel binding sites for designing covalent allosteric ligands. Overall, CavityPlus provides an integrated platform for analysing comprehensive properties of protein binding cavities. Such analyses are useful for many aspects of drug design and discovery, including target selection and identification, virtual screening, de novo drug design, and allosteric and covalent-binding drug design. The CavityPlus web server is freely available at http://repharma.pku.edu.cn/cavityplus or http://www.pkumdl.cn/cavityplus.
STOPGAP: a database for systematic target opportunity assessment by genetic association predictions.
Shen, Judong; Song, Kijoung; Slater, Andrew J; Ferrero, Enrico; Nelson, Matthew R
2017-09-01
We developed the STOPGAP (Systematic Target OPportunity assessment by Genetic Association Predictions) database, an extensive catalog of human genetic associations mapped to effector gene candidates. STOPGAP draws on a variety of publicly available GWAS associations, linkage disequilibrium (LD) measures, functional genomic and variant annotation sources. Algorithms were developed to merge the association data, partition associations into non-overlapping LD clusters, map variants to genes and produce a variant-to-gene score used to rank the relative confidence among potential effector genes. This database can be used for a multitude of investigations into the genes and genetic mechanisms underlying inter-individual variation in human traits, as well as supporting drug discovery applications. Shell, R, Perl and Python scripts and STOPGAP R data files (version 2.5.1 at publication) are available at https://github.com/StatGenPRD/STOPGAP. Some of the most useful STOPGAP fields can be queried through an R Shiny web application at http://stopgapwebapp.com. Contact: matthew.r.nelson@gsk.com. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press.
Linking Terrigenous Sediment Delivery to Declines in Coral ...
Worldwide coral reef conditions continue to decline despite the valuable socioeconomic benefits of these ecosystems. There is growing recognition that quantifying reefs in terms reflecting what stakeholders value is vital for comparing inherent tradeoffs among coastal management decisions. Terrestrial sediment runoff ranks high as a stressor to coral reefs and is a key concern in Puerto Rico where reefs are among the most threatened in the Caribbean. This research aimed to identify the degree to which sediment runoff impacts production of coral reef ecosystem services and the potential for watershed management actions to improve these services. Ecosystem service production functions were applied to map and translate metrics of ecological reef condition into ecosystem service production under a gradient of increasing sediment delivery. We found that higher sediment delivery decreased provisioning of most ecosystem services, including ecosystem integrity, bioprospecting discovery, and reef-based recreational opportunities and fisheries production. However, shoreline protection and services with a strong contribution from non-reef habitats (e.g., mangroves, seagrasses) were higher in locations with high sediment delivery, although there was a strong inshore effect suggesting the influence of distance to shore, depth, and inshore habitats. Differences among services may indicate potential tradeoffs and the need to consider habitat connectivity, nursery habitat, acce
Intrinsic classes in the Union of European Football Associations soccer team ranking
NASA Astrophysics Data System (ADS)
Ausloos, Marcel
2014-11-01
A strong structural regularity of classes is found in soccer teams ranked by the Union of European Football Associations (UEFA) for the time interval 2009-2014. It concerns 424 to 453 teams according to the 5 competition seasons. The analysis is based on the rank-size theory considerations, the size being the UEFA coefficient at the end of a season. Three classes emerge: (i) the few "top" teams, (ii) 300 teams, (iii) the rest of the involved teams (about 150) in the tail of the distribution. There are marked empirical laws describing each class. A 3-parameter Lavalette function is used to describe the concave curving as the rank increases, and to distinguish the tail from the central behavior.
Protein-Protein Docking with F2Dock 2.0 and GB-Rerank
Chowdhury, Rezaul; Rasheed, Muhibur; Keidel, Donald; Moussalem, Maysam; Olson, Arthur; Sanner, Michel; Bajaj, Chandrajit
2013-01-01
Motivation Computational simulation of protein-protein docking can expedite the process of molecular modeling and drug discovery. This paper reports on our new F2 Dock protocol which improves the state of the art in initial stage rigid body exhaustive docking search, scoring and ranking by introducing improvements in the shape-complementarity and electrostatics affinity functions, a new knowledge-based interface propensity term with FFT formulation, a set of novel knowledge-based filters and finally a solvation energy (GBSA) based reranking technique. Our algorithms are based on highly efficient data structures including the dynamic packing grids and octrees which significantly speed up the computations and also provide guaranteed bounds on approximation error. Results The improved affinity functions show superior performance compared to their traditional counterparts in finding correct docking poses at higher ranks. We found that the new filters and the GBSA based reranking individually and in combination significantly improve the accuracy of docking predictions with only minor increase in computation time. We compared F2 Dock 2.0 with ZDock 3.0.2 and found improvements over it, specifically among 176 complexes in ZLab Benchmark 4.0, F2 Dock 2.0 finds a near-native solution as the top prediction for 22 complexes; where ZDock 3.0.2 does so for 13 complexes. F2 Dock 2.0 finds a near-native solution within the top 1000 predictions for 106 complexes as opposed to 104 complexes for ZDock 3.0.2. However, there are 17 and 15 complexes where F2 Dock 2.0 finds a solution but ZDock 3.0.2 does not and vice versa; which indicates that the two docking protocols can also complement each other. Availability The docking protocol has been implemented as a server with a graphical client (TexMol) which allows the user to manage multiple docking jobs, and visualize the docked poses and interfaces. Both the server and client are available for download. 
Server: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dock.shtml. Client: http://www.cs.utexas.edu/~bajaj/cvc/software/f2dockclient.shtml. PMID:23483883
RANKL/RANK: from bone loss to the prevention of breast cancer.
Sigl, Verena; Jones, Laundette P; Penninger, Josef M
2016-11-01
RANK and RANKL, a receptor ligand pair belonging to the tumour necrosis factor family, are the critical regulators of osteoclast development and bone metabolism. Besides their essential function in bone, RANK and RANKL have also been identified as the key factors for the formation of a lactating mammary gland in pregnancy. Mechanistically, RANK and RANKL link the sex hormone progesterone with stem cell expansion and proliferation of mammary epithelial cells. Based on their normal physiology, RANKL/RANK control the onset of hormone-induced breast cancer through the expansion of mammary progenitor cells. Recently, we and others were able to show that RANK and RANKL are also critical regulators of BRCA1-mutation-driven breast cancer. Currently, the preventive strategy for BRCA1-mutation carriers includes preventive mastectomy, associated with wide-ranging risks and psychosocial effects. The search for an alternative non-invasive prevention strategy is therefore of paramount importance. As our work strongly implicates RANK and RANKL as key molecules involved in the initiation of BRCA1-associated breast cancer, we propose that anti-RANKL therapy could be a feasible preventive strategy for women carrying BRCA1 mutations, and by extension to other women with high risk of breast cancer. © 2016 The Authors.
Low-rank structure learning via nonconvex heuristic recovery.
Deng, Yue; Dai, Qionghai; Liu, Risheng; Zhang, Zengke; Hu, Sanqing
2013-03-01
In this paper, we propose a nonconvex framework to learn the essential low-rank structure from corrupted data. Different from traditional approaches, which directly utilize convex norms to measure the sparseness, our method introduces more reasonable nonconvex measurements to enhance the sparsity in both the intrinsic low-rank structure and the sparse corruptions. We will, respectively, introduce how to combine the widely used ℓp norm (0 < p < 1) and log-sum term into the framework of low-rank structure learning. Although the proposed optimization is no longer convex, it still can be effectively solved by a majorization-minimization (MM)-type algorithm, with which the nonconvex objective function is iteratively replaced by its convex surrogate and the nonconvex problem finally falls into the general framework of reweighted approaches. We prove that the MM-type algorithm can converge to a stationary point after successive iterations. The proposed model is applied to solve two typical problems: robust principal component analysis and low-rank representation. Experimental results on low-rank structure learning demonstrate that our nonconvex heuristic methods, especially the log-sum heuristic recovery algorithm, generally perform much better than the convex-norm-based method (0 < p < 1) for both data with higher rank and with denser corruptions.
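The majorization-minimization idea with a log-sum penalty can be illustrated on a toy vector problem. The sketch below is not the authors' matrix algorithm; it assumes the simplified objective 0.5·||x − y||² + λ·Σ log(|x_i| + ε), where `lam`, `eps`, and the function names are hypothetical. Each iteration replaces the concave log term by its linear (convex) upper bound at the current iterate, which gives a weighted ℓ1 problem solved in closed form by soft-thresholding.

```python
def soft_threshold(v, t):
    """Closed-form minimizer of 0.5*(x - v)**2 + t*|x|."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def log_sum_mm(y, lam=0.5, eps=0.1, iters=20):
    """Reweighted soft-thresholding for a log-sum penalized denoising toy problem.

    Assumed objective: 0.5*||x - y||^2 + lam * sum(log(|x_i| + eps)).
    """
    x = list(y)  # start from the observed vector
    for _ in range(iters):
        # Majorize: the slope of log(|x_i| + eps) at the current x gives the weight.
        weights = [1.0 / (abs(xi) + eps) for xi in x]
        # Minimize the convex surrogate: weighted l1, solved entry by entry.
        x = [soft_threshold(yi, lam * wi) for yi, wi in zip(y, weights)]
    return x


# Made-up input: one large entry (signal) and one small entry (noise).
print(log_sum_mm([5.0, 0.1]))
```

Large entries receive small weights and are barely shrunk, while small entries receive large weights and are driven to zero, which is the sparsity-enhancing behavior the abstract attributes to the nonconvex measure.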
Multimodal medical information retrieval with unsupervised rank fusion.
Mourão, André; Martins, Flávio; Magalhães, João
2015-01-01
Modern medical information retrieval systems are paramount to manage the insurmountable quantities of clinical data. These systems empower health care experts in the diagnosis of patients and play an important role in the clinical decision process. However, the ever-growing heterogeneous information generated in medical environments poses several challenges for retrieval systems. We propose a medical information retrieval system with support for multimodal medical case-based retrieval. The system supports medical information discovery by providing multimodal search, through a novel data fusion algorithm, and term suggestions from a medical thesaurus. Our search system compared favorably to other systems in 2013 ImageCLEFMedical. Copyright © 2014 Elsevier Ltd. All rights reserved.
Eddy, Linda L; Hoeksel, Renee; Fitzgerald, Cindy; Doutrich, Dawn
We describe an innovative practice in advancing careers of academic nurse educators: demonstrating scholarly productivity from program grants. Scholarly productivity is often narrowly defined, especially in research-intensive institutions. The expectation may be a career trajectory based on the traditional scholarship of discovery. However, nurse educators, especially at the associate and full professor ranks, are often involved in leadership activities that include writing and managing program grants. We encourage the academy to value and support the development of program grants that include significant scholarly components, and we offer exemplars of associate and full professor scholarship derived from these projects.
How natural a kind is "eukaryote?".
Doolittle, W Ford
2014-06-02
Systematics balances uneasily between realism and nominalism, uncommitted as to whether biological taxa are discoveries or inventions. If the former, they might be taken as natural kinds. I briefly review some philosophers' concepts of natural kinds and then argue that several of these apply well enough to "eukaryote." Although there are some sticky issues around genomic chimerism and when eukaryotes first appeared, if we allow for degrees in the naturalness of kinds, existing eukaryotes rank highly, higher than prokaryotes. Most biologists feel this intuitively: All I attempt to do here is provide some conceptual justification. Copyright © 2014 Cold Spring Harbor Laboratory Press; all rights reserved.
T-regulatory cells-Triumph of perseverance: The Crafoord Prize for Polyarthritis in 2017.
Wollheim, Frank A
2018-02-01
The Crafoord Prize in Polyarthritis ranks as one of the most prestigious prizes and can be awarded only if the Royal Swedish Academy of Sciences decides the likelihood of prize worthy progress in the field, and at most every 4th year. This has happened only four times since 1982. This year the 5th Laureates were Shimon Sakaguchi, Fred Ramsdell, and Alexander Rudensky with the motivation "for their discoveries relating to regulatory T cells, which counteract harmful immune reactions in arthritis and other autoimmune diseases". Here I review the history of their contributions and its impact in rheumatology. Copyright © 2018. Published by Elsevier Inc.
Colorado quartz: occurrence and discovery
Kile, D.E.; Modreski, P.J.; Kile, D.L.
1991-01-01
The many varieties and associations of quartz found throughout the state rank it as one of the premier worldwide localities for that species. This paper briefly outlines the historical importance of the mineral, the mining history and the geological setting before discussing the varieties of quartz present, its crystallography and the geological environments in which it is found. The latter include volcanic rocks and near surface igneous rocks; pegmatites; metamorphic and plutonic rocks; hydrothermal veins; skarns and sedimentary deposits. Details of the localities and mode of occurrence of smoky quartz, amethyst, milky quartz, rock crystal, rose quartz, citrine, agate and jasper are then given. -S.J.Stone
Saber, Hamidreza; Rajah, Gary B; Kherallah, Riyad Y; Jadhav, Ashutosh P; Narayanan, Sandra
2017-12-15
Mechanical thrombectomy (MT) is increasingly used for large-vessel occlusions (LVO), but randomized clinical trial (RCT) level data with regard to differences in clinical outcomes of MT devices are limited. We conducted a network meta-analysis (NMA) that enables comparison of modern MT devices (Trevo, Solitaire, Aspiration) and strategies (stent retriever vs aspiration) across trials. Relevant RCTs were identified by a systematic review. The efficacy outcome was 90-day functional independence (modified Rankin Scale (mRS) score 0-2). Safety outcomes were 90-day catastrophic outcome (mRS 5-6) and symptomatic intracranial hemorrhage (sICH). Fixed-effect Bayesian NMA was performed to calculate risk estimates and the rank probabilities. In a NMA of six relevant RCTs (SWIFT, TREVO2, EXTEND-IA, SWIFT-PRIME, REVASCAT, THERAPY; total of 871 patients, 472 Solitaire vs medical-only, 108 Aspiration vs medical-only, 178 Trevo vs Merci, and 113 Solitaire vs Merci) with medical-only arm as the reference, Trevo had the greatest functional independence (OR 4.14, 95% credible interval (CrI) 1.41-11.80; top rank probability 92%) followed by Solitaire (OR 2.55, 95% CrI 1.75-3.74; top rank probability 72%). Solitaire and Aspiration devices had the greatest top rank probability with respect to low sICH and catastrophic outcomes (76% and 91%, respectively), but without significant differences between each other. In a separate network of seven RCTs (MR-CLEAN, ESCAPE, EXTEND-IA, SWIFT-PRIME, REVASCAT, THERAPY, ASTER; 1737 patients), first-line stent retriever was associated with a higher top rank probability of functional independence than aspiration (95% vs 54%), with comparable safety outcomes. These findings suggest that Trevo and Solitaire devices are associated with a greater likelihood of functional independence whereas Solitaire and Aspiration devices appear to be safer. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. 
All rights reserved. No commercial use is permitted unless otherwise expressly granted.
The Double-Edged Sword of Pedagogy: Instruction Limits Spontaneous Exploration and Discovery
ERIC Educational Resources Information Center
Bonawitz, Elizabeth; Shafto, Patrick; Gweon, Hyowon; Goodman, Noah D.; Spelke, Elizabeth; Schulz, Laura
2011-01-01
Motivated by computational analyses, we look at how teaching affects exploration and discovery. In Experiment 1, we investigated children's exploratory play after an adult pedagogically demonstrated a function of a toy, after an interrupted pedagogical demonstration, after a naive adult demonstrated the function, and at baseline. Preschoolers in…
Network-based ranking methods for prediction of novel disease associated microRNAs.
Le, Duc-Hau
2015-10-01
Many studies have shown roles for microRNAs in human disease, and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods are usually limited by the need to specify non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they effectively exploit a "disease module" principle in microRNA functional similarity networks. Of these, the random walk with restart (RWR) algorithm-based method is currently the state of the art. The use of this algorithm was inspired by its success in predicting disease genes, because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied in ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a microRNA functional synergistic network, which was recently identified. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses of topological properties showed that all microRNA functional similarity networks are small-world and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones.
In addition, the performance of PRINCE, PRP and KSM was comparable with that of RWR, whereas it was worst for the neighborhood-based algorithm. Moreover, all the algorithms were stable with the change of parameters. Finally, using the integrated network, we predicted six novel miRNAs (i.e., hsa-miR-101, hsa-miR-181d, hsa-miR-192, hsa-miR-423-3p, hsa-miR-484 and hsa-miR-98) associated with breast cancer. Network-based ranking algorithms, which were successfully applied either for disease gene prediction or for studying social/web networks, can also be used effectively for disease microRNA prediction. Copyright © 2015 Elsevier Ltd. All rights reserved.
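The random walk with restart idea referred to in this abstract can be illustrated with a minimal sketch. It assumes an unweighted, undirected toy network stored as an adjacency dictionary; the function name, the restart probability of 0.5, and the node labels are illustrative assumptions and are not taken from the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Score nodes by their steady-state visiting probability.

    adj     : dict mapping each node to a list of its neighbours
              (hypothetical toy network, every node has at least one neighbour)
    seeds   : set of nodes already known to be associated with the disease
    restart : probability of jumping back to the seed set at each step
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Start each update with the restart mass...
        nxt = {v: restart * p0[v] for v in nodes}
        # ...then spread the remaining mass evenly over each node's neighbours.
        for u in nodes:
            if not adj[u]:
                continue
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network a - b - c - d with "a" as the only known disease node.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = random_walk_with_restart(toy, {"a"})
candidates = sorted((v for v in toy if v != "a"), key=scores.get, reverse=True)
print(candidates)
```

Candidates closer to the seed node receive higher scores, which is the sense in which such methods rank candidate microRNAs by network proximity to known disease microRNAs.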
Roy, Sujoy; Yun, Daqing; Madahian, Behrouz; Berry, Michael W.; Deng, Lih-Yuan; Goldowitz, Daniel; Homayouni, Ramin
2017-01-01
In this study, we developed and evaluated a novel text-mining approach, using non-negative tensor factorization (NTF), to simultaneously extract and functionally annotate transcriptional modules consisting of sets of genes, transcription factors (TFs), and terms from MEDLINE abstracts. A sparse 3-mode term × gene × TF tensor was constructed that contained weighted frequencies of 106,895 terms in 26,781 abstracts shared among 7,695 genes and 994 TFs. The tensor was decomposed into sub-tensors using non-negative tensor factorization (NTF) across 16 different approximation ranks. Dominant entries of each of 2,861 sub-tensors were extracted to form term–gene–TF annotated transcriptional modules (ATMs). More than 94% of the ATMs were found to be enriched in at least one KEGG pathway or GO category, suggesting that the ATMs are functionally relevant. One advantage of this method is that it can discover potentially new gene–TF associations from the literature. Using a set of microarray and ChIP-Seq datasets as gold standard, we show that the precision of our method for predicting gene–TF associations is significantly higher than chance. In addition, we demonstrate that the terms in each ATM can be used to suggest new GO classifications to genes and TFs. Taken together, our results indicate that NTF is useful for simultaneous extraction and functional annotation of transcriptional regulatory networks from unstructured text, as well as for literature based discovery. A web tool called Transcriptional Regulatory Modules Extracted from Literature (TREMEL), available at http://binf1.memphis.edu/tremel, was built to enable browsing and searching of ATMs. PMID:28894735
Dynamic selection mechanism for quality of service aware web services
NASA Astrophysics Data System (ADS)
D'Mello, Demian Antony; Ananthanarayana, V. S.
2010-02-01
A web service is an interface of a software component that can be accessed by standard Internet protocols. Web service technology enables application-to-application communication and interoperability. The increasing number of web service providers throughout the globe has produced numerous web services providing the same or similar functionality. This necessitates the use of tools and techniques to search for suitable services available over the Web. UDDI (universal description, discovery and integration) is the first initiative to find suitable web services based on the requester's functional demands. However, the requester's requirements may also include non-functional aspects like quality of service (QoS). In this paper, the authors define a QoS model for QoS-aware and business-driven web service publishing and selection. The authors propose a QoS requirement format with which requesters can specify their complex demands on QoS for web service selection. The authors define a tree structure called the quality constraint tree (QCT) to represent the requester's variety of requirements on QoS properties having varied preferences. The paper proposes a QoS broker-based architecture for web service selection, which lets requesters specify their QoS requirements and select a qualitatively optimal web service. A web service selection algorithm is presented, which ranks the functionally similar web services based on the degree of satisfaction of the requester's QoS requirements and preferences. The paper defines web service provider qualities to distinguish qualitatively competitive web services. The paper also presents the modelling and selection mechanism for the requester's alternative constraints defined on the QoS. The authors implement the QoS broker-based system to prove the correctness of the proposed web service selection mechanism.
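Ranking functionally similar services by degree of QoS satisfaction can be sketched roughly as follows. This is only an illustration under simplifying assumptions: requirements are a flat weighted list rather than the quality constraint tree described in the abstract, and every name, property, bound and weight below is hypothetical.

```python
def satisfaction(service, requirements):
    """Weighted fraction of QoS constraints that a service satisfies.

    requirements: list of (property, kind, bound, weight) tuples, where kind is
    "max" (value must not exceed bound) or "min" (value must reach bound).
    """
    total_weight = sum(weight for _, _, _, weight in requirements)
    met_weight = 0.0
    for prop, kind, bound, weight in requirements:
        value = service["qos"][prop]
        satisfied = value <= bound if kind == "max" else value >= bound
        if satisfied:
            met_weight += weight
    return met_weight / total_weight


def rank_services(services, requirements):
    """Return services ordered from most to least satisfying."""
    return sorted(services, key=lambda s: satisfaction(s, requirements), reverse=True)


# Hypothetical candidate services offering the same functionality.
services = [
    {"name": "A", "qos": {"response_ms": 300, "availability": 0.999}},
    {"name": "B", "qos": {"response_ms": 120, "availability": 0.95}},
    {"name": "C", "qos": {"response_ms": 150, "availability": 0.995}},
]
# Hypothetical requester demands: response time weighted twice as heavily.
requirements = [("response_ms", "max", 200, 2.0), ("availability", "min", 0.99, 1.0)]

print([s["name"] for s in rank_services(services, requirements)])
```

A tree-structured requirement with alternative (OR) constraints, as the abstract describes, would replace the flat loop with a recursive evaluation over the tree.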
Pfeiffenberger, Erik; Chaleil, Raphael A.G.; Moal, Iain H.
2017-01-01
Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge-based potentials, and the identification of many false positive binding sites is not unusual. Often, ranking protocols are based on initial clustering of docked poses followed by the application of an energy function to rank each cluster according to its lowest energy member. Here, we present an approach to cluster ranking based not only on one molecular descriptor (e.g., an energy function) but also on a large number of descriptors that are integrated in a machine learning model, whereby an extremely randomized tree classifier based on 109 molecular descriptors is trained. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features describing the distribution of molecular descriptors within the cluster, which are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters and how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc. PMID:27935158
An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.
Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco
2017-04-01
In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.
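For readers unfamiliar with zero-norm surrogates, one widely used concave approximation is shown below. This is an illustrative assumption: the abstract does not state which approximation the authors adopt, and the parameter $\alpha$ is introduced here only for the example.

```latex
\|x\|_0 \;=\; \bigl|\{\, i : x_i \neq 0 \,\}\bigr|
\;\approx\; \sum_{i=1}^{n} \bigl(1 - e^{-\alpha |x_i|}\bigr), \qquad \alpha > 0 .
```

Each term equals 0 when $x_i = 0$ and tends to 1 as $\alpha |x_i|$ grows, so the sum approaches the count of nonzero entries for large $\alpha$ while remaining smooth and concave in $|x_i|$.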
Dominance rank causally affects personality and glucocorticoid regulation in female rhesus macaques
Kohn, Jordan N.; Snyder-Mackler, Noah; Barreiro, Luis B.; Johnson, Zachary P.; Tung, Jenny; Wilson, Mark E.
2017-01-01
Low social status is frequently associated with heightened exposure to social stressors and altered glucocorticoid regulation by the hypothalamic-pituitary-adrenal (HPA) axis. Additionally, personality differences can affect how individuals behave in response to social conditions, and thus may aggravate or protect against the effects of low status on HPA function. Disentangling the relative importance of personality from the effects of the social environment on the HPA axis has been challenging, since social status can predict aspects of behavior, and both can remain stable across the lifespan. To do so here, we studied an animal model of social status and social behavior, the rhesus macaque (Macaca mulatta). We performed two sequential experimental manipulations of dominance rank (i.e., social status) in 45 adult females, allowing us to characterize personality and glucocorticoid regulation (based on sensitivity to the exogenous glucocorticoid dexamethasone) in each individual while she occupied two different dominance ranks. We identified two behavioral characteristics, termed ‘social approachability’ and ‘boldness,’ which were highly social status-dependent. Social approachability and a third dimension, anxiousness, were also associated with cortisol dynamics in low status females, suggesting that behavioral tendencies may sensitize individuals to the effects of low status on HPA axis function. Finally, we found that improvements in dominance rank increased dexamethasone-induced acute cortisol suppression and glucocorticoid negative feedback. Our findings indicate that social status causally affects both behavioral tendencies and glucocorticoid regulation, and that some behavioral tendencies also independently affect cortisol levels, beyond the effects of rank. Together, they highlight the importance of considering personality and social status together when investigating their effects on HPA axis function. PMID:27639059
Yang, Xinan Holly; Li, Meiyi; Wang, Bin; Zhu, Wanqi; Desgardin, Aurelie; Onel, Kenan; de Jong, Jill; Chen, Jianjun; Chen, Luonan; Cunningham, John M
2015-03-24
Genes that regulate stem cell function are suspected to exert adverse effects on prognosis in malignancy. However, diverse cancer stem cell signatures are difficult for physicians to interpret and apply clinically. To connect the transcriptome and stem cell biology, with potential clinical applications, we propose a novel computational "gene-to-function, snapshot-to-dynamics, and biology-to-clinic" framework to uncover core functional gene-sets signatures. This framework incorporates three function-centric gene-set analysis strategies: a meta-analysis of both microarray and RNA-seq data, novel dynamic network mechanism (DNM) identification, and a personalized prognostic indicator analysis. This work uses complex disease acute myeloid leukemia (AML) as a research platform. We introduced an adjustable "soft threshold" to a functional gene-set algorithm and found that two different analysis methods identified distinct gene-set signatures from the same samples. We identified a 30-gene cluster that characterizes leukemic stem cell (LSC)-depleted cells and a 25-gene cluster that characterizes LSC-enriched cells in parallel; both mark favorable-prognosis in AML. Genes within each signature significantly share common biological processes and/or molecular functions (empirical p = 6e-5 and 0.03 respectively). The 25-gene signature reflects the abnormal development of stem cells in AML, such as AURKA over-expression. We subsequently determined that the clinical relevance of both signatures is independent of known clinical risk classifications in 214 patients with cytogenetically normal AML. We successfully validated the prognosis of both signatures in two independent cohorts of 91 and 242 patients respectively (log-rank p < 0.0015 and 0.05; empirical p < 0.015 and 0.08). 
The proposed algorithms and computational framework will harness systems biology research because they efficiently translate gene-sets (rather than single genes) into biological discoveries about AML and other complex diseases.
Bridging the Gap: Need for a Data Repository to Support Vaccine Prioritization Efforts*
Madhavan, Guruprasad; Phelps, Charles; Sangha, Kinpritma; Levin, Scott; Rappuoli, Rino
2015-01-01
As the mechanisms for discovery, development, and delivery of new vaccines become increasingly complex, strategic planning and priority setting have become ever more crucial. Traditional single value metrics such as disease burden or cost-effectiveness no longer suffice to rank vaccine candidates for development. The Institute of Medicine—in collaboration with the National Academy of Engineering—has developed a novel software system to support vaccine prioritization efforts. The Strategic Multi-Attribute Ranking Tool for Vaccines—SMART Vaccines—allows decision makers to specify their own value structure, selecting from among 28 pre-defined and up to 7 user-defined attributes relevant to the ranking of vaccine candidates. Widespread use of SMART Vaccines will require compilation of a comprehensive data repository for numerous relevant populations—including their demographics, disease burdens and associated treatment costs, as well as characterizing performance features of potential or existing vaccines that might be created, improved, or deployed. While the software contains preloaded data for a modest number of populations, a large gap exists between the existing data and a comprehensive data repository necessary to make full use of SMART Vaccines. While some of these data exist in disparate sources and forms, constructing a data repository will require much new coordination and focus. Finding strategies to bridge the gap to a comprehensive data repository remains the most important task in bringing SMART Vaccines to full fruition, and to support strategic vaccine prioritization efforts in general. PMID:26022565
Gao, Liyan; Ge, Haitao; Huang, Xiahe; Liu, Kehui; Zhang, Yuanya; Xu, Wu; Wang, Yingchun
2015-01-01
Large-scale quantitative evaluation of the tightness of membrane association for nontransmembrane proteins is important for identifying true peripheral membrane proteins with functional significance. Herein, we simultaneously ranked more than 1000 proteins of the photosynthetic model organism Synechocystis sp. PCC 6803 for their relative tightness of membrane association using a proteomic approach. Using multiple precisely ranked and experimentally verified peripheral subunits of photosynthetic protein complexes as the landmarks, we found that proteins involved in two-component signal transduction systems and transporters are overall tightly associated with the membranes, whereas the associations of ribosomal proteins are much weaker. Moreover, we found that hypothetical proteins containing the same domains generally have similar tightness. This work provided a global view of the structural organization of the membrane proteome with respect to divergent functions, and built the foundation for future investigation of the dynamic membrane proteome reorganization in response to different environmental or internal stimuli. PMID:25505158
Mass spectrometry of peptides and proteins from human blood.
Zhu, Peihong; Bowden, Peter; Zhang, Du; Marshall, John G
2011-01-01
It is difficult to convey the accelerating rate and growing importance of mass spectrometry applications to human blood proteins and peptides. Mass spectrometry can rapidly detect and identify the ionizable peptides from the proteins in a simple mixture and reveal many of their post-translational modifications. However, blood is a complex mixture that may contain many proteins first expressed in cells and tissues. The complete analysis of blood proteins is a daunting task that will rely on a wide range of disciplines from physics, chemistry, biochemistry, genetics, electromagnetic instrumentation, mathematics and computation. Therefore the comprehensive discovery and analysis of blood proteins will rank among the great technical challenges and require the cumulative sum of many of mankind's scientific achievements together. A variety of methods have been used to fractionate, analyze and identify proteins from blood, each yielding a small piece of the whole and throwing the great size of the task into sharp relief. The approaches attempted to date clearly indicate that enumerating the proteins and peptides of blood can be accomplished. There is no doubt that the mass spectrometry of blood will be crucial to the discovery and analysis of proteins, enzyme activities, and post-translational processes that underlie the mechanisms of disease. At present both discovery and quantification of proteins from blood are commonly reaching sensitivities of ∼1 ng/mL. Copyright © 2010 Wiley Periodicals, Inc.
EXAMINING SOCIOECONOMIC HEALTH DISPARITIES USING A RANK-DEPENDENT RÉNYI INDEX.
Talih, Makram
2015-06-01
The Rényi index (RI) is a one-parameter class of indices that summarize health disparities among population groups by measuring divergence between the distributions of disease burden and population shares of these groups. The rank-dependent RI introduced in this paper is a two-parameter class of health disparity indices that also accounts for the association between socioeconomic rank and health; it may be derived from a rank-dependent social welfare function. Two competing classes are discussed and the rank-dependent RI is shown to be more robust to changes in the distribution of either socioeconomic rank or health. The standard error and sampling distribution of the rank-dependent RI are evaluated using linearization and re-sampling techniques, and the methodology is illustrated using health survey data from the U.S. National Health and Nutrition Examination Survey and registry data from the U.S. Surveillance, Epidemiology and End Results Program. Such data underlie many population-based objectives within the U.S. Healthy People 2020 initiative. The rank-dependent RI provides a unified mathematical framework for eliciting various societal positions with regards to the policies that are tied to such wide-reaching public health initiatives. For example, if population groups with lower socioeconomic position were ascertained to be more likely to utilize costly public programs, then the parameters of the RI could be selected to reflect prioritizing those population groups for intervention or treatment.
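As a rough illustration of "divergence between the distributions of disease burden and population shares", the sketch below computes the standard Rényi divergence of order alpha between two discrete distributions. It is not the paper's rank-dependent index. Treating `p` as disease-burden shares and `q` as population shares, requiring every `q` entry to be positive, and the toy numbers are all assumptions made for the example.

```python
import math


def renyi_divergence(p, q, alpha):
    """Rényi divergence D_alpha(p || q) for discrete distributions.

    p, q  : sequences of probabilities over the same groups (q entries > 0)
    alpha : order of the divergence, alpha > 0; alpha = 1 gives the
            Kullback-Leibler divergence as the limiting case
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        # Limiting case alpha -> 1: Kullback-Leibler divergence.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    total = sum(pi ** alpha * qi ** (1.0 - alpha) for pi, qi in zip(p, q))
    return math.log(total) / (alpha - 1.0)


# Hypothetical two-group example: group 1 holds half the population
# but carries 70% of the disease burden.
burden_shares = [0.7, 0.3]
population_shares = [0.5, 0.5]
print(renyi_divergence(burden_shares, population_shares, alpha=2.0))
```

The divergence is zero when burden shares equal population shares and grows as the two distributions move apart; the order alpha controls how strongly large discrepancies are weighted.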
Hühn, M; Lotito, S; Piepho, H P
1993-09-01
Multilocation trials in plant breeding lead to cross-classified data sets with rows = genotypes and columns = environments, where the breeder is particularly interested in the rank orders of the genotypes in the different environments. Non-identical rank orders are the result of genotype × environment interactions. Not every interaction, however, causes rank changes among the genotypes (rank-interaction). From a breeder's point of view, interaction is tolerable only as long as it does not affect the rank orders. Therefore, the question arises: under which circumstances does interaction become rank-interaction? This paper contributes to our understanding of this topic. In our study we emphasized the detection of relationships between the similarity of the rank orders (measured by Kendall's coefficient of concordance W) and the functions of the diverse variance components (genotypes, environments, interaction, error). On the basis of extensive data sets on different agricultural crops (faba bean, fodder beet, sugar beet, oats, winter rape) obtained from registration trials (1985-1989) carried out in the Federal Republic of Germany, we obtained the following as the main result: $W \cong \sigma_g^2 / (\sigma_g^2 + \sigma_v^2)$, where $\sigma_g^2$ = genotypic variance and $\sigma_v^2 = \sigma_{ge}^2 + \sigma_o^2 / L$, with $\sigma_{ge}^2$ = interaction variance, $\sigma_o^2$ = error variance and $L$ = number of replications.
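The two quantities related in this abstract can be sketched as follows: Kendall's coefficient of concordance W computed from a genotype-by-environment table, and the variance-ratio approximation quoted as the main result. The function names and the toy tables are assumptions for illustration, and the W computation below assumes no tied values within an environment.

```python
def kendall_w(table):
    """Kendall's coefficient of concordance for a genotypes x environments table.

    table[i][j] is the performance of genotype i in environment j.
    Assumes no ties within any environment (no tie correction is applied).
    """
    n = len(table)        # number of genotypes (ranked objects)
    m = len(table[0])     # number of environments (rankings)
    rank_sums = [0] * n
    for j in range(m):
        # Rank genotypes within environment j (1 = lowest value).
        order = sorted(range(n), key=lambda i: table[i][j])
        for rank, i in enumerate(order, start=1):
            rank_sums[i] += rank
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r - mean_rank_sum) ** 2 for r in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def w_from_variances(var_g, var_ge, var_o, reps):
    """Approximation W ~ var_g / (var_g + var_v), var_v = var_ge + var_o / reps."""
    var_v = var_ge + var_o / reps
    return var_g / (var_g + var_v)


# Toy tables: identical rank orders in both environments, then reversed orders.
same_order = [[1, 2], [3, 4], [5, 6]]
reversed_order = [[1, 3], [2, 2], [3, 1]]
print(kendall_w(same_order), kendall_w(reversed_order))
print(w_from_variances(var_g=1.0, var_ge=0.5, var_o=1.0, reps=2))
```

In this reading, W is high when genotypic variance dominates the combined interaction and error term, and falls toward zero as interaction and error grow relative to genotypic differences.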
Exarchos, Konstantinos P; Exarchos, Themis P; Rigas, Georgios; Papaloukas, Costas; Fotiadis, Dimitrios I
2011-05-10
In peptides and proteins, only a small percentage of peptide bonds adopts the cis configuration. Especially in the case of amide peptide bonds, the number of cis conformations is quite limited, which until recently hampered systematic studies. Lately, however, the growing number of databases with more 3D structures of proteins has produced a considerable number of sequences containing non-proline cis formations (cis-nonPro). In our work, we extract regular expression-type patterns that are descriptive of regions surrounding the cis-nonPro formations. For this purpose, three types of pattern discovery are performed: i) exact pattern discovery, ii) pattern discovery using a chemical equivalency set, and iii) pattern discovery using a structural equivalency set. Afterwards, using each pattern as predicate, we search the Eukaryotic Linear Motif (ELM) resource to identify potential functional implications of regions with cis-nonPro peptide bonds. The patterns extracted from each type of pattern discovery are further employed in order to formulate a pattern-based classifier, which is used to discriminate between cis-nonPro and trans-nonPro formations. In terms of functional implications, we observe a significant association of cis-nonPro peptide bonds with ligand/binding functionalities. As for the pattern-based classification scheme, the best results were obtained using the structural equivalency set, which yielded 70% accuracy, 77% sensitivity and 63% specificity.
The material and biological characteristics of osteoinductive calcium phosphate ceramics
Tang, Zhurong; Li, Xiangfeng; Tan, Yanfei
2018-01-01
The discovery of osteoinductivity of calcium phosphate (Ca-P) ceramics has set an enduring paradigm of conferring biological regenerative activity to materials with carefully designed structural characteristics. The unique phase composition and porous structural features of osteoinductive Ca-P ceramics allow them to interact with signaling molecules and extracellular matrices in the host system, creating a local environment conducive to new bone formation. Mounting evidence now indicates that the osteoinductive activity of Ca-P ceramics is linked to their physicochemical and three-dimensional structural properties. Inspired by this conceptual breakthrough, many laboratories have shown that other materials can also be enticed to join the ranks of tissue-inducing biomaterials, and besides bone, other tissues such as cartilage, nerves and blood vessels have also been regenerated with the assistance of biomaterials. Here, we give a brief historical account of the discovery of the osteoinductivity of Ca-P ceramics, summarize the underlying material factors and biological characteristics, and discuss the mechanism of osteoinduction concerning protein adsorption, the interaction with different types of cells, and the involvement of the vascular and immune systems. PMID:29423267
Discovery of the First Quadruple Gravitationally Lensed Quasar Candidate with Pan-STARRS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Berghea, C. T.; Nelson, George J.; Dudik, R. P.
We report the serendipitous discovery of the first gravitationally lensed quasar candidate from Pan-STARRS. The grizy images reveal four point-like images with magnitudes between 14.9 and 18.1 mag. The colors of the point sources are similar, and they are more consistent with quasars than with stars or galaxies. The lensing galaxy is detected in the izy bands, with an inferred photometric redshift of ∼0.6, lower than that of the point sources. We successfully model the system with a singular isothermal ellipsoid with shear, using the relative positions of the five objects as constraints. While the brightness ranking of the point sources is consistent with that of the model, we find discrepancies between the model-predicted and observed fluxes, likely due to microlensing by stars and millilensing due to the dark matter substructure. In order to fully confirm the gravitational lens nature of this system and add it to the small but growing number of the powerful probes of cosmology and astrophysics represented by quadruply lensed quasars, we require further spectroscopy and high-resolution imaging.
Wang, Dongyao; Lv, Diya; Chen, Xiaofei; Liu, Yue; Ding, Xuan; Jia, Dan; Chen, Langdong; Zhu, Zhenyu; Cao, Yan; Chai, Yifeng
2015-12-01
Evaluating the biological activities of small molecules represents an important part of the drug discovery process. Cell membrane chromatography (CMC) is a well-developed biological chromatographic technique. In this study, we have developed combined SMMC-7721/CMC and HepG2/CMC with high-performance liquid chromatography and time-of-flight mass spectrometry to establish an integrated screening platform. These systems were subsequently validated and used for evaluating the activity of quinazoline compounds, which were designed and synthesized to target vascular endothelial growth factor receptor 2. The inhibitory activities of these compounds towards this receptor were also tested using a classical caliper mobility shift assay. The results revealed a significant correlation between these two methods (R(2) = 0.9565 or 0.9420) for evaluating the activities of these compounds. Compared with traditional methods of evaluating the activities of analogous compounds, this integrated cell membrane chromatography screening system took less time and was more cost effective, indicating that it could be used as a practical method in drug discovery. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Hao, Ming; Wang, Yanli; Bryant, Stephen H
2016-02-25
Identification of drug-target interactions (DTI) is a central task in drug discovery processes. In this work, a simple but effective algorithm that integrates regularized least squares with nonlinear kernel fusion (RLS-KF) is proposed to perform DTI predictions. Using benchmark DTI datasets, our proposed algorithm achieves state-of-the-art results with area under precision-recall curve (AUPR) of 0.915, 0.925, 0.853 and 0.909 for enzymes, ion channels (IC), G protein-coupled receptors (GPCR) and nuclear receptors (NR) based on 10-fold cross-validation. The performance can be further improved by using a recalculated kernel matrix, especially for the small set of nuclear receptors with AUPR of 0.945. Importantly, most of the top-ranked interaction predictions can be validated by experimental data reported in the literature, bioassay results in the PubChem BioAssay database, as well as other previous studies. Our analysis suggests that the proposed RLS-KF is helpful for studying DTI, drug repositioning as well as polypharmacology, and may help to accelerate drug discovery by identifying novel drug targets. Published by Elsevier B.V.
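The regularized least squares step named in this abstract can be illustrated with a generic kernel RLS solver. This is a minimal sketch under stated assumptions, not the authors' RLS-KF: the kernel fusion step is omitted, a single Gaussian (RBF) kernel stands in for the fused kernel, and the names `rbf_kernel`, `rls_fit`, `rls_predict`, `gamma` and `lam` are hypothetical.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def rls_fit(K, y, lam=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)

def rls_predict(K_test_train, alpha):
    """Predicted scores for test items given their kernel values against the training items."""
    return K_test_train @ alpha

# Toy usage: three 1-D training points with made-up interaction scores.
X_train = np.array([[0.0], [1.0], [2.0]])
y_train = np.array([0.0, 1.0, 0.0])
K_train = rbf_kernel(X_train, X_train)
alpha = rls_fit(K_train, y_train, lam=1e-8)
pred_train = rls_predict(K_train, alpha)
```

With a very small `lam` the fit nearly interpolates the training scores; larger values trade training accuracy for smoothness. In a DTI setting the predicted scores would then be sorted to rank candidate interactions.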
Fu, Ying; Sun, Yi-Na; Yi, Ke-Han; Li, Ming-Qiang; Cao, Hai-Feng; Li, Jia-Zhong; Ye, Fei
2017-06-09
p-Hydroxyphenylpyruvate dioxygenase (HPPD) is not only a useful molecular target in treating life-threatening tyrosinemia type I, but also an important target for chemical herbicides. A combined in silico structure-based pharmacophore and molecular docking-based virtual screening was performed to identify novel potential HPPD inhibitors. The complex-based pharmacophore model (CBP) with 0.721 of ROC used for screening compounds showed remarkable ability to retrieve known active ligands from among decoy molecules. The ChemDiv database was screened using CBP-Hypo2 as a 3D query, and the best-fit hits were subjected to molecular docking with two methods, LibDock and CDOCKER, in Accelrys Discovery Studio 2.5 (DS 2.5) to discern interactions with key residues at the active site of HPPD. Four compounds with top rankings in the HipHop model and well-known binding model were finally chosen as lead compounds with potential inhibitory effects on the active site of the target. The results provided powerful insight into the development of novel HPPD inhibitor herbicides using computational techniques.
Optogenetic Approaches to Drug Discovery in Neuroscience and Beyond.
Zhang, Hongkang; Cohen, Adam E
2017-07-01
Recent advances in optogenetics have opened new routes to drug discovery, particularly in neuroscience. Physiological cellular assays probe functional phenotypes that connect genomic data to patient health. Optogenetic tools, in particular tools for all-optical electrophysiology, now provide a means to probe cellular disease models with unprecedented throughput and information content. These techniques promise to identify functional phenotypes associated with disease states and to identify compounds that improve cellular function regardless of whether the compound acts directly on a target or through a bypass mechanism. This review discusses opportunities and unresolved challenges in applying optogenetic techniques throughout the discovery pipeline - from target identification and validation, to target-based and phenotypic screens, to clinical trials. Copyright © 2017 Elsevier Ltd. All rights reserved.
Oral symptoms and functional outcome related to oral and oropharyngeal cancer.
Kamstra, Jolanda I; Jager-Wittenaar, Harriet; Dijkstra, Pieter U; Huisman, Paulien M; van Oort, Rob P; van der Laan, Bernard F A M; Roodenburg, Jan L N
2011-09-01
This study aimed to assess: (1) oral symptoms of patients treated for oral or oropharyngeal cancer; (2) how patients rank the burden of oral symptoms; (3) the impact of the tumor, the treatment, and oral symptoms on functional outcome. Eighty-nine patients treated for oral or oropharyngeal cancer were asked about their oral symptoms related to mouth opening, dental status, oral sensory function, tongue mobility, salivary function, and pain. They were asked to rank these oral symptoms according to the degree of burden experienced. The Mandibular Function Impairment Questionnaire (MFIQ) was used to assess functional outcome. In a multivariate linear regression analysis, variables related to MFIQ scores (p≤0.10) were entered as predictors with MFIQ score as the outcome. Lack of saliva (52%), restricted mouth opening (48%), and restricted tongue mobility (46%) were the most frequently reported oral symptoms. Lack of saliva was most frequently (32%) ranked as the most burdensome oral symptom. For radiated patients, an inability to wear a dental prosthesis, a T3 or T4 stage, and a higher age were predictive of MFIQ scores. For non-radiated patients, a restricted mouth opening, an inability to wear a dental prosthesis, restricted tongue mobility, and surgery of the mandible were predictive of MFIQ scores. Lack of saliva was not only the most frequently reported oral symptom after treatment for oral or oropharyngeal cancer, but also the most burdensome. Functional outcome is strongly influenced by an inability to wear a dental prosthesis in both radiated and non-radiated patients.
Religious Penalty in the U.S. News & World Report College Rankings
ERIC Educational Resources Information Center
Baumann, Robert W.; Chu, David K. W.; Anderton, Charles H.
2009-01-01
Since its debut in 1983, the "U.S. News & World Report College Guide" has become the premier "consumer report" of higher education. We find that peer assessment, which is the largest component of the "U.S. News & World Report" ranking function, contains a penalty for religiously affiliated schools that is independent of the other "U.S. News &…
Diagnosing and ranking retinopathy disease level using diabetic fundus image recuperation approach.
Somasundaram, K; Rajendran, P Alli
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time.
Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image Recuperation Approach
Somasundaram, K.; Alli Rajendran, P.
2015-01-01
Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time. PMID:25945362
Gfeller, Kate; Turner, Christopher; Oleson, Jacob; Zhang, Xuyang; Gantz, Bruce; Froman, Rebecca; Olszewski, Carol
2007-06-01
The purposes of this study were to (a) examine the accuracy of cochlear implant recipients who use different types of devices and signal processing strategies on pitch ranking as a function of size of interval and frequency range and (b) to examine the relations between this pitch perception measure and demographic variables, melody recognition, and speech reception in background noise. One hundred fourteen cochlear implant users and 21 normal-hearing adults were tested on a pitch discrimination task (pitch ranking) that required them to determine direction of pitch change as a function of base frequency and interval size. Three groups were tested: (a) long electrode cochlear implant users (N = 101); (b) short electrode users that received acoustic plus electrical stimulation (A+E) (N = 13); and (c) a normal-hearing (NH) comparison group (N = 21). Pitch ranking was tested at standard frequencies of 131 to 1048 Hz, and the size of the pitch-change intervals ranged from 1 to 4 semitones. A generalized linear mixed model (GLMM) was fit to predict pitch ranking and to determine if group differences exist as a function of base frequency and interval size. Overall significance effects were measured with Chi-square tests and individual effects were measured with t-tests. Pitch ranking accuracy was correlated with demographic measures (age at time of testing, length of profound deafness, months of implant use), frequency difference limens, familiar melody recognition, and two measures of speech reception in noise. The long electrode recipients performed significantly more poorly on pitch discrimination than the NH and A+E groups. The A+E users performed similarly to the NH listeners as a function of interval size in the lower base frequency range, but their pitch discrimination scores deteriorated slightly in the higher frequency range. 
The long electrode recipients, although less accurate than participants in the NH and A+E groups, tended to perform with greater accuracy within the higher frequency range. There were statistically significant correlations between pitch ranking and familiar melody recognition as well as with pure-tone frequency difference limens at 200 and 400 Hz. Low-frequency acoustic hearing improves pitch discrimination as compared with traditional, electric-only cochlear implants. These findings have implications for musical tasks such as familiar melody recognition.
Scoring ligand similarity in structure-based virtual screening.
Zavodszky, Maria I; Rohatgi, Anjali; Van Voorst, Jeffrey R; Yan, Honggao; Kuhn, Leslie A
2009-01-01
Scoring to identify high-affinity compounds remains a challenge in virtual screening. On one hand, protein-ligand scoring focuses on weighting favorable and unfavorable interactions between the two molecules. Ligand-based scoring, on the other hand, focuses on how well the shape and chemistry of each ligand candidate overlay on a three-dimensional reference ligand. Our hypothesis is that a hybrid approach, using ligand-based scoring to rank dockings selected by protein-ligand scoring, can ensure that high-ranking molecules mimic the shape and chemistry of a known ligand while also complementing the binding site. Results from applying this approach to screen nearly 70 000 National Cancer Institute (NCI) compounds for thrombin inhibitors tend to support the hypothesis. EON ligand-based ranking of docked molecules yielded the majority (4/5) of newly discovered, low to mid-micromolar inhibitors from a panel of 27 assayed compounds, whereas ranking docked compounds by protein-ligand scoring alone resulted in one new inhibitor. Since the results depend on the choice of scoring function, an analysis of properties was performed on the top-scoring docked compounds according to five different protein-ligand scoring functions, plus EON scoring using three different reference compounds. The results indicate that the choice of scoring function, even among scoring functions measuring the same types of interactions, can have an unexpectedly large effect on which compounds are chosen from screening. Furthermore, there was almost no overlap between the top-scoring compounds from protein-ligand versus ligand-based scoring, indicating the two approaches provide complementary information. Matchprint analysis, a new addition to the SLIDE (Screening Ligands by Induced-fit Docking, Efficiently) screening toolset, facilitated comparison of docked molecules' interactions with those of known inhibitors. 
The majority of interactions conserved among top-scoring compounds for a given scoring function, and from the different scoring functions, proved to be conserved interactions in known inhibitors. This was particularly true in the S1 pocket, which was occupied by all the docked compounds. (c) 2009 John Wiley & Sons, Ltd.
Kinetic model of turbulence in an incompressible fluid
NASA Technical Reports Server (NTRS)
Tchen, C. M.
1978-01-01
A statistical description of turbulence in an incompressible fluid obeying the Navier-Stokes equations is proposed, where pressure is regarded as a potential for the interaction between fluid elements. A scaling procedure divides a fluctuation into three ranks representing the three transport processes of macroscopic evolution, transport property, and relaxation. Closure is obtained by relaxation, and a kinetic equation is obtained for the fluctuation of the macroscopic rank of the distribution function. The solution gives the transfer function and eddy viscosity. When applied to the inertia subrange of the energy spectrum the analysis recovers the Kolmogorov law and its numerical coefficient.
Optimization of global model composed of radial basis functions using the term-ranking approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cai, Peng; Tao, Chao, E-mail: taochao@nju.edu.cn; Liu, Xiao-Jun
2014-03-15
A term-ranking method is put forward to optimize the global model composed of radial basis functions to improve the predictability of the model. The effectiveness of the proposed method is examined by numerical simulation and experimental data. Numerical simulations indicate that this method can significantly lengthen the prediction time and decrease the Bayesian information criterion of the model. The application to real voice signal shows that the optimized global model can capture more predictable component in chaos-like voice data and simultaneously reduce the predictable component (periodic pitch) in the residual signal.
mRAISE: an alternative algorithmic approach to ligand-based virtual screening
NASA Astrophysics Data System (ADS)
von Behren, Mathias M.; Bietz, Stefan; Nittinger, Eva; Rarey, Matthias
2016-08-01
Ligand-based virtual screening is a well-established method to find new lead molecules in today's drug discovery process. In order to be applicable in day-to-day practice, such methods have to face multiple challenges. The most important part is the reliability of the results, which can be shown and compared in retrospective studies. Furthermore, in the case of 3D methods, they need to provide biologically relevant molecular alignments of the ligands that can be further investigated by a medicinal chemist. Last but not least, they have to be able to screen large databases in reasonable time. Many algorithms for ligand-based virtual screening have been proposed in the past, most of them based on pairwise comparisons. Here, a new method is introduced called mRAISE. Based on structural alignments, it uses a descriptor-based bitmap search engine (RAISE) to achieve efficiency. Alignments created on the fly by the search engine get evaluated with an independent shape-based scoring function also used for ranking of compounds. The correct ranking as well as the alignment quality of the method are evaluated and compared to other state-of-the-art methods. On the commonly used Directory of Useful Decoys dataset mRAISE achieves an average area under the ROC curve of 0.76, an average enrichment factor at 1 % of 20.2 and an average hit rate at 1 % of 55.5. With these results, mRAISE is always among the top performing methods with available data for comparison. To assess the quality of the alignments calculated by ligand-based virtual screening methods, we introduce a new dataset containing 180 prealigned ligands for 11 diverse targets. Within the top ten ranked conformations, the alignment closest to X-ray structure calculated with mRAISE has a root-mean-square deviation of less than 2.0 Å for 80.8 % of alignment pairs and achieves a median of less than 2.0 Å for eight of the 11 cases. 
The dataset used to rate the quality of the calculated alignments is freely available at http://www.zbh.uni-hamburg.de/mraise-dataset.html. The table of all PDB codes contained in the ensembles can be found in the supplementary material. The software tool mRAISE is freely available for evaluation purposes and academic use (see http://www.zbh.uni-hamburg.de/raise).
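For readers unfamiliar with the enrichment factor quoted in this abstract, the sketch below computes one common form of it: the fraction of actives found in the top-scoring slice of a ranked list, divided by the fraction of actives in the whole list. The abstract does not state which exact definition mRAISE uses, so this formula, the function name `enrichment_factor` and the toy data are assumptions for illustration only.

```python
def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor of a ranking at the given top fraction.

    scores: higher means ranked earlier; labels: 1 for active, 0 for decoy.
    """
    n = len(scores)
    n_top = max(1, int(round(n * fraction)))           # size of the top slice
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(labels[i] for i in order[:n_top])   # actives in the top slice
    total_actives = sum(labels)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")
    return (hits_top / n_top) / (total_actives / n)

# Toy ranking of 100 compounds: the two best-scored are active, 10 actives overall.
toy_scores = list(range(100, 0, -1))
toy_labels = [1, 1] + [0] * 90 + [1] * 8
ef_2pct = enrichment_factor(toy_scores, toy_labels, fraction=0.02)
```

In this toy case the top 2 % consists entirely of actives while actives make up 10 % of the library, so the enrichment factor is 10.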
Boose, Klaree; White, Frances
2017-10-01
The immatures of many primate species frequently pester adult group members with aggressive behaviors referred to as a type of harassment. Although these behaviors are characteristic of immatures as they develop from infancy through adolescence, there have been few studies that specifically address the adaptive significance of harassment. Two functional hypotheses have been generated from observations of the behavior in chimpanzees. The Exploratory Aggression hypothesis describes harassment as a mechanism used by immatures to learn about the parameters of aggression and dominance behavior and to acquire information about novel, complex, or unpredictable relationships. The Rank Improvement hypothesis describes harassment as a mechanism of dominance acquisition used by immatures to outrank adults. This study investigated harassment of adults by immatures in a group of bonobos housed at the Columbus Zoo and compared the results to the predictions outlined by the Exploratory Aggression and Rank Improvement hypotheses. Although all immature bonobos in this group harassed adults, adolescents performed the behavior more frequently than did infants or juveniles and low-ranking adults were targeted more frequently than high-ranking. Targets responded more with agonistic behaviors than with neutral behaviors and the amount of harassment an individual received was significantly correlated with the amount of agonistic responses given. Furthermore, bouts of harassment were found to continue significantly more frequently when responses were agonistic than when they were neutral. Adolescents elicited mostly agonistic responses from targets whereas infants and juveniles received mostly neutral responses. These results support predictions from each hypothesis where harassment functions both as a mechanism of social exploration and as a tool to establish dominance rank.
Juan-Albarracín, Javier; Fuster-Garcia, Elies; Pérez-Girbés, Alexandre; Aparici-Robles, Fernando; Alberich-Bayarri, Ángel; Revert-Ventura, Antonio; Martí-Bonmatí, Luis; García-Gómez, Juan M
2018-06-01
Purpose To determine if preoperative vascular heterogeneity of glioblastoma is predictive of overall survival of patients undergoing standard-of-care treatment by using an unsupervised multiparametric perfusion-based habitat-discovery algorithm. Materials and Methods Preoperative magnetic resonance (MR) imaging including dynamic susceptibility-weighted contrast material-enhanced perfusion studies in 50 consecutive patients with glioblastoma were retrieved. Perfusion parameters of glioblastoma were analyzed and used to automatically draw four reproducible habitats that describe the tumor vascular heterogeneity: high-angiogenic and low-angiogenic regions of the enhancing tumor, potentially tumor-infiltrated peripheral edema, and vasogenic edema. Kaplan-Meier and Cox proportional hazard analyses were conducted to assess the prognostic potential of the hemodynamic tissue signature to predict patient survival. Results Cox regression analysis yielded a significant correlation between patients' survival and maximum relative cerebral blood volume (rCBVmax) and maximum relative cerebral blood flow (rCBFmax) in high-angiogenic and low-angiogenic habitats (P < .01, false discovery rate-corrected P < .05). Moreover, rCBFmax in the potentially tumor-infiltrated peripheral edema habitat was also significantly correlated (P < .05, false discovery rate-corrected P < .05). Kaplan-Meier analysis demonstrated significant differences between the observed survival of populations divided according to the median of the rCBVmax or rCBFmax at the high-angiogenic and low-angiogenic habitats (log-rank test P < .05, false discovery rate-corrected P < .05), with an average survival increase of 230 days. Conclusion Preoperative perfusion heterogeneity contains relevant information about overall survival in patients who undergo standard-of-care treatment. 
The hemodynamic tissue signature method automatically describes this heterogeneity, providing a set of vascular habitats with high prognostic capabilities. © RSNA, 2018.
Tavares, Adriana Alexandre S; Lewsey, James; Dewar, Deborah; Pimlott, Sally L
2012-01-01
Previously, development of novel brain radiotracers has largely relied on simple screening tools. Improved selection methods at the early stages of radiotracer discovery and an increased understanding of the relationships between in vitro physicochemical and in vivo radiotracer properties are needed. We investigated if high performance liquid chromatography (HPLC) methodologies could provide criteria for lead candidate selection by comparing HPLC measurements with radiotracer properties in humans. Ten molecules, previously used as radiotracers in humans, were analysed to obtain the following measures: partition coefficient (Log P); permeability (P(m)); percentage of plasma protein binding (%PPB); and membrane partition coefficient (K(m)). Relationships between brain entry measurements (Log P, P(m) and %PPB) and in vivo brain percentage injected dose (%ID); and K(m) and specific binding in vivo (BP(ND)) were investigated. Log P values obtained using in silico packages and flask methods were compared with Log P values obtained using HPLC. The modelled associations with %ID were stronger for %PPB (r(2)=0.65) and P(m) (r(2)=0.77) than for Log P (r(2)=0.47) while 86% of BP(ND) variance was explained by K(m). Log P values varied depending on the methodology used. Log P should not be relied upon as a predictor of blood-brain barrier penetration during brain radiotracer discovery. HPLC measurements of permeability, %PPB and membrane interactions may be potentially useful in predicting in vivo performance and hence allow evaluation and ranking of compound libraries for the selection of lead radiotracer candidates at early stages of radiotracer discovery. Copyright © 2012 Elsevier Inc. All rights reserved.
Particle swarm optimization with recombination and dynamic linkage discovery.
Chen, Ying-Ping; Peng, Wen-Chih; Jian, Ming-Chung
2007-12-01
In this paper, we try to improve the performance of the particle swarm optimizer by incorporating the linkage concept, which is an essential mechanism in genetic algorithms, and design a new linkage identification technique called dynamic linkage discovery to address the linkage problem in real-parameter optimization problems. Dynamic linkage discovery is a costless and effective linkage recognition technique that adapts the linkage configuration by employing only the selection operator without extra judging criteria irrelevant to the objective function. Moreover, a recombination operator that utilizes the discovered linkage configuration to promote the cooperation of particle swarm optimizer and dynamic linkage discovery is accordingly developed. By integrating the particle swarm optimizer, dynamic linkage discovery, and recombination operator, we propose a new hybridization of optimization methodologies called particle swarm optimization with recombination and dynamic linkage discovery (PSO-RDL). In order to study the capability of PSO-RDL, numerical experiments were conducted on a set of benchmark functions as well as on an important real-world application. The benchmark functions used in this paper were proposed in the 2005 Institute of Electrical and Electronics Engineers Congress on Evolutionary Computation. The experimental results on the benchmark functions indicate that PSO-RDL can provide a level of performance comparable to that given by other advanced optimization techniques. In addition to the benchmark, PSO-RDL was also used to solve the economic dispatch (ED) problem for power systems, which is a real-world problem and highly constrained. The results indicate that PSO-RDL can successfully solve the ED problem for the three-unit power system and obtain the currently known best solution for the 40-unit system.
Niu, Shanzhou; Zhang, Shanli; Huang, Jing; Bian, Zhaoying; Chen, Wufan; Yu, Gaohang; Liang, Zhengrong; Ma, Jianhua
2016-01-01
Cerebral perfusion x-ray computed tomography (PCT) is an important functional imaging modality for evaluating cerebrovascular diseases and has been widely used in clinics over the past decades. However, due to the protocol of PCT imaging with repeated dynamic sequential scans, the associated radiation dose unavoidably increases as compared with that used in conventional CT examinations. Minimizing the radiation exposure in PCT examination is a major task in the CT field. In this paper, considering the rich similarity redundancy information among enhanced sequential PCT images, we propose a low-dose PCT image restoration model by incorporating the low-rank and sparse matrix characteristic of sequential PCT images. Specifically, the sequential PCT images were first stacked into a matrix (i.e., low-rank matrix), and a non-convex spectral norm/regularization and a spatio-temporal total variation norm/regularization were then built on the low-rank matrix to describe the low rank and sparsity of the sequential PCT images, respectively. Subsequently, an improved split Bregman method was adopted to minimize the associated objective function with a reasonable convergence rate. Both qualitative and quantitative studies were conducted using a digital phantom and clinical cerebral PCT datasets to evaluate the present method. Experimental results show that the presented method can achieve images with several noticeable advantages over the existing methods in terms of noise reduction and universal quality index. More importantly, the present method can produce more accurate kinetic enhanced details and diagnostic hemodynamic parameter maps. PMID:27440948
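The idea of stacking sequential frames into a matrix and penalizing its rank can be sketched with plain singular value soft-thresholding. This is an illustration under stated assumptions, not the paper's method: the authors describe a non-convex spectral regularizer combined with total variation and a split Bregman solver, whereas the sketch uses only the simpler convex (nuclear norm) shrinkage step. The names `stack_frames`, `singular_value_threshold` and `tau` are hypothetical.

```python
import numpy as np

def stack_frames(frames):
    """Vectorize each 2-D frame and place it as one column of a matrix."""
    return np.stack([np.asarray(f, dtype=float).ravel() for f in frames], axis=1)

def singular_value_threshold(M, tau):
    """Shrink every singular value of M by tau (floored at zero) and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt

# Toy sequence: three 2x2 frames that differ only by a scale factor,
# so the stacked matrix has rank one.
toy_frames = [np.ones((2, 2)) * k for k in (1, 2, 3)]
M = stack_frames(toy_frames)
M_low_rank = singular_value_threshold(M, tau=0.5)
```

Frames that share most of their structure give a stacked matrix with a few dominant singular values, so shrinking the small ones suppresses noise while keeping the common content.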
A fortran program for Monte Carlo simulation of oil-field discovery sequences
Bohling, Geoffrey C.; Davis, J.C.
1993-01-01
We have developed a program for performing Monte Carlo simulation of oil-field discovery histories. A synthetic parent population of fields is generated as a finite sample from a distribution of specified form. The discovery sequence then is simulated by sampling without replacement from this parent population in accordance with a probabilistic discovery process model. The program computes a chi-squared deviation between synthetic and actual discovery sequences as a function of the parameters of the discovery process model, the number of fields in the parent population, and the distributional parameters of the parent population. The program employs the three-parameter log gamma model for the distribution of field sizes and employs a two-parameter discovery process model, allowing the simulation of a wide range of scenarios. © 1993.
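Sampling a discovery sequence without replacement from a parent population can be sketched as follows. The abstract does not spell out its two-parameter discovery process model, so the sketch assumes a simple size-biased rule in which each remaining field is drawn with probability proportional to its size raised to a power `beta`; that rule, the single parameter `beta`, and the function name `simulate_discovery_sequence` are assumptions, and the sketch is in Python rather than the Fortran of the original program.

```python
import random

def simulate_discovery_sequence(sizes, beta=1.0, rng=None):
    """Draw all fields without replacement, favoring larger fields when beta > 0.

    sizes: positive field sizes of the parent population.
    beta:  assumed discovery-efficiency exponent (0 gives purely random order).
    """
    rng = rng or random.Random()
    remaining = list(sizes)
    sequence = []
    while remaining:
        weights = [s ** beta for s in remaining]
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        sequence.append(remaining.pop(idx))  # remove the discovered field from the pool
    return sequence

# Toy parent population of four fields; a fixed seed makes the run repeatable.
parent = [1.0, 10.0, 100.0, 1000.0]
seq_a = simulate_discovery_sequence(parent, beta=1.0, rng=random.Random(0))
seq_b = simulate_discovery_sequence(parent, beta=1.0, rng=random.Random(0))
```

Repeating the simulation many times with different seeds and comparing each synthetic sequence with the observed one is the kind of loop over which a deviation statistic such as the chi-squared measure in the abstract could be computed.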
Pituitary adenylate cyclase-activating polypeptide: a novel peptide with protean implications.
Pisegna, Joseph R; Oh, David S
2007-02-01
The purpose of this review is to highlight the importance of pituitary adenylate cyclase-activating polypeptide in physiological processes and to describe how this peptide is becoming increasingly recognized as having a major role in the body. Since its discovery in 1989, investigators have sought to determine the site of biological activity and the function of pituitary adenylate cyclase-activating polypeptide in maintaining homeostasis. Pituitary adenylate cyclase-activating polypeptide appears to play an important role in the regulation of processes within the central nervous system and gastrointestinal tract, as well as in reproductive biology. Pituitary adenylate cyclase-activating polypeptide has been shown to regulate tumor cell growth and to regulate immune function through its effects on T lymphocytes. These discoveries suggest the importance of pituitary adenylate cyclase-activating polypeptide in neuronal development, neuronal function, gastrointestinal tract function and reproduction. Future studies will examine more closely the role of pituitary adenylate cyclase-activating polypeptide in regulation of malignantly transformed cells, as well as in regulation of immune function.
Benchmarking Outpatient Rehabilitation Clinics Using Functional Status Outcomes.
Gozalo, Pedro L; Resnik, Linda J; Silver, Benjamin
2016-04-01
To utilize functional status (FS) outcomes to benchmark outpatient therapy clinics. Outpatient therapy data from clinics using Focus on Therapeutic Outcomes (FOTO) assessments. Retrospective analysis of 538 clinics, involving 2,040 therapists and 90,392 patients admitted July 2006-June 2008. FS at discharge was modeled using hierarchical regression methods with patients nested within therapists within clinics. Separate models were estimated for all patients, for those with lumbar, and for those with shoulder impairments. All models risk-adjusted for intake FS, age, gender, onset, surgery count, functional comorbidity index, fear-avoidance level, and payer type. Inverse probability weighting adjusted for censoring. Functional status was captured using computer adaptive testing at intake and at discharge. Clinic and therapist effects explained 11.6 percent of variation in FS. Clinics ranked in the lowest quartile had significantly different outcomes than those in the highest quartile (p < .01). Clinics ranked similarly in lumbar and shoulder impairments (correlation = 0.54), but some clinics ranked in the highest quintile for one condition and in the lowest for the other. Benchmarking models based on validated FS measures clearly separated high-quality from low-quality clinics, and they could be used to inform value-based-payment policies. © Health Research and Educational Trust.
Identification of Functionally Related Enzymes by Learning-to-Rank Methods.
Stock, Michiel; Fober, Thomas; Hüllermeier, Eyke; Glinca, Serghei; Klebe, Gerhard; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem
2014-01-01
Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
H-index of Collective Health professors in Brazil.
Pereira, Julio Cesar Rodrigues; Bronhara, Bruna
2011-06-01
To estimate reference values and the hierarchy function of professors engaged in Collective Health in Brazil by analyzing the distribution of the h-index. From the Portal da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Portal of Coordination for the Improvement of Higher Education Personnel), 934 authors were identified in 2008, of whom 819 were analyzed. The h-index of each professor was obtained through the Web of Science using search algorithms controlling for namesakes and alternative spellings of their names. For each Brazilian region and for the country as a whole, we fitted an exponential probability density function to provide the population parameters and rate of decline by region. Ranking measures were identified using the complement of the cumulative probability function and the hierarchy function among authors according to the h-index by region. Among the professors analyzed, 29.8% had no citation record in Web of Science (h=0). The mean h for the country was 3.1, and the region with the greatest mean was the southern region (h=4.7). The median h for the country was 3.1, and the greatest median was for the southern region (3.2). Standardizing populations to one hundred, the first rank in the country was h=16, but stratification by region shows that, within the northeastern, southeastern and southern regions, a greater value is necessary for achieving the first rank. In the southern region, the index needed to achieve the first rank was h=24. Most of the Brazilian Collective Health authors, if assessed on the basis of the Web of Science h-index, did not exceed h=5. Regional differences exist, with the southeastern and northeastern regions being similar and the southern region being outstanding.
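As a brief illustration of the quantity analyzed in the record above, the following sketch computes an h-index from a list of citation counts. The function name and the sample counts are hypothetical and are not taken from the study, which obtained h-index values through the Web of Science.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position  # the top `position` papers all have >= position citations
        else:
            break
    return h


# Hypothetical citation counts for one author (illustrative only).
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))
```

An author with no citation record, the h=0 case reported in the abstract, corresponds to an empty list or a list of zeros.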
Adsorption isotherms and kinetics of activated carbons produced from coals of different ranks.
Purevsuren, B; Lin, Chin-Jung; Davaajav, Y; Ariunaa, A; Batbileg, S; Avid, B; Jargalmaa, S; Huang, Yu; Liou, Sofia Ya-Hsuan
2015-01-01
Activated carbons (ACs) from six coals, ranging from low-rank lignite brown coal to high-rank stone coal, were utilized as adsorbents to remove basic methylene blue (MB) from an aqueous solution. The surface properties of the obtained ACs were characterized via thermal analysis, N2 isothermal sorption, scanning electron microscopy, Fourier transform infrared spectroscopy, X-ray photoelectron spectroscopy and Boehm titration. As coal rank decreased, an increase in the heterogeneity of the pore structures and abundance of oxygen-containing functional groups increased MB coverage on its surface. The equilibrium data fitted well with the Langmuir model, and adsorption capacity of MB ranged from 51.8 to 344.8 mg g⁻¹. Good correlation coefficients were obtained using the intra-particle diffusion model, indicating that the adsorption of MB onto ACs is diffusion controlled. The values of the effective diffusion coefficient ranged from 0.61 × 10⁻¹⁰ to 7.1 × 10⁻¹⁰ m² s⁻¹, indicating that ACs from lower-rank coals have higher effective diffusivities. Among all the ACs obtained from selected coals, the AC from low-rank lignite brown coal was the most effective in removing MB from an aqueous solution.
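The Langmuir model mentioned in the record above relates equilibrium uptake q to equilibrium concentration C through q = q_max·K·C / (1 + K·C). A minimal sketch of one common way to estimate q_max and K, the linearized form C/q = C/q_max + 1/(K·q_max) fitted by ordinary least squares, is given below. The abstract does not say which fitting procedure the authors used, and the function names and sample values here are assumptions for illustration.

```python
def langmuir(conc, q_max, k):
    """Langmuir isotherm: uptake at equilibrium concentration `conc`."""
    return q_max * k * conc / (1.0 + k * conc)


def fit_langmuir(conc, uptake):
    """Estimate (q_max, K) from the linearized form C/q = C/q_max + 1/(K*q_max)."""
    xs = list(conc)
    ys = [c / q for c, q in zip(conc, uptake)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals 1 / q_max
    intercept = mean_y - slope * mean_x  # equals 1 / (K * q_max)
    return 1.0 / slope, slope / intercept


# Synthetic, noise-free data from hypothetical parameters (not the paper's data).
concentrations = [5.0, 10.0, 20.0, 50.0, 100.0, 200.0]
uptakes = [langmuir(c, q_max=300.0, k=0.05) for c in concentrations]
q_max_est, k_est = fit_langmuir(concentrations, uptakes)
```

With real, noisy measurements a nonlinear least-squares fit of the original equation is often preferred, since the linearization changes how measurement errors are weighted.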
CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition
Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe
2013-01-01
Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function (PSF) for the Wiener filter is employed to efficiently remove blur in the sparse component; Wiener filtering with a Gaussian PSF is used to recover the average image of the low-rank component. We then obtain the recovered CT image sequence by combining the recovered low-rank image with the full recovered sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principal Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for CT images with low noise, whereas the GoDec model was the best for CT images with heavy noise. PMID:24023764
Constrained Low-Rank Learning Using Least Squares-Based Regularization.
Li, Ping; Yu, Jun; Wang, Meng; Zhang, Luming; Cai, Deng; Li, Xuelong
2017-12-01
Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting the least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of original data can be paired with some informative structure by imposing an appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.
Sirvent, Juan Alberto; Lücking, Ulrich
2017-04-06
Sulfoximines have gained considerable recognition as an important structural motif in drug discovery of late. In particular, the clinical kinase inhibitors for the treatment of cancer, roniciclib (pan-CDK inhibitor), BAY 1143572 (P-TEFb inhibitor), and AZD 6738 (ATR inhibitor), have recently drawn considerable attention. Whilst the interest in this underrepresented functional group in drug discovery is clearly on the rise, there remains an incomplete understanding of the medicinal-chemistry-relevant properties of sulfoximines. Herein we report the synthesis and in vitro characterization of a variety of sulfoximine analogues of marketed drugs and advanced clinical candidates to gain a better understanding of this neglected functional group and its potential in drug discovery. © 2017 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Samara, Stavroula; Dailiana, Zoe; Chassanidis, Christos; Koromila, Theodora; Papatheodorou, Loukia; Malizos, Konstantinos N; Kollia, Panagoula
2014-02-01
Femoral head avascular necrosis (AVN) is a recalcitrant disease of the hip that leads to joint destruction. Osteoprotegerin (OPG), Receptor Activator of Nuclear Factor kappa-B (RANK) and RANK ligand (RANKL) regulate the balance between osteoclasts and osteoblasts. The expression of these genes affects the maturation and function of osteoblasts and osteoclasts and bone remodeling. In this study, we investigated the molecular pathways leading to AVN by studying the expression profile of OPG, RANK and RANKL genes. Quantitative Real Time-PCR was performed for evaluation of OPG, RANK and RANKL expression. Analysis was based on parallel evaluation of mRNA and protein levels in normal/necrotic sites of 42 osteonecrotic femoral heads (FHs). OPG and RANKL protein levels were estimated by western blotting. The OPG mRNA levels were higher (insignificantly) in the necrotic than the normal site (p > 0.05). Although the expression of RANK and RANKL was significantly lower than OPG in both sites, RANK and RANKL mRNA levels were higher in the necrotic part than the normal (p < 0.05). Protein levels of OPG and RANKL showed no remarkable divergence. Our results indicate differential expression mechanisms for OPG, RANK and RANKL that could play an important role in the progress of bone remodeling in the necrotic area, disturbing bone homeostasis. This finding may have an effect on the resulting bone destruction and the subsequent collapse of the hip joint. Copyright © 2013. Published by Elsevier Inc.
WFIRST-AFTA Presentation to the NRC Mid-Decadal Panel
NASA Technical Reports Server (NTRS)
Gehrels, Neil; Grady, Kevin; Ruffa, John; Melton, Mark; Content, Dave; Zhao, Feng
2015-01-01
Over the past two years, increased funding has enabled significant progress in technology maturation as well as additional fidelity in the design reference mission. WFIRST with the 2.4-m telescope and coronagraph provides an exciting science program, superior to that recommended by NWNH, and also advances exoplanet imaging technology (the highest ranked medium-class NWNH recommendation). Great opportunity for astronomy and astrophysics discoveries. Broad community support for WFIRST. Key development areas are anchored in a decade of investments in JPL's HCIT and GSFC's DCL. Great progress made in pre-formulation, ready for KDP-A and launch in mid-2020s.
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics
Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven
2011-01-01
Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
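The record above refers to multiple testing with control of the false discovery rate. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up procedure, a standard FDR method; the abstract does not state which FDR procedure the authors applied, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(p_values)
    # Indices of the hypotheses ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank r with p_(r) <= r * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per biomarker (not data from the study).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
decisions = benjamini_hochberg(p_vals, q=0.05)
```

Because the rule is step-up, a hypothesis can be rejected even if its own p-value exceeds its rank threshold, provided a larger p-value further down the ordered list meets its threshold.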
Privacy-Preserving Relationship Path Discovery in Social Networks
NASA Astrophysics Data System (ADS)
Mezzour, Ghita; Perrig, Adrian; Gligor, Virgil; Papadimitratos, Panos
As social networking sites continue to proliferate and are being used for an increasing variety of purposes, the privacy risks raised by the full access of social networking sites over user data become uncomfortable. A decentralized social network would help alleviate this problem, but offering the functionalities of social networking sites in a distributed manner is a challenging problem. In this paper, we provide techniques to instantiate one of the core functionalities of social networks: discovery of paths between individuals. Our algorithm preserves the privacy of relationship information, and can operate offline during the path discovery phase. We simulate our algorithm on real social network topologies.
Liseron-Monfils, Christophe; Lewis, Tim; Ashlock, Daniel; McNicholas, Paul D; Fauteux, François; Strömvik, Martina; Raizada, Manish N
2013-03-15
The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize. A benchmark data set was used to evaluate the accuracy of three motif discovery programs: BioProspector, Weeder and MEME. Analysis showed that each motif discovery tool had limited accuracy and appeared to retrieve a distinct set of motifs. Therefore, using the benchmark, statistical filters were optimized to reduce the false discovery ratio, and then remaining motifs from all programs were combined to improve motif prediction. These principles were integrated into a user-friendly pipeline for motif discovery in maize called Promzea, available at http://www.promzea.org and on the Discovery Environment of the iPlant Collaborative website. Promzea was subsequently expanded to include rice and Arabidopsis. Within Promzea, a user enters cDNA sequences or gene IDs; corresponding upstream sequences are retrieved from the maize genome. Predicted motifs are filtered, combined and ranked. Promzea searches the chosen plant genome for genes containing each candidate motif, providing the user with the gene list and corresponding gene annotations. Promzea was validated in silico using a benchmark data set: the Promzea pipeline showed a 22% increase in nucleotide sensitivity compared to the best standalone program tool, Weeder, with equivalent nucleotide specificity. Promzea was also validated by its ability to retrieve the experimentally defined binding sites of transcription factors that regulate the maize anthocyanin and phlobaphene biosynthetic pathways. 
Promzea predicted additional promoter motifs, and genome-wide motif searches by Promzea identified 127 non-anthocyanin/phlobaphene genes that each contained all five predicted promoter motifs in their promoters, perhaps uncovering a broader co-regulated gene network. Promzea was also tested against tissue-specific microarray data from maize. An online tool customized for promoter motif discovery in plants has been generated called Promzea. Promzea was validated in silico by its ability to retrieve benchmark motifs and experimentally defined motifs and was tested using tissue-specific microarray data. Promzea predicted broader networks of gene regulation associated with the historic anthocyanin and phlobaphene biosynthetic pathways. Promzea is a new bioinformatics tool for understanding transcriptional gene regulation in maize and has been expanded to include rice and Arabidopsis.
Quantifying the Ease of Scientific Discovery
Arbesman, Samuel
2012-01-01
It has long been known that scientific output proceeds on an exponential increase, or more properly, a logistic growth curve. The interplay between effort and discovery is clear, and the nature of the functional form has been thought to be due to many changes in the scientific process over time. Here I show a quantitative method for examining the ease of scientific progress, another necessary component in understanding scientific discovery. Using examples from three different scientific disciplines – mammalian species, chemical elements, and minor planets – I find the ease of discovery to conform to an exponential decay. In addition, I show how the pace of scientific discovery can be best understood as the outcome of both scientific output and ease of discovery. A quantitative study of the ease of scientific discovery in the aggregate, such as done here, has the potential to provide a great deal of insight into both the nature of future discoveries and the technical processes behind discoveries in science. PMID:22328796
Quantifying the Ease of Scientific Discovery.
Arbesman, Samuel
2011-02-01
It has long been known that scientific output proceeds on an exponential increase, or more properly, a logistic growth curve. The interplay between effort and discovery is clear, and the nature of the functional form has been thought to be due to many changes in the scientific process over time. Here I show a quantitative method for examining the ease of scientific progress, another necessary component in understanding scientific discovery. Using examples from three different scientific disciplines - mammalian species, chemical elements, and minor planets - I find the ease of discovery to conform to an exponential decay. In addition, I show how the pace of scientific discovery can be best understood as the outcome of both scientific output and ease of discovery. A quantitative study of the ease of scientific discovery in the aggregate, such as done here, has the potential to provide a great deal of insight into both the nature of future discoveries and the technical processes behind discoveries in science.
Comparison: Discovery on WSMOLX and miAamics/jABC
NASA Astrophysics Data System (ADS)
Kubczak, Christian; Vitvar, Tomas; Winkler, Christian; Zaharia, Raluca; Zaremba, Maciej
This chapter compares the solutions to the SWS-Challenge discovery problems provided by DERI Galway and the joint solution from the Technical University of Dortmund and University of Potsdam. The two approaches are described in depth in Chapters 10 and 13. The discovery scenario raises problems associated with making service discovery an automated process. It requires fine-grained specifications of search requests and service functionality including support for fetching dynamic information during the discovery process (e.g., shipment price). Both teams utilize semantics to describe services, service requests and data models in order to enable search at the required fine-grained level of detail.
Compressive Sensing via Nonlocal Smoothed Rank Function
Fan, Ya-Ru; Liu, Jun; Zhao, Xi-Le
2016-01-01
Compressive sensing (CS) theory asserts that we can reconstruct signals and images with only a small number of samples or measurements. Recent works exploiting the nonlocal similarity have led to better results in various CS studies. To better exploit the nonlocal similarity, in this paper, we propose a non-convex smoothed rank function based model for CS image reconstruction. We also propose an efficient alternating minimization method to solve the proposed model, which reduces a difficult and coupled problem to two tractable subproblems. Experimental results have shown that the proposed method performs better than several existing state-of-the-art CS methods for image reconstruction. PMID:27583683
A Study of the Dependence of the Properties of Galaxy Clusters on Cluster Morphology.
NASA Astrophysics Data System (ADS)
Lugger, Phyllis Minnie
1982-03-01
A quantitative study of the properties of clusters of galaxies as a function of cluster morphology has been carried out using photographic plates obtained with the Palomar 48 inch Schmidt telescope. Surface brightness profiles of 35 first ranked cluster galaxies and luminosity functions of nine clusters are presented and analyzed. The dispersion in the metric magnitudes of first ranked galaxies is quite small (~0.4 mag), which is consistent with the results of Kristian, Sandage and Westphal as well as Hoessel, Gunn and Thuan. For the cD (supergiant elliptical) galaxy sample, the mean metric magnitude is ~0.5 mag brighter than for the non-cD galaxies. The dispersion in the metric magnitudes for the 10 cD galaxies studied is found to be much smaller (σ ~ 0.1 mag) than the dispersion in the metric magnitudes of the non-cD first ranked galaxies (σ ~ 0.4 mag). The de Vaucouleurs effective radius - magnitude relation determined in the present study for first ranked galaxies (log r_e = -0.2 M + const.) is consistent with the extrapolations to brighter magnitudes of the range of relations found by Strom and Strom. The average residuals from the mean radius-magnitude relation for the cD and non-cD galaxy samples were not found to differ at a significant level. Luminosity functions for the region within 0.5 Mpc of the cluster center for three of the clusters studied (A1656, A2147, and A2199) show a deficit of bright galaxies when compared to a concentric annular region with bounds of 0.5 and 1.0 Mpc. Characteristic magnitudes for the nine clusters (determined from square regions 4.6 Mpc on a side) show no significant correlation with cluster morphology, central density, or total magnitude of the first ranked galaxy. The mean values of the Schechter function parameters M* and α are in very good agreement with the previous determinations by Schechter and by Dressler. The differential luminosity functions for A569 and A1656 do not rise monotonically to fainter magnitudes but instead show dips. These data are used to test predictions of several recent theories of the dynamical evolution of clusters of galaxies.
Learning Robust and Discriminative Subspace With Low-Rank Constraints.
Li, Sheng; Fu, Yun
2016-11-01
In this paper, we aim at learning robust and discriminative subspaces from noisy data. Subspace learning is widely used in extracting discriminative features for classification. However, when data are contaminated with severe noise, the performance of most existing subspace learning methods would be limited. Recent advances in low-rank modeling provide effective solutions for removing noise or outliers contained in sample sets, which motivates us to take advantage of low-rank constraints in order to exploit robust and discriminative subspace for classification. In particular, we present a discriminative subspace learning method called the supervised regularization-based robust subspace (SRRS) approach, by incorporating the low-rank constraint. SRRS seeks low-rank representations from the noisy data, and learns a discriminative subspace from the recovered clean data jointly. A supervised regularization function is designed to make use of the class label information, and therefore to enhance the discriminability of subspace. Our approach is formulated as a constrained rank-minimization problem. We design an inexact augmented Lagrange multiplier optimization algorithm to solve it. Unlike the existing sparse representation and low-rank learning methods, our approach learns a low-dimensional subspace from recovered data, and explicitly incorporates the supervised information. Our approach and some baselines are evaluated on the COIL-100, ALOI, Extended YaleB, FERET, AR, and KinFace databases. The experimental results demonstrate the effectiveness of our approach, especially when the data contain considerable noise or variations.
Soneson, Charlotte; Fontes, Magnus
2012-01-01
Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.
Wilde, Elisabeth A.; Moretti, Paolo; MacLeod, Marianne C.; Pedroza, Claudia; Drever, Pamala; Fourwinds, Sierra; Frisby, Melisa L.; Beers, Sue R.; Scott, James N.; Hunter, Jill V.; Traipe, Elfrides; Valadka, Alex B.; Okonkwo, David O.; Zygun, David A.; Puccio, Ava M.; Clifton, Guy L.
2013-01-01
The Neurological Outcome Scale for Traumatic Brain Injury (NOS-TBI) is a measure assessing neurological functioning in patients with TBI. We hypothesized that the NOS-TBI would exhibit adequate concurrent and predictive validity and demonstrate more sensitivity to change, compared with other well-established outcome measures. We analyzed data from the National Acute Brain Injury Study: Hypothermia-II clinical trial. Participants were 16–45 years of age with severe TBI assessed at 1, 3, 6, and 12 months postinjury. For analysis of criterion-related validity (concurrent and predictive), Spearman's rank-order correlations were calculated between the NOS-TBI and the Glasgow Outcome Scale (GOS), GOS-Extended (GOS-E), Disability Rating Scale (DRS), and Neurobehavioral Rating Scale-Revised (NRS-R). Concurrent validity was demonstrated through significant correlations between the NOS-TBI and GOS, GOS-E, DRS, and NRS-R measured contemporaneously at 3, 6, and 12 months postinjury (all p<0.0013). For prediction analyses, the multiplicity-adjusted p value using the false discovery rate was <0.015. The 1-month NOS-TBI score was a significant predictor of outcome in the GOS, GOS-E, and DRS at 3 and 6 months postinjury (all p<0.015). The 3-month NOS-TBI significantly predicted GOS, GOS-E, DRS, and NRS-R outcomes at 6 and 12 months postinjury (all p<0.0015). Sensitivity to change was analyzed using Wilcoxon's signed rank-sum test of subsamples demonstrating no change in the GOS or GOS-E between 3 and 6 months. The NOS-TBI demonstrated higher sensitivity to change, compared with the GOS (p<0.038) and GOS-E (p<0.016). In summary, the NOS-TBI demonstrated adequate concurrent and predictive validity as well as sensitivity to change, compared with gold-standard outcome measures. The NOS-TBI may enhance prediction of outcome in clinical practice and measurement of outcome in TBI research. PMID:23617608
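The validity analysis in the record above rests on Spearman's rank-order correlation. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks so that tied scores are handled, is shown below. The function names and the sample scores are hypothetical; the study's own data and software are not described in the abstract.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired scores on two outcome scales (illustrative only).
scale_a = [3, 7, 7, 12, 15]
scale_b = [1, 2, 3, 3, 5]
rho = spearman(scale_a, scale_b)
```

The sketch returns only the coefficient; the p-values quoted in the abstract would require an additional significance test.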
Liu, Xian; Xu, Yuan; Li, Shanshan; Wang, Yulan; Peng, Jianlong; Luo, Cheng; Luo, Xiaomin; Zheng, Mingyue; Chen, Kaixian; Jiang, Hualiang
2014-01-01
Ligand-based in silico target fishing can be used to identify the potential interacting target of bioactive ligands, which is useful for understanding the polypharmacology and safety profile of existing drugs. The underlying principle of the approach is that known bioactive ligands can be used as reference to predict the targets for a new compound. We tested a pipeline enabling large-scale target fishing and drug repositioning, based on simple fingerprint similarity rankings with data fusion. A large library containing 533 drug relevant targets with 179,807 active ligands was compiled, where each target was defined by its ligand set. For a given query molecule, its target profile is generated by similarity searching against the ligand sets assigned to each target, for which individual searches utilizing multiple reference structures are then fused into a single ranking list representing the potential target interaction profile of the query compound. The proposed approach was validated by 10-fold cross validation and two external tests using data from DrugBank and Therapeutic Target Database (TTD). The use of the approach was further demonstrated with some examples concerning drug repositioning and drug side-effect prediction. The promising results suggest that the proposed method is useful not only for finding new uses for promiscuous drugs, but also for predicting some important toxic liabilities. With the rapidly increasing volume and diversity of data concerning drug related targets and their ligands, the simple ligand-based target fishing approach would play an important role in assisting future drug design and discovery.
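The similarity-search-and-fusion idea in the record above can be sketched as follows. Two assumptions are made for illustration, since the abstract specifies neither: fingerprints are represented as sets of on-bit indices compared with the Tanimoto coefficient, and scores against a target's reference ligands are fused with the MAX rule. All names and the toy library are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rank_targets(query_fp, target_ligands):
    """Score each target by MAX fusion over its reference ligands, then rank."""
    scores = {}
    for target, ligand_fps in target_ligands.items():
        # MAX fusion: a target's score is its best-matching reference ligand.
        scores[target] = max(
            (tanimoto(query_fp, fp) for fp in ligand_fps), default=0.0
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy library: each target is defined by the fingerprints of its known ligands.
library = {
    "target_A": [{1, 2, 3, 5}, {7, 8}],
    "target_B": [{1, 9}],
    "target_C": [{1, 2, 3, 4, 6}],
}
query = {1, 2, 3, 4}
ranking = rank_targets(query, library)
```

Other fusion rules, such as averaging the top few similarities per target, fit the same structure by replacing the `max` call.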
Best Friends: Alliances, Friend Ranking, and the MySpace Social Network.
DeScioli, Peter; Kurzban, Robert; Koch, Elizabeth N; Liben-Nowell, David
2011-01-01
Like many topics of psychological research, the explanation for friendship is at once intuitive and difficult to address empirically. These difficulties worsen when one seeks, as we do, to go beyond "obvious" explanations ("humans are social creatures") to ask deeper questions, such as "What is the evolved function of human friendship?" In recent years, however, a new window into human behavior has opened as a growing fraction of people's social activity has moved online, leaving a wealth of digital traces behind. One example is a feature of the MySpace social network that allows millions of users to rank their "Top Friends." In this study, we collected over 10 million people's friendship decisions from MySpace to test predictions made by hypotheses about human friendship. We found particular support for the alliance hypothesis, which holds that human friendship is caused by cognitive systems that function to create alliances for potential disputes. Because an ally's support can be undermined by a stronger outside relationship, the alliance model predicts that people will prefer partners who rank them above other friends. Consistent with the alliance model, we found that an individual's choice of best friend in MySpace is strongly predicted by how partners rank that individual. © The Author(s) 2011.
Melodic interval perception by normal-hearing listeners and cochlear implant users
Luo, Xin; Masterson, Megan E.; Wu, Ching-Chih
2014-01-01
The perception of melodic intervals (sequential pitch differences) is essential to music perception. This study tested melodic interval perception in normal-hearing (NH) listeners and cochlear implant (CI) users. Melodic interval ranking was tested using an adaptive procedure. CI users had slightly higher interval ranking thresholds than NH listeners. Both groups' interval ranking thresholds, although not affected by root note, significantly increased with standard interval size and were higher for descending intervals than for ascending intervals. The pitch direction effect may be due to a procedural artifact or a difference in central processing. In another test, familiar melodies were played with all the intervals scaled by a single factor. Subjects rated how in tune the melodies were and adjusted the scaling factor until the melodies sounded the most in tune. CI users had lower final interval ratings and less change in interval rating as a function of scaling factor than NH listeners. For CI users, the root-mean-square error of the final scaling factors and the width of the interval rating function were significantly correlated with the average ranking threshold for ascending rather than descending intervals, suggesting that CI users may have focused on ascending intervals when rating and adjusting the melodies. PMID:25324084
Computational methods for a three-dimensional model of the petroleum-discovery process
Schuenemeyer, J.H.; Bawiec, W.J.; Drew, L.J.
1980-01-01
A discovery-process model devised by Drew, Schuenemeyer, and Root can be used to predict the amount of petroleum to be discovered in a basin from some future level of exploratory effort: the predictions are based on historical drilling and discovery data. Because marginal costs of discovery and production are a function of field size, the model can be used to make estimates of future discoveries within deposit size classes. The modeling approach is a geometric one in which the area searched is a function of the size and shape of the targets being sought. A high correlation is assumed between the surface-projection area of the fields and the volume of petroleum. To predict how much oil remains to be found, the area searched must be computed, and the basin size and discovery efficiency must be estimated. The basin is assumed to be explored randomly rather than by pattern drilling. The model may be used to compute independent estimates of future oil at different depth intervals for a play involving multiple producing horizons. We have written FORTRAN computer programs that are used with Drew, Schuenemeyer, and Root's model to merge the discovery and drilling information and perform the necessary computations to estimate undiscovered petroleum. These programs may be modified easily for the estimation of remaining quantities of commodities other than petroleum. © 1980.
NASA Astrophysics Data System (ADS)
Khoromskaia, Venera; Khoromskij, Boris N.
2014-12-01
Our recent method for low-rank tensor representation of sums of arbitrarily positioned electrostatic potentials discretized on a 3D Cartesian grid reduces the 3D tensor summation to operations involving only 1D vectors, while retaining the linear complexity scaling in the number of potentials. Here, we introduce and study a novel tensor approach for fast and accurate assembled summation of a large number of lattice-allocated potentials represented on a 3D N × N × N grid, with computational requirements only weakly dependent on the number of summed potentials. It is based on the assembled low-rank canonical tensor representations of the collected potentials using pointwise sums of shifted canonical vectors representing the single generating function, say the Newton kernel. For a sum of electrostatic potentials over an L × L × L lattice embedded in a box, the required storage scales linearly in the 1D grid-size, O(N), while the numerical cost is estimated by O(NL). For periodic boundary conditions, the storage demand remains proportional to the 1D grid-size of a unit cell, n = N / L, while the numerical cost reduces to O(N), which outperforms the FFT-based Ewald-type summation algorithms of complexity O(N^3 log N). The complexity in the grid parameter N can be reduced even to the logarithmic scale O(log N) by using data-sparse representation of canonical N-vectors via the quantics tensor approximation. For justification, we prove an upper bound on the quantics ranks for the canonical vectors in the overall lattice sum. The presented approach is beneficial in applications which require further functional calculus with the lattice potential, say, scalar product with a function, integration or differentiation, which can be performed easily in tensor arithmetics on large 3D grids with 1D cost. Numerical tests illustrate the performance of the tensor summation method and confirm the estimated bounds on the tensor ranks.
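The idea of summing shifted canonical vectors instead of shifted 3D tensors can be illustrated with a small sketch. It is not the authors' implementation: it assumes a periodic grid (shifts are done with `np.roll`), a random rank-R canonical tensor standing in for the generating kernel, and hypothetical helper names (`cp_to_full`, `assembled_lattice_sum`, `direct_lattice_sum`). It only checks that, for a full tensor-product lattice of shifts, the sum of all shifted copies equals the canonical tensor built from the summed 1D vectors, so the rank does not grow with the number of lattice points.

```python
import numpy as np


def cp_to_full(a, b, c):
    """Expand a canonical (CP) tensor sum_r a[:, r] x b[:, r] x c[:, r] to a dense 3D array."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def assembled_lattice_sum(a, b, c, shifts):
    """Sum the shifted 1D canonical vectors in each dimension (only 1D work)."""
    sa = sum(np.roll(a, s, axis=0) for s in shifts)
    sb = sum(np.roll(b, s, axis=0) for s in shifts)
    sc = sum(np.roll(c, s, axis=0) for s in shifts)
    return sa, sb, sc  # still a rank-R canonical representation


def direct_lattice_sum(a, b, c, shifts):
    """Reference: add up every shifted copy of the dense 3D tensor."""
    full = cp_to_full(a, b, c)
    total = np.zeros_like(full)
    for i in shifts:
        for j in shifts:
            for k in shifts:
                total += np.roll(full, (i, j, k), axis=(0, 1, 2))
    return total


rng = np.random.default_rng(0)
N, R = 12, 3                      # grid size per dimension, canonical rank
a, b, c = (rng.standard_normal((N, R)) for _ in range(3))
shifts = [0, 4, 8]                # a 3 x 3 x 3 lattice on the periodic grid

assembled = cp_to_full(*assembled_lattice_sum(a, b, c, shifts))
reference = direct_lattice_sum(a, b, c, shifts)
print(np.allclose(assembled, reference))
```

The assembled version touches only three N × R matrices per shift, whereas the reference loops over all lattice points on the dense grid; the dense expansion is used here purely for verification.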
Gelernter, Joel; Sherva, Richard; Koesterer, Ryan; Almasy, Laura; Zhao, Hongyu; Kranzler, Henry R.; Farrer, Lindsay
2013-01-01
We report a GWAS for cocaine dependence (CD) in three sets of African- and European-American subjects (AAs and EAs, respectively), to identify pathways, genes, and alleles important in CD risk. The discovery GWAS dataset (n=5,697 subjects) was genotyped using the Illumina OmniQuad microarray (890,000 analyzed SNPs). Additional genotypes were imputed based on the 1000 Genomes reference panel. Top-ranked findings were evaluated by incorporating information from publicly available GWAS data from 4,063 subjects. Then, the most significant GWAS SNPs were genotyped in 2,549 independent subjects. We observed one genomewide-significant (GWS) result: rs7086629 at the FAM53B (“family with sequence similarity 53, member B”) locus. This was supported in both AAs and EAs; p-value (meta-analysis of all samples) = 4.28×10^−8. The gene maps to the same chromosomal region as the maximum peak we observed in a previous linkage study. NCOR2 (nuclear receptor corepressor 2) SNP rs150954431 was associated with p = 1.19×10^−9 in the EA discovery sample. SNP rs2456778, which maps to CDK1 (“cyclin-dependent kinase 1”), was associated with cocaine-induced paranoia in AAs in the discovery sample only (p = 4.68×10^−8). This is the first study to identify risk variants for CD using GWAS. Our results implicate novel risk loci and provide insights into potential therapeutic and prevention strategies. PMID:23958962
Testing inhomogeneous solvation theory in structure-based ligand discovery.
Balius, Trent E; Fischer, Marcus; Stein, Reed M; Adler, Thomas B; Nguyen, Crystal N; Cruz, Anthony; Gilson, Michael K; Kurtzman, Tom; Shoichet, Brian K
2017-08-15
Binding-site water is often displaced upon ligand recognition, but is commonly neglected in structure-based ligand discovery. Inhomogeneous solvation theory (IST) has become popular for treating this effect, but it has not been tested in controlled experiments at atomic resolution. To do so, we turned to a grid-based version of this method, GIST, readily implemented in molecular docking. Whereas the term only improves docking modestly in retrospective ligand enrichment, it could be added without disrupting performance. We thus turned to prospective docking of large libraries to investigate GIST's impact on ligand discovery, geometry, and water structure in a model cavity site well-suited to exploring these terms. Although top-ranked docked molecules with and without the GIST term often overlapped, many ligands were meaningfully prioritized or deprioritized; some of these were selected for testing. Experimentally, 13/14 molecules prioritized by GIST did bind, whereas none of the molecules that it deprioritized were observed to bind. Nine crystal complexes were determined. In six, the ligand geometry corresponded to that predicted by GIST, for one of these the pose without the GIST term was wrong, and three crystallographic poses differed from both predictions. Notably, in one structure, an ordered water molecule with a high GIST displacement penalty was observed to stay in place. Inclusion of this water-displacement term can substantially improve the hit rates and ligand geometries from docking screens, although the magnitude of its effects can be small and its impact in drug binding sites merits further controlled studies.
Application of chemical biology in target identification and drug discovery.
Zhu, Yue; Xiao, Ting; Lei, Saifei; Zhou, Fulai; Wang, Ming-Wei
2015-09-01
Drug discovery and development is vital to the well-being of mankind and sustainability of the pharmaceutical industry. Using chemical biology approaches to discover drug leads has become a widely accepted path partially because of the completion of the Human Genome Project. Chemical biology mainly solves biological problems through searching previously unknown targets for pharmacologically active small molecules or finding ligands for well-defined drug targets. It is a powerful tool to study how these small molecules interact with their respective targets, as well as their roles in signal transduction, molecular recognition and cell functions. There have been an increasing number of new therapeutic targets being identified and subsequently validated as a result of advances in functional genomics, which in turn led to the discovery of numerous active small molecules via a variety of high-throughput screening initiatives. In this review, we highlight some applications of chemical biology in the context of drug discovery.
Discovery of novel drug targets and their functions using phenotypic screening of natural products.
Chang, Junghwa; Kwon, Ho Jeong
2016-03-01
Natural products are valuable resources that provide a variety of bioactive compounds and natural pharmacophores in modern drug discovery. Discovery of biologically active natural products and unraveling their target proteins to understand their mode of action have always been critical hurdles for their development into clinical drugs. For effective discovery and development of bioactive natural products into novel therapeutic drugs, comprehensive screening and identification of target proteins are indispensable. In this review, a systematic approach to understanding the mode of action of natural products isolated using phenotypic screening involving chemical proteomics-based target identification is introduced. This review highlights three natural products recently discovered via phenotypic screening, namely glucopiericidin A, ecumicin, and terpestacin, as representative case studies to revisit the pivotal role of natural products as powerful tools in discovering the novel functions and druggability of targets in biological systems and pathological diseases of interest.
Efficient discovery of bioactive scaffolds by activity-directed synthesis
NASA Astrophysics Data System (ADS)
Karageorgis, George; Warriner, Stuart; Nelson, Adam
2014-10-01
The structures and biological activities of natural products have often provided inspiration in drug discovery. The functional benefits of natural products to the host organism steer the evolution of their biosynthetic pathways. Here, we describe a discovery approach—which we term activity-directed synthesis—in which reactions with alternative outcomes are steered towards functional products. Arrays of catalysed reactions of α-diazo amides, whose outcome was critically dependent on the specific conditions used, were performed. The products were assayed at increasingly low concentration, with the results informing the design of a subsequent reaction array. Finally, promising reactions were scaled up and, after purification, submicromolar ligands based on two scaffolds with no previous annotated activity against the androgen receptor were discovered. The approach enables the discovery, in tandem, of both bioactive small molecules and associated synthetic routes, analogous to the evolution of biosynthetic pathways to yield natural products.
Gerlt, John A
2017-08-22
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of "genomic enzymology" web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence-function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems.
2017-01-01
The exponentially increasing number of protein and nucleic acid sequences provides opportunities to discover novel enzymes, metabolic pathways, and metabolites/natural products, thereby adding to our knowledge of biochemistry and biology. The challenge has evolved from generating sequence information to mining the databases to integrating and leveraging the available information, i.e., the availability of “genomic enzymology” web tools. Web tools that allow identification of biosynthetic gene clusters are widely used by the natural products/synthetic biology community, thereby facilitating the discovery of novel natural products and the enzymes responsible for their biosynthesis. However, many novel enzymes with interesting mechanisms participate in uncharacterized small-molecule metabolic pathways; their discovery and functional characterization also can be accomplished by leveraging information in protein and nucleic acid databases. This Perspective focuses on two genomic enzymology web tools that assist the discovery of novel metabolic pathways: (1) Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST) for generating sequence similarity networks to visualize and analyze sequence–function space in protein families and (2) Enzyme Function Initiative-Genome Neighborhood Tool (EFI-GNT) for generating genome neighborhood networks to visualize and analyze the genome context in microbial and fungal genomes. Both tools have been adapted to other applications to facilitate target selection for enzyme discovery and functional characterization. As the natural products community has demonstrated, the enzymology community needs to embrace the essential role of web tools that allow the protein and genome sequence databases to be leveraged for novel insights into enzymological problems. PMID:28826221
Sgobba, Miriam; Caporuscio, Fabiana; Anighoro, Andrew; Portioli, Corinne; Rastelli, Giulio
2012-12-01
In the last decades, molecular docking has emerged as an increasingly useful tool in the modern drug discovery process, but it still needs to overcome many hurdles and limitations such as how to account for protein flexibility and poor scoring function performance. For this reason, it has been recognized that in many cases docking results need to be post-processed to achieve a significant agreement with experimental activities. In this study, we have evaluated the performance of MM-PBSA and MM-GBSA scoring functions, implemented in our post-docking procedure BEAR, in rescoring docking solutions. For the first time, the performance of this post-docking procedure has been evaluated on six different biological targets (namely estrogen receptor, thymidine kinase, factor Xa, adenosine deaminase, aldose reductase, and enoyl ACP reductase) by using i) both a single and a multiple protein conformation approach, and ii) two different software packages, namely AutoDock and LibDock. The assessment has been based on two of the most important criteria for the evaluation of docking methods, i.e., the ability of known ligands to enrich the top positions of a ranked database with respect to molecular decoys, and the consistency of the docking poses with crystallographic binding modes. We found that, in many cases, MM-PBSA and MM-GBSA are able to yield higher enrichment factors compared to those obtained with the docking scoring functions alone. However, for only a minority of the cases, the enrichment factors obtained by using multiple protein conformations were higher than those obtained by using only one protein conformation. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
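The enrichment criterion mentioned above can be made concrete with a short sketch. It uses the common definition of the enrichment factor (hit rate among the top-ranked fraction divided by the hit rate in the whole database); the abstract does not state the exact formula or cutoff the authors used, so the function name `enrichment_factor`, the `fraction` parameter, and the convention that a higher score means a better rank are all assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor of known actives in the top `fraction` of a ranked list.

    scores    : one score per compound; ASSUMPTION: higher score = better rank
                (negate energies such as MM-PBSA values before calling).
    is_active : parallel list of booleans, True for known ligands, False for decoys.
    """
    n = len(scores)
    if n == 0 or n != len(is_active):
        raise ValueError("scores and is_active must be non-empty and equally long")
    total_actives = sum(bool(flag) for flag in is_active)
    if total_actives == 0:
        raise ValueError("at least one active compound is required")

    n_top = max(1, int(round(fraction * n)))            # size of the top slice
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits_top = sum(bool(is_active[i]) for i in order[:n_top])

    # (actives found in top slice / slice size) / (all actives / database size)
    return (hits_top / n_top) / (total_actives / n)


# Ten compounds, two actives ranked first and second, top 20% inspected.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = [True, True, False, False, False, False, False, False, False, False]
print(enrichment_factor(scores, labels, fraction=0.2))
```

A value of 1 corresponds to random ranking; comparing the value obtained from docking scores with the one obtained after rescoring is the kind of comparison the abstract describes.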
CRISPR/Cas9: From Genome Engineering to Cancer Drug Discovery
Luo, Ji
2016-01-01
Advances in translational research are often driven by new technologies. The advent of microarrays, next-generation sequencing, proteomics and RNA interference (RNAi) have led to breakthroughs in our understanding of the mechanisms of cancer and the discovery of new cancer drug targets. The discovery of the bacterial clustered regularly interspaced palindromic repeat (CRISPR) system and its subsequent adaptation as a tool for mammalian genome engineering has opened up new avenues for functional genomics studies. This review will focus on the utility of CRISPR in the context of cancer drug target discovery. PMID:28603775
Trace Norm Regularized CANDECOMP/PARAFAC Decomposition With Missing Data.
Liu, Yuanyuan; Shang, Fanhua; Jiao, Licheng; Cheng, James; Cheng, Hong
2015-11-01
In recent years, low-rank tensor completion (LRTC) problems have received a significant amount of attention in computer vision, data mining, and signal processing. The existing trace norm minimization algorithms for iteratively solving LRTC problems involve multiple singular value decompositions of very large matrices at each iteration. Therefore, they suffer from high computational cost. In this paper, we propose a novel trace norm regularized CANDECOMP/PARAFAC decomposition (TNCP) method for simultaneous tensor decomposition and completion. We first formulate a factor matrix rank minimization model by deducing the relation between the rank of each factor matrix and the mode-n rank of a tensor. We then introduce a tractable relaxation of our rank function, which yields a convex combination of much smaller-scale matrix trace norm minimization problems. Finally, we develop an efficient algorithm based on the alternating direction method of multipliers to solve our problem. The promising experimental results on synthetic and real-world data validate the effectiveness of our TNCP method. Moreover, TNCP is significantly faster than the state-of-the-art methods and scales to larger problems.
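The link between factor-matrix rank and mode-n rank can be checked numerically. The sketch below illustrates one standard relation of this kind, namely that the mode-n rank of a CP tensor is bounded by the rank of its n-th factor matrix; whether this is exactly the statement derived in the paper is an assumption. The helper names (`unfold`, `cp_tensor`, `mode_n_ranks`) and the numerical tolerance are hypothetical choices for the example.

```python
import numpy as np

TOL = 1e-8  # absolute singular-value cutoff used for numerical rank (assumption)


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def cp_tensor(a, b, c):
    """Dense 3D tensor from CP factor matrices with a shared number of columns."""
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def mode_n_ranks(tensor):
    """Numerical rank of every mode-n unfolding."""
    return [int(np.linalg.matrix_rank(unfold(tensor, m), tol=TOL))
            for m in range(tensor.ndim)]


rng = np.random.default_rng(0)
# Factor A has 4 columns but rank 2; B and C are generic with 4 columns each.
a = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
b = rng.standard_normal((5, 4))
c = rng.standard_normal((7, 4))

t = cp_tensor(a, b, c)
factor_ranks = [int(np.linalg.matrix_rank(f, tol=TOL)) for f in (a, b, c)]
print("factor ranks:", factor_ranks)
print("mode-n ranks:", mode_n_ranks(t))
```

Because each unfolding is the corresponding factor matrix times a matrix built from the other two factors, penalizing the (small) factor matrices controls the mode-n ranks without ever decomposing the large unfoldings, which is the computational motivation given in the abstract.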
Discriminative Multi-View Interactive Image Re-Ranking.
Li, Jun; Xu, Chang; Yang, Wankou; Sun, Changyin; Tao, Dacheng
2017-07-01
Given unreliable visual patterns and insufficient query information, content-based image retrieval is often suboptimal and requires image re-ranking using auxiliary information. In this paper, we propose a discriminative multi-view interactive image re-ranking (DMINTIR), which integrates user relevance feedback capturing users' intentions and multiple features that sufficiently describe the images. In DMINTIR, heterogeneous property features are incorporated in the multi-view learning scheme to exploit their complementarities. In addition, a discriminatively learned weight vector is obtained to reassign updated scores and target images for re-ranking. Compared with other multi-view learning techniques, our scheme not only generates a compact representation in the latent space from the redundant multi-view features but also maximally preserves the discriminative information in feature encoding by the large-margin principle. Furthermore, the generalization error bound of the proposed algorithm is theoretically analyzed and shown to be improved by the interactions between the latent space and discriminant function learning. Experimental results on two benchmark data sets demonstrate that our approach boosts baseline retrieval quality and is competitive with the other state-of-the-art re-ranking strategies.
Smoothed low rank and sparse matrix recovery by iteratively reweighted least squares minimization.
Lu, Canyi; Lin, Zhouchen; Yan, Shuicheng
2015-02-01
This paper presents a general framework for solving the low-rank and/or sparse matrix minimization problems, which may involve multiple nonsmooth terms. The iteratively reweighted least squares (IRLS) method is a fast solver, which smooths the objective function and minimizes it by alternately updating the variables and their weights. However, the traditional IRLS can only solve a sparse-only or low-rank-only minimization problem with squared loss or an affine constraint. This paper generalizes IRLS to solve joint/mixed low-rank and sparse minimization problems, which are essential formulations for many tasks. As a concrete example, we solve the Schatten-p norm and l2,q-norm regularized low-rank representation problem by IRLS, and theoretically prove that the derived solution is a stationary point (globally optimal if p,q ≥ 1). Our convergence proof of IRLS is more general than the previous one, which depends on the special properties of the Schatten-p norm and l2,q-norm. Extensive experiments on both synthetic and real data sets demonstrate that our IRLS is much more efficient.
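The alternating "update variables, update weights" scheme is easiest to see in the traditional sparse-only setting that the abstract contrasts itself with. The following is a minimal sketch of classical IRLS for an l1-regularized least squares problem on a vector, not the joint low-rank and sparse method proposed in the paper; the function name `irls_l1`, the smoothing constant `eps`, the iteration count, and the least-squares initialization are all assumptions for illustration.

```python
import numpy as np


def irls_l1(A, b, lam=1.0, eps=1e-8, n_iter=200):
    """Approximately minimise ||A x - b||^2 + lam * sum_i sqrt(x_i^2 + eps^2).

    The absolute value is smoothed by eps; each pass fixes the weights
    w_i = 1 / sqrt(x_i^2 + eps^2) and solves the resulting weighted
    least squares problem (2 A^T A + lam * diag(w)) x = 2 A^T b.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.linalg.lstsq(A, b, rcond=None)[0]    # start from plain least squares
    gram = 2.0 * A.T @ A
    rhs = 2.0 * A.T @ b
    for _ in range(n_iter):
        w = 1.0 / np.sqrt(x**2 + eps**2)        # weight update
        x = np.linalg.solve(gram + lam * np.diag(w), rhs)  # variable update
    return x


# With A = I the exact minimiser is soft-thresholding of b at lam / 2.
x_hat = irls_l1(np.eye(2), np.array([3.0, 0.5]), lam=2.0)
print(np.round(x_hat, 4))
```

With the identity matrix the problem separates per coordinate, so the result can be compared against soft-thresholding: the first coordinate shrinks from 3 towards 2 and the second is driven to approximately zero.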
Sojod, Bouchra; Chateau, Danielle; Mueller, Christopher G.; Babajko, Sylvie; Berdal, Ariane; Lézot, Frédéric; Castaneda, Beatriz
2017-01-01
Periodontitis is based on a complex inflammatory over-response combined with possible genetic predisposition factors. The RANKL/RANK/OPG signaling pathway is implicated in bone resorption through its key function in osteoclast differentiation and activation, as well as in the inflammatory response. This central element of osteo-immunology has been suggested to be perturbed in several diseases, including periodontitis, as it is a predisposing factor for this disease. The aim of the present study was to validate this hypothesis using a transgenic mouse line, which over-expresses RANK (RTg) and develops a periodontitis-like phenotype at 5 months of age. RTg mice exhibited severe alveolar bone loss, an increased number of TRAP positive cells, and disorganization of periodontal ligaments. This phenotype was more pronounced in females. We also observed dental root resorption lacunas. Hyperplasia of the gingival epithelium, including Malassez epithelial rests, was visible as early as 25 days, preceding any other symptoms. These results demonstrate that perturbations of the RANKL/RANK/OPG system constitute a core element of periodontitis, and more globally, osteo-immune diseases. PMID:28596739
Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai
2016-09-07
Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% of drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications.
Udrescu, Lucreţia; Sbârcea, Laura; Topîrceanu, Alexandru; Iovanovici, Alexandru; Kurunczi, Ludovic; Bogdan, Paul; Udrescu, Mihai
2016-01-01
Analyzing drug-drug interactions may unravel previously unknown drug action patterns, leading to the development of new drug discovery tools. We present a new approach to analyzing drug-drug interaction networks, based on clustering and topological community detection techniques that are specific to complex network science. Our methodology uncovers functional drug categories along with the intricate relationships between them. Using modularity-based and energy-model layout community detection algorithms, we link the network clusters to 9 relevant pharmacological properties. Out of the 1141 drugs from the DrugBank 4.1 database, our extensive literature survey and cross-checking with other databases such as Drugs.com, RxList, and DrugBank 4.3 confirm the predicted properties for 85% of the drugs. As such, we argue that network analysis offers a high-level grasp on a wide area of pharmacological aspects, indicating possible unaccounted interactions and missing pharmacological properties that can lead to drug repositioning for the 15% of drugs which seem to be inconsistent with the predicted property. Also, by using network centralities, we can rank drugs according to their interaction potential for both simple and complex multi-pathology therapies. Moreover, our clustering approach can be extended for applications such as analyzing drug-target interactions or phenotyping patients in personalized medicine applications. PMID:27599720
Evaluating the Predictivity of Virtual Screening for Abl Kinase Inhibitors to Hinder Drug Resistance
Gani, Osman A B S M; Narayanan, Dilip; Engh, Richard A
2013-01-01
Virtual screening methods are now widely used in early stages of drug discovery, aiming to rank potential inhibitors. However, any practical ligand set (of active or inactive compounds) chosen for deriving new virtual screening approaches cannot fully represent all relevant chemical space for potential new compounds. In this study, we have taken a retrospective approach to evaluate virtual screening methods for the leukemia target kinase ABL1 and its drug-resistant mutant ABL1-T315I. ‘Dual active’ inhibitors against both targets were grouped together with inactive ligands chosen from different decoy sets and tested with virtual screening approaches with and without explicit use of target structures (docking). We show how various scoring functions and choice of inactive ligand sets influence overall and early enrichment of the libraries. Although ligand-based methods, for example principal component analyses of chemical properties, can distinguish some decoy sets from active compounds, the addition of target structural information via docking improves enrichment, and explicit consideration of multiple target conformations (i.e. types I and II) achieves best enrichment of active versus inactive ligands, even without assuming knowledge of the binding mode. We believe that this study can be extended to other therapeutically important kinases in prospective virtual screening studies. PMID:23746052
Young, Simon W; Dakic, Jodie; Stroia, Kathleen; Nguyen, Michael L; Safran, Marc R
2017-07-01
To assess the outcome and time to return to previous level of competitive play after shoulder surgery in professional tennis players. Retrospective case series. Tertiary academic centre. The records of all female tennis players on the Women's Tennis Association (WTA) professional circuit between January 2008 and June 2010 were reviewed to identify players who underwent shoulder surgery on their dominant (serving) shoulder. Primary outcomes were the ability and time to return to professional play and whether players were able to return to their previous level of function as determined by singles ranking. Preoperative and postoperative singles rankings were used to determine rate and completeness of return to preoperative function. During the study period, 8 professional women tennis players from the WTA tour underwent shoulder surgery on their dominant arm. Indications included rotator cuff debridement or repair, labral reconstruction for instability or superior labral anterior posterior lesion, and neurolysis of the suprascapular nerve. Seven players (88%) returned to professional play. The mean time to return to play was 7 months after surgery. However, only 25% (2 of 8) of players achieved their preinjury singles rank or better by 18 months postoperatively. In total, 4 players returned to their preinjury singles ranking, with their peak singles ranking being attained at a mean of 2.4 years postoperatively. In professional female tennis players, a high return to play rate after arthroscopic shoulder surgery is associated with a prolonged and often incomplete return to previous level of performance. Thus, counseling patients about this is important to manage expectations. Level IV-Case Series.
Carving out the end of the world or (superconformal bootstrap in six dimensions)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chang, Chi-Ming; Lin, Ying-Hsuan
We bootstrap N=(1,0) superconformal field theories in six dimensions, by analyzing the four-point function of flavor current multiplets. By assuming $E_8$ flavor group, we present universal bounds on the central charge $C_T$ and the flavor central charge $C_J$. Based on the numerical data, we conjecture that the rank-one E-string theory saturates the universal lower bound on $C_J$, and numerically determine the spectrum of long multiplets in the rank-one E-string theory. We comment on the possibility of solving the higher-rank E-string theories by bootstrap and thereby probing M-theory on $\mathrm{AdS}_7 \times S^4/\mathbb{Z}_2$.
Carving out the end of the world or (superconformal bootstrap in six dimensions)
Chang, Chi-Ming; Lin, Ying-Hsuan
2017-08-29
We bootstrap N=(1,0) superconformal field theories in six dimensions, by analyzing the four-point function of flavor current multiplets. By assuming $E_8$ flavor group, we present universal bounds on the central charge $C_T$ and the flavor central charge $C_J$. Based on the numerical data, we conjecture that the rank-one E-string theory saturates the universal lower bound on $C_J$, and numerically determine the spectrum of long multiplets in the rank-one E-string theory. We comment on the possibility of solving the higher-rank E-string theories by bootstrap and thereby probing M-theory on $\mathrm{AdS}_7 \times S^4/\mathbb{Z}_2$.
Mathur, Sunil; Sadana, Ajit
2015-12-01
We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set. © The Author(s) 2011.
Can Functional Magnetic Resonance Imaging Improve Success Rates in CNS Drug Discovery?
Borsook, David; Hargreaves, Richard; Becerra, Lino
2011-01-01
Introduction: The bar for developing new treatments for CNS disease is getting progressively higher and fewer novel mechanisms are being discovered, validated and developed. The high costs of drug discovery necessitate early decisions to ensure the best molecules and hypotheses are tested in expensive late stage clinical trials. The discovery of brain imaging biomarkers that can bridge preclinical to clinical CNS drug discovery and provide a ‘language of translation’ affords the opportunity to improve the objectivity of decision-making. Areas Covered: This review discusses the benefits, challenges and potential issues of using a science based biomarker strategy to change the paradigm of CNS drug development and increase success rates in the discovery of new medicines. The authors have summarized PubMed and Google Scholar based publication searches to identify recent advances in functional, structural and chemical brain imaging and have discussed how these techniques may be useful in defining CNS disease state and drug effects during drug development. Expert opinion: The use of novel brain imaging biomarkers holds the bold promise of making neuroscience drug discovery smarter by increasing the objectivity of decision making thereby improving the probability of success of identifying useful drugs to treat CNS diseases. Functional imaging holds the promise to: (1) define pharmacodynamic markers as an index of target engagement; (2) improve translational medicine paradigms to predict efficacy; (3) evaluate CNS efficacy and safety based on brain activation; (4) determine brain activity drug dose-response relationships and (5) provide an objective evaluation of symptom response and disease modification. PMID:21765857
Maternal effects on offspring stress physiology in wild chimpanzees.
Murray, Carson M; Stanton, Margaret A; Wellens, Kaitlin R; Santymire, Rachel M; Heintz, Matthew R; Lonsdorf, Elizabeth V
2018-01-01
Early life experiences are known to influence hypothalamic-pituitary-adrenal (HPA) axis development, which can impact health outcomes through the individual's ability to mount appropriate physiological reactions to stressors. In primates, these early experiences are most often mediated through the mother and can include the physiological environment experienced during gestation. Here, we investigate stress physiology of dependent offspring in wild chimpanzees for the first time and examine whether differences in maternal stress physiology are related to differences in offspring stress physiology. Specifically, we explore the relationship between maternal rank and maternal fecal glucocorticoid metabolite (FGM) concentration during pregnancy and early lactation (first 6 months post-partum) and examine whether differences based on maternal rank are associated with dependent offspring FGM concentrations. We found that low-ranking females exhibited significantly higher FGM concentrations during pregnancy than during the first 6 months of lactation. Furthermore, during pregnancy, low-ranking females experienced significantly higher FGM concentrations than high-ranking females. As for dependent offspring, we found that male offspring of low-ranking mothers experienced stronger decreases in FGM concentrations as they aged compared to males with high-ranking mothers or their dependent female counterparts. Together, these results suggest that maternal rank and FGM concentrations experienced during gestation are related to offspring stress physiology and that this relationship is particularly pronounced in males compared to females. Importantly, this study provides the first evidence for maternal effects on the development of offspring HPA function in wild chimpanzees, which likely relates to subsequent health and fitness outcomes. Am. J. Primatol. 80:e22525, 2018. © 2016 Wiley Periodicals, Inc.
Metagenomics and novel gene discovery
Culligan, Eamonn P; Sleator, Roy D; Marchesi, Julian R; Hill, Colin
2014-01-01
Metagenomics provides a means of assessing the total genetic pool of all the microbes in a particular environment, in a culture-independent manner. It has revealed unprecedented diversity in microbial community composition, which is further reflected in the encoded functional diversity of the genomes, a large proportion of which consists of novel genes. Herein, we review both sequence-based and functional metagenomic methods to uncover novel genes and outline some of the associated problems of each type of approach, as well as potential solutions. Furthermore, we discuss the potential for metagenomic biotherapeutic discovery, with a particular focus on the human gut microbiome and finally, we outline how the discovery of novel genes may be used to create bioengineered probiotics. PMID:24317337
Discovery of functional elements in 12 Drosophila genomes using evolutionary signatures
Stark, Alexander; Lin, Michael F.; Kheradpour, Pouya; Pedersen, Jakob S.; Parts, Leopold; Carlson, Joseph W.; Crosby, Madeline A.; Rasmussen, Matthew D.; Roy, Sushmita; Deoras, Ameya N.; Ruby, J. Graham; Brennecke, Julius; Hodges, Emily; Hinrichs, Angie S.; Caspi, Anat; Paten, Benedict; Park, Seung-Won; Han, Mira V.; Maeder, Morgan L.; Polansky, Benjamin J.; Robson, Bryanne E.; Aerts, Stein; van Helden, Jacques; Hassan, Bassem; Gilbert, Donald G.; Eastman, Deborah A.; Rice, Michael; Weir, Michael; Hahn, Matthew W.; Park, Yongkyu; Dewey, Colin N.; Pachter, Lior; Kent, W. James; Haussler, David; Lai, Eric C.; Bartel, David P.; Hannon, Gregory J.; Kaufman, Thomas C.; Eisen, Michael B.; Clark, Andrew G.; Smith, Douglas; Celniker, Susan E.; Gelbart, William M.; Kellis, Manolis
2008-01-01
Sequencing of multiple related species followed by comparative genomics analysis constitutes a powerful approach for the systematic understanding of any genome. Here, we use the genomes of 12 Drosophila species for the de novo discovery of functional elements in the fly. Each type of functional element shows characteristic patterns of change, or ‘evolutionary signatures’, dictated by its precise selective constraints. Such signatures enable recognition of new protein-coding genes and exons, spurious and incorrect gene annotations, and numerous unusual gene structures, including abundant stop-codon readthrough. Similarly, we predict non-protein-coding RNA genes and structures, and new microRNA (miRNA) genes. We provide evidence of miRNA processing and functionality from both hairpin arms and both DNA strands. We identify several classes of pre- and post-transcriptional regulatory motifs, and predict individual motif instances with high confidence. We also study how discovery power scales with the divergence and number of species compared, and we provide general guidelines for comparative studies. PMID:17994088
Establishment of a 12-gene expression signature to predict colon cancer prognosis
Zhao, Guangxi; Dong, Pingping; Wu, Bingrui
2018-01-01
A robust and accurate gene expression signature is essential to help oncologists determine which subset of patients at a similar Tumor-Lymph Node-Metastasis (TNM) stage has high recurrence risk and could benefit from adjuvant therapies. Here we applied a two-step supervised machine-learning method and established a 12-gene expression signature to precisely predict colon adenocarcinoma (COAD) prognosis by using COAD RNA-seq transcriptome data from The Cancer Genome Atlas (TCGA). The predictive performance of the 12-gene signature was validated with two independent gene expression microarray datasets: GSE39582 includes 566 COAD cases for the development of six molecular subtypes with distinct clinical, molecular and survival characteristics; GSE17538 is a dataset containing 232 colon cancer patients for the generation of a metastasis gene expression profile to predict recurrence and death in COAD patients. The signature could effectively separate the poor prognosis patients from the good prognosis group (disease specific survival (DSS): Kaplan Meier (KM) Log Rank p = 0.0034; overall survival (OS): KM Log Rank p = 0.0336) in GSE17538. For patients with a proficient mismatch repair system (pMMR) in GSE39582, the signature could also effectively distinguish the high risk group from the low risk group (OS: KM Log Rank p = 0.005; Relapse free survival (RFS): KM Log Rank p = 0.022). Interestingly, advanced stage patients were significantly enriched in the high 12-gene score group (Fisher’s exact test p = 0.0003). After stage stratification, the signature could still distinguish poor prognosis patients in GSE17538 from good prognosis patients within stage II (Log Rank p = 0.01) and stage II & III (Log Rank p = 0.017) in the outcome of DFS. Among stage III or II/III pMMR patients treated with adjuvant chemotherapies (ACT), those with higher 12-gene scores showed poorer prognosis (III, OS: KM Log Rank p = 0.046; III & II, OS: KM Log Rank p = 0.041).
Among stage II/III pMMR patients with lower 12-gene scores in GSE39582, the subgroup receiving ACT showed significantly longer OS time compared with those who received no ACT (Log Rank p = 0.021), while there is no obvious difference between counterparts among patients with higher 12-gene scores (Log Rank p = 0.12). Besides COAD, our 12-gene signature is multifunctional in several other cancer types including kidney cancer, lung cancer, uveal and skin melanoma, brain cancer, and pancreatic cancer. Functional classification showed that seven of the twelve genes are involved in immune system function and regulation, so our 12-gene signature could potentially be used to guide decisions about adjuvant therapy for patients with stage II/III and pMMR COAD.
The Smoothed Dirichlet Distribution: Understanding Cross-Entropy Ranking in Information Retrieval
2006-07-01
Unigram Language modeling is a successful probabilistic framework for Information Retrieval (IR) that uses...the Relevance model (RM), a state-of-the-art model for IR in the language modeling framework that uses the same cross-entropy as its ranking function...In addition, the SD based classifier provides more flexibility than RM in modeling documents owing to a consistent generative framework. We
Training: The No. 1 Manpower Management Function
ERIC Educational Resources Information Center
Lippert, Frederick G.
1977-01-01
Reports results of a University of Connecticut study in which business administration graduate students in a manpower management course ranked six major functions of a competent personnel department according to perceived importance. A description of the course is included. (TA)
Performance Analysis of Scientific and Engineering Applications Using MPInside and TAU
NASA Technical Reports Server (NTRS)
Saini, Subhash; Mehrotra, Piyush; Taylor, Kenichi Jun Haeng; Shende, Sameer Suresh; Biswas, Rupak
2010-01-01
In this paper, we present performance analysis of two NASA applications using performance tools like Tuning and Analysis Utilities (TAU) and SGI MPInside. MITgcmUV and OVERFLOW are two production-quality applications used extensively by scientists and engineers at NASA. MITgcmUV is a global ocean simulation model, developed by the Estimating the Circulation and Climate of the Ocean (ECCO) Consortium, for solving the fluid equations of motion using the hydrostatic approximation. OVERFLOW is a general-purpose Navier-Stokes solver for computational fluid dynamics (CFD) problems. Using these tools, we analyze the MPI functions (MPI_Sendrecv, MPI_Bcast, MPI_Reduce, MPI_Allreduce, MPI_Barrier, etc.) with respect to message size of each rank, time consumed by each function, and how ranks communicate. MPI communication is further analyzed by studying the performance of MPI functions used in these two applications as a function of message size and number of cores. Finally, we present the compute time, communication time, and I/O time as a function of the number of cores.
Al-Balas, Qosay A.; Amawi, Haneen A.; Hassan, Mohammad A.; Qandil, Amjad M.; Almaaytah, Ammar M.; Mhaidat, Nizar M.
2013-01-01
Farnesyltransferase enzyme (FTase) is considered an essential enzyme in the Ras signaling pathway associated with cancer. Thus, designing inhibitors for this enzyme might lead to the discovery of compounds with effective anticancer activity. In an attempt to obtain effective FTase inhibitors, pharmacophore hypotheses were generated using structure-based and ligand-based approaches built into Discovery Studio v3.1. Because the zinc feature is known to be essential for an inhibitor’s binding to the active site of the FTase enzyme, further customization was applied to include this feature in the generated pharmacophore hypotheses. These pharmacophore hypotheses were thoroughly validated using various procedures such as ROC analysis and ligand pharmacophore mapping. The validated pharmacophore hypotheses were used to screen 3D databases to identify possible hits. Hits that were both highly ranked and showed sufficient ability to bind the zinc feature in the active site were further refined by applying drug-like criteria such as Lipinski’s “rule of five” and ADMET filters. Finally, the two candidate compounds (ZINC39323901 and ZINC01034774) were docked using CDOCKER and GOLD in the active site of the FTase enzyme to optimize hit selection. PMID:24276257
A gastroenterological list for the millennium.
Janowitz, H D; Abittan, C S; Fiedler, L M
1999-12-01
To determine the 10 most significant advances in gastroenterology during this century as we approach the millennium, the authors polled 50 distinguished active clinicians and leading researchers in the field, including workers in liver disease and the pathology of the gut and its associated glands. Forty-five persons (90%) responded and listed 58 different items. These were then organized into four groups: group A, with 10 categories that received between 42 and 11 votes; group B, with 10 categories that received between 10 and 3 votes; group C, with 3 items receiving 2 votes each; and group D, with the remaining 14 items receiving 1 vote each. The respondents did not indicate their choices in rank order. The top 10 leading choices (group A, containing between 42 and 11 votes) included Helicobacter pylori, fiberoptic endoscopy, gastrointestinal imaging by radiograph and computed tomographic scan, Australia antigen including vaccines for hepatitis A and B, the molecular basis of colon cancer, liver transplantation, laparoscopic-assisted surgery, therapy for peptic ulcer disease including H2-receptor antagonists and proton pump inhibitors, the discovery of gastrointestinal hormones beginning with secretin, and lastly the discovery of the role for gluten in celiac disease.
NASA Astrophysics Data System (ADS)
Annapoorani, Angusamy; Umamageswaran, Venugopal; Parameswari, Radhakrishnan; Pandian, Shunmugiah Karutha; Ravi, Arumugam Veera
2012-09-01
Drugs have been discovered in the past mainly either by identification of active components from traditional remedies or by unpredicted discovery. A key motivation for the study of structure based virtual screening is the exploitation of such information to design targeted drugs. In this study, structure based virtual screening was used to search for putative quorum sensing inhibitors (QSI) of Pseudomonas aeruginosa. The virtual screening programme Glide version 5.5 was applied to screen 1,920 natural compounds/drugs against the LasR and RhlR receptor proteins of P. aeruginosa. Based on the results of in silico docking analysis, the five top ranking compounds, namely rosmarinic acid, naringin, chlorogenic acid, morin and mangiferin, were subjected to in vitro bioassays against laboratory strain PAO1 and two antibiotic resistant clinical isolates, P. aeruginosa AS1 (GU447237) and P. aeruginosa AS2 (GU447238). Among the five compounds studied, all except mangiferin showed significant inhibition of the production of protease, elastase and hemolysin. Further, all five compounds potentially inhibited biofilm related behaviours. This interaction study provided promising ligands to inhibit quorum sensing (QS) mediated virulence factor production in P. aeruginosa.
A Mixed-Method Exploration of Functioning in Safe Schools/Healthy Students Partnerships
ERIC Educational Resources Information Center
Merrill, Marina L.; Taylor, Nicole L.; Martin, Alison J.; Maxim, Lauren A.; D'Ambrosio, Ryan; Gabriel, Roy M.; Wendt, Staci J.; Mannix, Danyelle; Wells, Michael E.
2012-01-01
This paper presents a mixed-method approach to measuring the functioning of Safe Schools/Healthy Students (SS/HS) Initiative partnerships. The SS/HS national evaluation team developed a survey to collect partners' perceptions of functioning within SS/HS partnerships. Average partnership functioning scores were used to rank each site from lowest to…
2012-01-01
Background: Development and application of transcriptomics-based gene classifiers for ecotoxicological applications lag far behind those of biomedical sciences. Many such classifiers discovered thus far lack vigorous statistical and experimental validations. A combination of genetic algorithm/support vector machines and genetic algorithm/K nearest neighbors was used in this study to search for classifiers of endocrine-disrupting chemicals (EDCs) in zebrafish. Searches were conducted on both tissue-specific and tissue-combined datasets, either across the entire transcriptome or within individual transcription factor (TF) networks previously linked to EDC effects. Candidate classifiers were evaluated by gene set enrichment analysis (GSEA) on both the original training data and a dedicated validation dataset. Results: Multi-tissue dataset yielded no classifiers. Among the 19 chemical-tissue conditions evaluated, the transcriptome-wide searches yielded classifiers for six of them, each having approximately 20 to 30 gene features unique to a condition. Searches within individual TF networks produced classifiers for 15 chemical-tissue conditions, each containing 100 or fewer top-ranked gene features pooled from those of multiple TF networks and also unique to each condition. For the training dataset, 10 out of 11 classifiers successfully identified the gene expression profiles (GEPs) of their targeted chemical-tissue conditions by GSEA. For the validation dataset, classifiers for prochloraz-ovary and flutamide-ovary also correctly identified the GEPs of corresponding conditions while no classifier could predict the GEP from prochloraz-brain. Conclusions: The discrepancies in the performance of these classifiers were attributed in part to varying data complexity among the conditions, as measured to some degree by Fisher’s discriminant ratio statistic.
This variation in data complexity could likely be compensated by adjusting sample size for individual chemical-tissue conditions, thus suggesting a need for a preliminary survey of transcriptomic responses before launching a full scale classifier discovery effort. Classifier discovery based on individual TF networks could yield more mechanistically-oriented biomarkers. GSEA proved to be a flexible and effective tool for application of gene classifiers but a similar and more refined algorithm, connectivity mapping, should also be explored. The distribution characteristics of classifiers across tissues, chemicals, and TF networks suggested a differential biological impact among the EDCs on zebrafish transcriptome involving some basic cellular functions. PMID:22849515
Construction of normal-regular decisions of Bessel typed special system
NASA Astrophysics Data System (ADS)
Tasmambetov, Zhaksylyk N.; Talipova, Meiramgul Zh.
2017-09-01
A special system of second-order partial differential equations is studied; it is solved by the degenerate hypergeometric function, which reduces to Bessel functions of two variables. To construct a solution of this system near regular and irregular singularities, we use the Frobenius-Latysheva method, applying the concepts of rank and antirank. We prove the basic theorem establishing the existence of four linearly independent solutions of the Bessel-type system under study. To prove the existence of normal-regular solutions, we establish necessary conditions for the existence of such solutions. The existence and convergence of a normal-regular solution are shown using the notions of rank and antirank.
Estimating sales and sales market share from sales rank data for consumer appliances
NASA Astrophysics Data System (ADS)
Touzani, Samir; Van Buskirk, Robert
2016-06-01
Our motivation in this work is to find an adequate probability distribution to fit sales volumes of different appliances. This distribution allows for the translation of sales rank into sales volume. This paper shows that the log-normal distribution and specifically the truncated version are well suited for this purpose. We demonstrate that using sales proxies derived from a calibrated truncated log-normal distribution function can be used to produce realistic estimates of market average product prices, and product attributes. We show that the market averages calculated with the sales proxies derived from the calibrated, truncated log-normal distribution provide better market average estimates than sales proxies estimated with simpler distribution functions.
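The rank-to-volume translation described in this abstract can be illustrated with a minimal sketch. It assumes a plain (untruncated) log-normal distribution, whereas the abstract favors the truncated version, and it maps each rank to a mid-point quantile; the function names and the parameters mu and sigma are hypothetical and are not taken from the paper.

```python
from math import exp
from statistics import NormalDist


def sales_proxy(rank, n_products, mu, sigma):
    """Approximate sales of the product at a given rank (1 = best seller).

    Assumption: per-product sales are log-normal with log-mean mu and
    log-standard-deviation sigma, and rank r out of n maps to the
    upper-tail quantile p = 1 - (r - 0.5) / n.
    """
    p = 1.0 - (rank - 0.5) / n_products
    z = NormalDist().inv_cdf(p)  # standard normal quantile
    return exp(mu + sigma * z)


def market_shares(n_products, mu, sigma):
    """Normalize the sales proxies of all ranks into market shares."""
    proxies = [sales_proxy(r, n_products, mu, sigma)
               for r in range(1, n_products + 1)]
    total = sum(proxies)
    return [s / total for s in proxies]


# Hypothetical market of five products with illustrative parameters.
shares = market_shares(5, mu=2.0, sigma=1.0)
```

With these proxies, a market-average price could be estimated as a share-weighted mean of product prices; calibrating mu and sigma against known sales figures is outside the scope of this sketch.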
The international translational regenerative medicine center.
Alexis, Mardi de Veuve; Grinnemo, Karl-Henrik; Jove, Richard
2012-11-01
The International Translational Regenerative Medicine Center, an organizing sponsor of the World Stem Cell Summit 2012, is a global initiative established in 2011 by founding partners Karolinska Institutet (Stockholm, Sweden) and Beckman Research Institute at City of Hope (CA, USA) with a mission to facilitate the acceleration of translational research and medicine on a global scale. Karolinska Institutet, home of the Nobel Prize in Medicine or Physiology, is one of the most prestigious medical research institutions in the world. The Beckman Research Institute/City of Hope is ranked among the leading NIH-designated comprehensive cancer research and treatment institutions in the USA, has the largest academic GMP facility and advanced drug discovery capability, and is a pioneer in diabetes research and treatment.
Drug discovery strategies to outer membrane targets in Gram-negative pathogens.
Brown, Dean G
2016-12-15
This review will cover selected recent examples of drug discovery strategies which target the outer membrane (OM) of Gram-negative bacteria either by disruption of outer membrane function or by inhibition of essential gene products necessary for outer membrane assembly. Significant advances in pathway elucidation, structural biology and molecular inhibitor designs have created new opportunities for drug discovery within this target-class space. Copyright © 2016 Elsevier Ltd. All rights reserved.
Arrayed antibody library technology for therapeutic biologic discovery.
Bentley, Cornelia A; Bazirgan, Omar A; Graziano, James J; Holmes, Evan M; Smider, Vaughn V
2013-03-15
Traditional immunization and display antibody discovery methods rely on competitive selection amongst a pool of antibodies to identify a lead. While this approach has led to many successful therapeutic antibodies, targets have been limited to proteins which are easily purified. In addition, selection driven discovery has produced a narrow range of antibody functionalities focused on high affinity antagonism. We review the current progress in developing arrayed protein libraries for screening-based, rather than selection-based, discovery. These single molecule per microtiter well libraries have been screened in multiplex formats against both purified antigens and directly against targets expressed on the cell surface. This facilitates the discovery of antibodies against therapeutically interesting targets (GPCRs, ion channels, and other multispanning membrane proteins) and epitopes that have been considered poorly accessible to conventional discovery methods. Copyright © 2013. Published by Elsevier Inc.
A User’s Guide to BISAM (BIvariate SAMple): The Bivariate Data Modeling Program.
1983-08-01
method for the null case specified and is then used to form the bivariate density-quantile function as described in section 4. If D(U) in stage...employed assigns average ranks for tied observations. Other methods for assigning ranks to tied observations are often employed but are not attempted...observations will weaken the results obtained since underlying continuous distributions are assumed. One should avoid such situations if possible. Two methods
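The excerpt states that tied observations are assigned average ranks. The following is a minimal sketch of that convention in Python, not the BISAM program's own code; the function name average_ranks is hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied observations share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks
```

For example, the two tied values in [10, 20, 20, 30] occupy positions 2 and 3 and therefore both receive rank 2.5.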
2008-06-01
The most common outranking methods are the preference ranking organization method for enrichment evaluation (PROMETHEE) and the elimination and...Brans and Ph. Vincke, “A Preference Ranking Organization Method: (The PROMETHEE Method for Multiple Criteria Decision-Making),” Management Science 31...PROMETHEE). This method needs a preference function for each criterion to compute the degree of preference. “The credibility of the outranking
Respiratory symptoms, lung function, and sensitisation to flour in a British bakery.
Musk, A W; Venables, K M; Crook, B; Nunn, A J; Hawkins, R; Crook, G D; Graneek, B J; Tee, R D; Farrer, N; Johnson, D A
1989-01-01
A survey of dust exposure, respiratory symptoms, lung function, and response to skin prick tests was conducted in a modern British bakery. Of the 318 bakery employees, 279 (88%) took part. Jobs were ranked from 0 to 10 by perceived dustiness and this ranking correlated well with total dust concentration measured in 79 personal dust samples. Nine samples had concentrations greater than 10 mg/m3, the exposure limit for nuisance dust. All participants completed a self administered questionnaire on symptoms and their relation to work. FEV1 and FVC were measured by a dry wedge spirometer and bronchial reactivity to methacholine was estimated. Skin prick tests were performed with three common allergens and with 11 allergens likely to be found in bakery dust, including mites and moulds. Of the participants in the main exposure group, 35% reported chest symptoms which in 13% were work related. The corresponding figures for nasal symptoms were 38% and 19%. Symptoms, lung function, bronchial reactivity, and response to skin prick tests were related to current or past exposure to dust using logistic or linear regression analysis as appropriate. Exposure rank was significantly associated with most of the response variables studied. The study shows that respiratory symptoms and sensitisation are common, even in a modern bakery. PMID:2789967
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katz, Francine S.; Pecic, Stevan; Tran, Timothy H.
Acetylcholinesterase (AChE) that has been covalently inhibited by organophosphate compounds (OPCs), such as nerve agents and pesticides, has traditionally been reactivated by using nucleophilic oximes. There is, however, a clearly recognized need for new classes of compounds with the ability to reactivate inhibited AChE with improved in vivo efficacy. Here we describe our discovery of new functional groups (Mannich phenols and general bases) that are capable of reactivating OPC-inhibited AChE more efficiently than standard oximes and we describe the cooperative mechanism by which these functionalities are delivered to the active site. These discoveries, supported by preliminary in vivo results and crystallographic data, significantly broaden the available approaches for reactivation of AChE.
Cross ranking of cities and regions: population versus income
NASA Astrophysics Data System (ADS)
Cerqueti, Roy; Ausloos, Marcel
2015-07-01
This paper explores the relationship between the internal economic structure of communities and their population distribution through a rank-rank analysis of official data, following statistical physics ideas and using two techniques. The data are for Italian cities. The analysis is performed both at a global (national) and at a more local (regional) level in order to distinguish ‘macro’ and ‘micro’ aspects. First, the rank-size rule is found not to be a standard power law, as in many other studies, but a doubly decreasing power law. Next, the Kendall τ and the Spearman ρ rank correlation coefficients, which measure pair concordance and the correlation between fluctuations in two rankings, respectively (as a correlation function does in thermodynamics), are calculated to find rank correlation (if any) between demography and wealth. Results show not only global disparities for the whole (country) set, but also (regional) disparities, when comparing the number of cities in regions, the number of inhabitants in cities and that in regions, as well as when comparing the aggregated tax income of the cities and that of regions. Different outliers are pointed out and justified. Interestingly, two classes of cities in the country and two classes of regions in the country are found. ‘Common sense’ social, political, and economic considerations sustain the findings. More importantly, the methods are shown to distinguish communities very clearly when specific criteria are numerically sound. A specific model for the findings, i.e. for the doubly decreasing power law and the two-phase system, is presented, based on statistics theory, e.g. urn filling. The model ideas can be expected to hold when similar rank relationship features are observed in other fields. It is emphasized that the analysis makes more sense than one through a Pearson Π value-value correlation analysis.
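The two rank correlation coefficients named in this abstract can be computed as in the sketch below. It uses the textbook formulas for data without ties (Spearman's ρ from squared rank differences, Kendall's τ-a from concordant and discordant pairs); the population and income figures are invented for illustration and are not the paper's data.

```python
from itertools import combinations


def spearman_rho(x, y):
    """Spearman rho for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs, no ties."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical population (thousands) and aggregated income for five cities.
population = [2800, 1400, 960, 870, 650]
income = [60, 35, 20, 25, 10]

rho = spearman_rho(population, income)
tau = kendall_tau(population, income)
```

Only the third and fourth cities swap places between the two rankings, so one of the ten pairs is discordant; both coefficients stay close to, but below, 1.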
ERIC Educational Resources Information Center
Sanchez, Gilbert; Cali, Alfred J.
This study was designed to compare time allocations to major functions actually performed and idealized by bilingual administrators and principals; to rank specific procedures used in accomplishing these functions; to determine staffing patterns, and program and organizational characteristics; and to isolate personal/professional demographics of…
Closed-Loop Multitarget Optimization for Discovery of New Emulsion Polymerization Recipes
2015-01-01
Self-optimization of chemical reactions enables faster optimization of reaction conditions or discovery of molecules with required target properties. The technology of self-optimization has been expanded to the discovery of new process recipes for the manufacture of complex functional products. A new machine-learning algorithm, specifically designed for multiobjective target optimization with an explicit aim to minimize the number of “expensive” experiments, guides the discovery process. This “black-box” approach assumes no a priori knowledge of the chemical system and is hence particularly suited to rapid development of processes to manufacture specialist low-volume, high-value products. The approach was demonstrated in the discovery of process recipes for a semibatch emulsion copolymerization, targeting a specific particle size and full conversion. PMID:26435638
Factors affecting reproducibility between genome-scale siRNA-based screens
Barrows, Nicholas J.; Le Sommer, Caroline; Garcia-Blanco, Mariano A.; Pearson, James L.
2011-01-01
RNA interference-based screening is a powerful new genomic technology which addresses gene function en masse. To evaluate factors influencing hit list composition and reproducibility, we performed two identically designed small interfering RNA (siRNA)-based, whole genome screens for host factors supporting yellow fever virus infection. These screens represent two separate experiments completed five months apart and allow the direct assessment of the reproducibility of a given siRNA technology when performed in the same environment. Candidate hit lists generated by sum rank, median absolute deviation, z-score, and strictly standardized mean difference were compared within and between whole genome screens. Application of these analysis methodologies within a single screening dataset using a fixed threshold equivalent to a p-value ≤ 0.001 resulted in hit lists ranging from 82 to 1,140 members and highlighted the tremendous impact analysis methodology has on hit list composition. Intra- and inter-screen reproducibility was significantly influenced by the analysis methodology and ranged from 32% to 99%. This study also highlighted the power of testing at least two independent siRNAs for each gene product in primary screens. To facilitate validation we conclude by suggesting methods to reduce false discovery at the primary screening stage. In this study we present the first comprehensive comparison of multiple analysis strategies, and demonstrate the impact of the analysis methodology on the composition of the “hit list”. Therefore, we propose that the entire dataset derived from functional genome-scale screens, especially if publicly funded, should be made available as is done with data derived from gene expression and genome-wide association studies. PMID:20625183
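The dependence of hit lists on analysis method can be illustrated with a toy comparison of two of the scores named above, the z-score and a median absolute deviation (MAD) based robust z-score. This is a sketch under stated assumptions, not the authors' pipeline: the readout values, the threshold of 2.0, and the function names are all hypothetical.

```python
from statistics import mean, median, stdev


def z_scores(values):
    """Classical z-score: distance from the mean in standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def robust_z_scores(values):
    """Robust z-score: distance from the median in scaled MAD units."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return [(v - med) / (1.4826 * mad) for v in values]


def hits(scores, threshold):
    """Indices whose score falls at or below -threshold (reduced infection)."""
    return {i for i, s in enumerate(scores) if s <= -threshold}


# Hypothetical per-well infection readouts; low values suggest a host factor.
readout = [100, 98, 102, 101, 99, 97, 103, 100, 60, 20]

z_hits = hits(z_scores(readout), 2.0)
mad_hits = hits(robust_z_scores(readout), 2.0)
```

In this toy data the strong outlier at index 9 inflates the standard deviation, so the classical z-score flags only that well, while the MAD-based score also flags index 8. This is one simple way in which the choice of method changes hit list composition.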
Machine Learning Helps Identify CHRONO as a Circadian Clock Component
Venkataraman, Anand; Ramanathan, Chidambaram; Kavakli, Ibrahim H.; Hughes, Michael E.; Baggs, Julie E.; Growe, Jacqueline; Liu, Andrew C.; Kim, Junhyong; Hogenesch, John B.
2014-01-01
Over the last decades, researchers have characterized a set of “clock genes” that drive daily rhythms in physiology and behavior. This arduous work has yielded results with far-reaching consequences in metabolic, psychiatric, and neoplastic disorders. Recent attempts to expand our understanding of circadian regulation have moved beyond the mutagenesis screens that identified the first clock components, employing higher throughput genomic and proteomic techniques. In order to further accelerate clock gene discovery, we utilized a computer-assisted approach to identify and prioritize candidate clock components. We used a simple form of probabilistic machine learning to integrate biologically relevant, genome-scale data and ranked genes on their similarity to known clock components. We then used a secondary experimental screen to characterize the top candidates. We found that several physically interact with known clock components in a mammalian two-hybrid screen and modulate in vitro cellular rhythms in an immortalized mouse fibroblast line (NIH 3T3). One candidate, Gene Model 129, interacts with BMAL1 and functionally represses the key driver of molecular rhythms, the BMAL1/CLOCK transcriptional complex. Given these results, we have renamed the gene CHRONO (computationally highlighted repressor of the network oscillator). Bi-molecular fluorescence complementation and co-immunoprecipitation demonstrate that CHRONO represses by abrogating the binding of BMAL1 to its transcriptional co-activator CBP. Most importantly, CHRONO knockout mice display a prolonged free-running circadian period similar to, or more drastic than, six other clock components. We conclude that CHRONO is a functional clock component providing a new layer of control on circadian molecular dynamics. PMID:24737000
2014-01-01
Background Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. Results We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. Conclusions Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality. PMID:24980787
Teramoto, Reiji; Saito, Chiaki; Funahashi, Shin-ichi
2014-06-30
Knockdown or overexpression of genes is widely used to identify genes that play important roles in many aspects of cellular functions and phenotypes. Because next-generation sequencing generates high-throughput data that allow us to detect genes, it is important to identify genes that drive functional and phenotypic changes of cells. However, conventional methods rely heavily on the assumption of normality and they often give incorrect results when the assumption is not true. To relax the Gaussian assumption in causal inference, we introduce the non-paranormal method to test conditional independence in the PC-algorithm. Then, we present the non-paranormal intervention-calculus when the directed acyclic graph (DAG) is absent (NPN-IDA), which incorporates the cumulative nature of effects through a cascaded pathway via causal inference for ranking causal genes against a phenotype with the non-paranormal method for estimating DAGs. We demonstrate that causal inference with the non-paranormal method significantly improves the performance in estimating DAGs on synthetic data in comparison with the original PC-algorithm. Moreover, we show that NPN-IDA outperforms the conventional methods in exploring regulators of the flowering time in Arabidopsis thaliana and regulators that control the browning of white adipocytes in mice. Our results show that performance improvement in estimating DAGs contributes to an accurate estimation of causal effects. Although the simplest alternative procedure was used, our proposed method enables us to design efficient intervention experiments and can be applied to a wide range of research purposes, including drug discovery, because of its generality.
Mazel, Florent; Guilhaumon, François; Mouquet, Nicolas; Devictor, Vincent; Gravel, Dominique; Renaud, Julien; Cianciaruso, Marcus Vinicius; Loyola, Rafael Dias; Diniz-Filho, José Alexandre Felizola; Mouillot, David; Thuiller, Wilfried
2014-08-01
To define biome-scale hotspots of phylogenetic and functional mammalian biodiversity (PD and FD, respectively) and compare them to 'classical' hotspots based on species richness (SR) only. Global. SR, PD & FD were computed for 782 terrestrial ecoregions using distribution ranges of 4616 mammalian species. We used a set of comprehensive diversity indices unified by a recent framework that incorporates the species relative coverage in each ecoregion. We build large-scale multifaceted diversity-area relationships to rank ecoregions according to their levels of biodiversity while accounting for the effect of area on each diversity facet. Finally we defined hotspots as the top-ranked ecoregions. While ignoring species relative coverage led to a relative good congruence between biome top ranked SR, PD and FD hotspots, ecoregions harboring a rich and abundantly represented evolutionary history and functional diversity did not match with top ranked ecoregions defined by species richness. More importantly PD and FD hotspots showed important spatial mismatches. We also found that FD and PD generally reached their maximum values faster than species richness as a function of area. The fact that PD/FD reach faster their maximal value than SR may suggest that the two former facets might be less vulnerable to habitat loss than the latter. While this point is expected, it is the first time that it is quantified at global scale and should have important consequences in conservation. Incorporating species relative coverage into the delineation of multifaceted hotspots of diversity lead to weak congruence between SR, PD and FD hotspots. This means that maximizing species number may fail at preserving those nodes (in the phylogenetic or functional tree) that are relatively abundant in the ecoregion. As a consequence it may be of prime importance to adopt a multifaceted biodiversity perspective to inform conservation strategies at global scale.
Mazel, Florent; Guilhaumon, François; Mouquet, Nicolas; Devictor, Vincent; Gravel, Dominique; Renaud, Julien; Cianciaruso, Marcus Vinicius; Loyola, Rafael Dias; Diniz-Filho, José Alexandre Felizola; Mouillot, David; Thuiller, Wilfried
2014-01-01
Aim To define biome-scale hotspots of phylogenetic and functional mammalian biodiversity (PD and FD, respectively) and compare them to ‘classical’ hotspots based on species richness (SR) only. Location Global Methods SR, PD & FD were computed for 782 terrestrial ecoregions using distribution ranges of 4616 mammalian species. We used a set of comprehensive diversity indices unified by a recent framework that incorporates the species relative coverage in each ecoregion. We build large-scale multifaceted diversity-area relationships to rank ecoregions according to their levels of biodiversity while accounting for the effect of area on each diversity facet. Finally we defined hotspots as the top-ranked ecoregions. Results While ignoring species relative coverage led to a relative good congruence between biome top ranked SR, PD and FD hotspots, ecoregions harboring a rich and abundantly represented evolutionary history and functional diversity did not match with top ranked ecoregions defined by species richness. More importantly PD and FD hotspots showed important spatial mismatches. We also found that FD and PD generally reached their maximum values faster than species richness as a function of area. Main conclusions The fact that PD/FD reach faster their maximal value than SR may suggest that the two former facets might be less vulnerable to habitat loss than the latter. While this point is expected, it is the first time that it is quantified at global scale and should have important consequences in conservation. Incorporating species relative coverage into the delineation of multifaceted hotspots of diversity lead to weak congruence between SR, PD and FD hotspots. This means that maximizing species number may fail at preserving those nodes (in the phylogenetic or functional tree) that are relatively abundant in the ecoregion. As a consequence it may be of prime importance to adopt a multifaceted biodiversity perspective to inform conservation strategies at global scale. PMID:25071413
Vishnyakova, Dina; Pasche, Emilie; Ruch, Patrick
2012-01-01
We report on the original integration of an automatic text categorization pipeline, so-called ToxiCat (Toxicogenomic Categorizer), that we developed to perform biomedical documents classification and prioritization in order to speed up the curation of the Comparative Toxicogenomics Database (CTD). The task can be basically described as a binary classification task, where a scoring function is used to rank a selected set of articles. Then components of a question-answering system are used to extract CTD-specific annotations from the ranked list of articles. The ranking function is generated using a Support Vector Machine, which combines three main modules: an information retrieval engine for MEDLINE (EAGLi), a gene normalization service (NormaGene) developed for a previous BioCreative campaign and finally, a set of answering components and entity recognizer for diseases and chemicals. The main components of the pipeline are publicly available both as web application and web services. The specific integration performed for the BioCreative competition is available via a web user interface at http://pingu.unige.ch:8080/Toxicat.
Role of RANKL (TNFSF11)-dependent osteopetrosis in the dental phenotype of Msx2 null mutant mice.
Castaneda, Beatriz; Simon, Yohann; Ferbus, Didier; Robert, Benoit; Chesneau, Julie; Mueller, Christopher; Berdal, Ariane; Lézot, Frédéric
2013-01-01
The MSX2 homeoprotein is implicated in all aspects of craniofacial skeletal development. During postnatal growth, MSX2 is expressed in all cells involved in mineralized tissue formation and plays a role in their differentiation and function. Msx2 null (Msx2 (-/-)) mice display complex craniofacial skeleton abnormalities with bone and tooth defects. A moderate osteopetrotic phenotype is observed, along with decreased expression of RANKL (TNFSF11), the main osteoclast-differentiating factor. In order to elucidate the role of such an osteopetrosis in the Msx2 (-/-) mouse dental phenotype, a bone resorption rescue was performed by mating Msx2 (-/-) mice with a transgenic mouse line overexpressing Rank (Tnfrsf11a). Msx2 (-/-) Rank(Tg) mice had significant improvement in the molar phenotype, while incisor epithelium defects were exacerbated in the enamel area, with formation of massive osteolytic tumors. Although compensation for RANKL loss of function could have potential as a therapy for osteopetrosis, in Msx2 (-/-) mice this approach, via RANK overexpression in monocyte-derived lineages, amplified latent epithelial tumor development in the peculiar continuously growing incisor.
Kuzmickienė, Jurgita; Kaubrys, Gintaras
2016-10-08
BACKGROUND The primary manifestation of Alzheimer's disease (AD) is decline in memory. Dysexecutive symptoms have tremendous impact on functional activities and quality of life. Data regarding frontal-executive dysfunction in mild AD are controversial. The aim of this study was to assess the presence and specific features of executive dysfunction in mild AD based on Cambridge Neuropsychological Test Automated Battery (CANTAB) results. MATERIAL AND METHODS Fifty newly diagnosed, treatment-naïve, mild, late-onset AD patients (MMSE ≥20, AD group) and 25 control subjects (CG group) were recruited in this prospective, cross-sectional study. The CANTAB tests CRT, SOC, PAL, SWM were used for in-depth cognitive assessment. Comparisons were performed using the t test or Mann-Whitney U test, as appropriate. Correlations were evaluated by Pearson r or Spearman R. Statistical significance was set at p<0.05. RESULTS AD and CG groups did not differ according to age, education, gender, or depression. Few differences were found between groups in the SOC test for performance measures: Mean moves (minimum 3 moves): AD (Rank Sum=2227), CG (Rank Sum=623), p<0.001. However, all SOC test time measures differed significantly between groups: SOC Mean subsequent thinking time (4 moves): AD (Rank Sum=2406), CG (Rank Sum=444), p<0.001. Correlations were weak between executive function (SOC) and episodic/working memory (PAL, SWM) (R=0.01-0.38) or attention/psychomotor speed (CRT) (R=0.02-0.37). CONCLUSIONS Frontal-executive functions are impaired in mild AD patients. Executive dysfunction is highly prominent in time measures, but minimal in performance measures. Executive disorders do not correlate with a decline in episodic and working memory or psychomotor speed in mild AD.
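The rank sums quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that all 50 AD and 25 CG subjects entered each comparison (the abstract does not state this explicitly), in which case the two rank sums must add up to N(N+1)/2 with N = 75. The function name `u_statistic` and the dictionary layout are illustrative only.

```python
def u_statistic(rank_sum, n):
    """Mann-Whitney U for one group: rank sum minus the smallest possible rank sum."""
    return rank_sum - n * (n + 1) // 2


n_ad, n_cg = 50, 25                       # group sizes from the abstract
n_total = n_ad + n_cg
expected_total = n_total * (n_total + 1) // 2   # sum of ranks 1..75

# Rank sums as reported in the abstract: (AD, CG)
reported = {
    "SOC mean moves (minimum 3 moves)": (2227, 623),
    "SOC mean subsequent thinking time (4 moves)": (2406, 444),
}

results = {}
for measure, (r_ad, r_cg) in reported.items():
    u_ad = u_statistic(r_ad, n_ad)
    u_cg = u_statistic(r_cg, n_cg)
    results[measure] = {
        "rank_sums_consistent": r_ad + r_cg == expected_total,
        "u_ad": u_ad,
        "u_cg": u_cg,
        # Share of AD/CG pairs in which the AD value is higher (ties count half)
        "share_ad_higher": u_ad / (n_ad * n_cg),
    }

for measure, values in results.items():
    print(measure, values)
```

Under that assumption both pairs of rank sums are internally consistent, and the derived U values indicate a larger separation between groups for the time measure than for the performance measure, in line with the stated conclusions.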
High-resolution dynamic 31P-MRSI using a low-rank tensor model.
Ma, Chao; Clifford, Bryan; Liu, Yuchi; Gu, Yuning; Lam, Fan; Yu, Xin; Liang, Zhi-Pei
2017-08-01
To develop a rapid 31P-MRSI method with high spatiospectral resolution using low-rank tensor-based data acquisition and image reconstruction. The multidimensional image function of 31P-MRSI is represented by a low-rank tensor to capture the spatial-spectral-temporal correlations of data. A hybrid data acquisition scheme is used for sparse sampling, which consists of a set of "training" data with limited k-space coverage to capture the subspace structure of the image function, and a set of sparsely sampled "imaging" data for high-resolution image reconstruction. An explicit subspace pursuit approach is used for image reconstruction, which estimates the bases of the subspace from the "training" data and then reconstructs a high-resolution image function from the "imaging" data. We have validated the feasibility of the proposed method using phantom and in vivo studies on a 3T whole-body scanner and a 9.4T preclinical scanner. The proposed method produced high-resolution static 31P-MRSI images (i.e., 6.9 × 6.9 × 10 mm³ nominal resolution in a 15-min acquisition at 3T) and high-resolution, high-frame-rate dynamic 31P-MRSI images (i.e., 1.5 × 1.5 × 1.6 mm³ nominal resolution, 30 s/frame at 9.4T). Dynamic spatiospectral variations of 31P-MRSI signals can be efficiently represented by a low-rank tensor. Exploiting this mathematical structure for data acquisition and image reconstruction can lead to fast 31P-MRSI with high resolution, frame-rate, and SNR. Magn Reson Med 78:419-428, 2017. © 2017 International Society for Magnetic Resonance in Medicine.
A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.
Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe
2012-04-01
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.
Theodosiou, T; Vizirianakis, I S; Angelis, L; Tsaftaris, A; Darzentas, N
2011-12-01
PubMed is the most widely used database of biomedical literature. To the detriment of the user though, the ranking of the documents retrieved for a query is not content-based, and important semantic information in the form of assigned Medical Subject Headings (MeSH) terms is not readily presented or productively utilized. The motivation behind this work was the discovery of unanticipated information through the appropriate ranking of MeSH term pairs and, indirectly, documents. Such information can be useful in guiding novel research and following promising trends. A web-based tool, called MeSHy, was developed implementing a mainly statistical algorithm. The algorithm takes into account the frequencies of occurrences, concurrences, and the semantic similarities of MeSH terms in retrieved PubMed documents to create MeSH term pairs. These are then scored and ranked, focusing on their unexpectedly frequent or infrequent occurrences. MeSHy presents results through an online interactive interface facilitating further manipulation through filtering and sorting. The results themselves include the MeSH term pairs, along with MeSH categories, the score, and document IDs, all of which are hyperlinked for convenience. To highlight the applicability of the tool, we report the findings of an expert in the pharmacology field on querying the molecularly-targeted drug imatinib and nutrition-related flavonoids. To the best of our knowledge, MeSHy is the first publicly available tool able to directly provide such a different perspective on the complex nature of published work. Implemented in Perl and served by Apache2 at http://bat.ina.certh.gr/tools/meshy/ with all major browsers supported. Copyright © 2011 Elsevier Inc. All rights reserved.
Low-rank coal oil agglomeration product and process
Knudson, Curtis L.; Timpe, Ronald C.; Potas, Todd A.; DeWall, Raymond A.; Musich, Mark A.
1992-01-01
A selectively-sized, raw, low-rank coal is processed to produce a low ash and relative water-free agglomerate with an enhanced heating value and a hardness sufficient to produce a non-decrepitating, shippable fuel. The low-rank coal is treated, under high shear conditions, in the first stage to cause ash reduction and subsequent surface modification which is necessary to facilitate agglomerate formation. In the second stage the treated low-rank coal is contacted with bridging and binding oils under low shear conditions to produce agglomerates of selected size. The bridging and binding oils may be coal or petroleum derived. The process incorporates a thermal deoiling step whereby the bridging oil may be completely or partially recovered from the agglomerate; whereas, partial recovery of the bridging oil functions to leave as an agglomerate binder, the heavy constituents of the bridging oil. The recovered oil is suitable for recycling to the agglomeration step or can serve as a value-added product.
Low-rank coal oil agglomeration product and process
Knudson, C.L.; Timpe, R.C.; Potas, T.A.; DeWall, R.A.; Musich, M.A.
1992-11-10
A selectively-sized, raw, low-rank coal is processed to produce a low ash and relative water-free agglomerate with an enhanced heating value and a hardness sufficient to produce a non-degradable, shippable fuel. The low-rank coal is treated, under high shear conditions, in the first stage to cause ash reduction and subsequent surface modification which is necessary to facilitate agglomerate formation. In the second stage the treated low-rank coal is contacted with bridging and binding oils under low shear conditions to produce agglomerates of selected size. The bridging and binding oils may be coal or petroleum derived. The process incorporates a thermal deoiling step whereby the bridging oil may be completely or partially recovered from the agglomerate; whereas, partial recovery of the bridging oil functions to leave as an agglomerate binder, the heavy constituents of the bridging oil. The recovered oil is suitable for recycling to the agglomeration step or can serve as a value-added product.
Ranking and averaging independent component analysis by reproducibility (RAICAR).
Yang, Zhi; LaConte, Stephen; Weng, Xuchu; Hu, Xiaoping
2008-06-01
Independent component analysis (ICA) is a data-driven approach that has exhibited great utility for functional magnetic resonance imaging (fMRI). Standard ICA implementations, however, do not provide the number and relative importance of the resulting components. In addition, ICA algorithms utilizing gradient-based optimization give decompositions that are dependent on initialization values, which can lead to dramatically different results. In this work, a new method, RAICAR (Ranking and Averaging Independent Component Analysis by Reproducibility), is introduced to address these issues for spatial ICA applied to fMRI. RAICAR utilizes repeated ICA realizations and relies on the reproducibility between them to rank and select components. Different realizations are aligned based on correlations, leading to aligned components. Each component is ranked and thresholded based on between-realization correlations. Furthermore, different realizations of each aligned component are selectively averaged to generate the final estimate of the given component. Reliability and accuracy of this method are demonstrated with both simulated and experimental fMRI data. Copyright 2007 Wiley-Liss, Inc.
A parallel computer implementation of fast low-rank QR approximation of the Biot-Savart law
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, D A; Fasenfest, B J; Stowell, M L
2005-11-07
In this paper we present a low-rank QR method for evaluating the discrete Biot-Savart law on parallel computers. It is assumed that the known current density and the unknown magnetic field are both expressed in a finite element expansion, and we wish to compute the degrees-of-freedom (DOF) in the basis function expansion of the magnetic field. The matrix that maps the current DOF to the field DOF is full, but if the spatial domain is properly partitioned the matrix can be written as a block matrix, with blocks representing distant interactions being low rank and having a compressed QR representation. The matrix partitioning is determined by the number of processors; the rank of each block (i.e. the compression) is determined by the specific geometry and is computed dynamically. In this paper we provide the algorithmic details and present computational results for large-scale computations.
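Why a block that couples well-separated regions compresses well can be seen in a small sketch. Everything here is an assumption made for illustration: a scalar 1/r kernel on two one-dimensional point clusters stands in for the Biot-Savart interaction, a truncated modified Gram-Schmidt stands in for whatever QR procedure the paper uses, and the tolerance of 1e-6 is arbitrary.

```python
import math


def truncated_qr(columns, tol):
    """Modified Gram-Schmidt over columns.

    A column adds a new basis vector only if its residual norm exceeds
    tol times its original norm. Returns (Q, R): Q is a list of k
    orthonormal vectors; R[j] holds the coefficients of column j in Q.
    """
    q_vectors = []
    coeffs = []
    for col in columns:
        norm0 = math.sqrt(sum(v * v for v in col))
        residual = list(col)
        r = []
        for q in q_vectors:
            c = sum(a * b for a, b in zip(q, residual))
            r.append(c)
            residual = [a - c * b for a, b in zip(residual, q)]
        norm = math.sqrt(sum(v * v for v in residual))
        if norm > tol * norm0:
            q_vectors.append([v / norm for v in residual])
            r.append(norm)
        coeffs.append(r)
    return q_vectors, coeffs


n = 24
sources = [j / (n - 1) for j in range(n)]            # cluster in [0, 1]
targets = [10.0 + i / (n - 1) for i in range(n)]     # distant cluster in [10, 11]
# One column per source point, one row per target point.
columns = [[1.0 / (y - x) for y in targets] for x in sources]

Q, R = truncated_qr(columns, tol=1e-6)
rank = len(Q)

# Largest entrywise error of the compressed representation.
max_err = 0.0
for col, r in zip(columns, R):
    approx = [sum(c * q[i] for c, q in zip(r, Q)) for i in range(n)]
    max_err = max(max_err, max(abs(a - b) for a, b in zip(approx, col)))

dense_storage = n * n
compressed_storage = rank * (n + n)   # Q is n x rank, R is rank x n
print(rank, max_err, dense_storage, compressed_storage)
```

The detected rank depends on the separation between the clusters and on the tolerance, which mirrors the abstract's remark that the rank of each block is set by the geometry and computed dynamically.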
Robustness of weighted networks
NASA Astrophysics Data System (ADS)
Bellingeri, Michele; Cassi, Davide
2018-01-01
Complex network response to node loss is a central question in different fields of network science because node failure can cause the fragmentation of the network, thus compromising the system functioning. Previous studies considered binary networks where the intensity (weight) of the links is not accounted for, i.e. a link is either present or absent. However, in real-world networks the weights of connections, and thus their importance for network functioning, can be widely different. Here, we analyzed the response of real-world and model networks to node loss accounting for link intensity and the weighted structure of the network. We used both classic binary node properties and network functioning measures, introduced a weighted rank for node importance (node strength), and used a measure for network functioning that accounts for the weight of the links (weighted efficiency). We find that: (i) the efficiency of the attack strategies changed depending on whether binary or weighted network functioning measures were used, for both real-world and model networks; (ii) in some cases, removing nodes according to weighted rank produced the highest damage when functioning was measured by the weighted efficiency; (iii) adopting a weighted measure for the network damage changed the efficacy of the attack strategy with respect to the binary analyses. Our results show that if the weighted structure of complex networks is not taken into account, this may produce misleading models to forecast the system response to node failure, i.e. considering binary links may not unveil the real damage induced in the system. Last, once weighted measures are introduced, in order to discover the best attack strategy, it is important to analyze the network response to node loss using node ranks that account for the intensity of the links to the node.
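The difference between a binary rank (degree) and a weighted rank (node strength) can be made concrete with a toy network. The sketch below is not the authors' code: the six-node graph is hypothetical, link length is assumed to be the reciprocal of link weight, and efficiency is normalized over the nodes that remain after removal, which may differ from the normalization used in the paper.

```python
import heapq


def build_graph(edges):
    """Undirected weighted graph as {node: {neighbor: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def shortest_distances(graph, source):
    """Dijkstra with link length 1 / weight (assumption: strong links are short)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + 1.0 / w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def weighted_efficiency(graph):
    """Mean of 1 / distance over ordered node pairs; unreachable pairs add 0."""
    n = len(graph)
    if n < 2:
        return 0.0
    total = 0.0
    for u in graph:
        dist = shortest_distances(graph, u)
        total += sum(1.0 / d for v, d in dist.items() if v != u)
    return total / (n * (n - 1))


def remove_node(graph, node):
    """Copy of the graph without the given node and its links."""
    return {u: {v: w for v, w in nbrs.items() if v != node}
            for u, nbrs in graph.items() if u != node}


# Hypothetical network: "hub" has the most links, "x" has the heaviest links.
graph = build_graph([
    ("hub", "p", 1.0), ("hub", "q", 1.0), ("hub", "r", 1.0),
    ("x", "r", 5.0), ("x", "s", 5.0),
])

degree = {u: len(nbrs) for u, nbrs in graph.items()}
strength = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
top_by_degree = max(degree, key=degree.get)
top_by_strength = max(strength, key=strength.get)

baseline = weighted_efficiency(graph)
after_degree_attack = weighted_efficiency(remove_node(graph, top_by_degree))
after_strength_attack = weighted_efficiency(remove_node(graph, top_by_strength))
print(top_by_degree, top_by_strength)
print(baseline, after_degree_attack, after_strength_attack)
```

In this toy case the two rankings pick different first targets, and removing the highest-strength node lowers the weighted efficiency more than removing the highest-degree node, a small instance of finding (ii).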
The structure of first-ranked cluster galaxies and the radius-magnitude relation
NASA Astrophysics Data System (ADS)
Lugger, P. M.
1984-11-01
To investigate theoretical predictions for the dynamical evolution of first-ranked galaxies, a quantitative study of their properties, as a function of cluster morphology, has been carried out using photographic plates obtained with the Palomar 48 inch (1.2 m) Schmidt telescope. Surface brightness profiles to radii of several hundred kpc for 35 first-ranked cluster galaxies have been analyzed. The dispersion in the metric magnitudes of first-ranked galaxies is quite small (about 0.4 mag), which is consistent with the results of Kristian, Sandage, and Westphal (1978) as well as those of Hoessel, Gunn, and Thuan (1980) and the recent work of Schneider, Gunn, and Hoessel (1983). For the cD (supergiant elliptical) galaxy sample, the mean metric magnitude is about 0.5 mag brighter than for the non-cD galaxies. The mean de Vaucouleurs effective radius for the cD galaxy sample is 80 percent larger than that of the non-cD sample. The relation between de Vaucouleurs effective radius and magnitude determined in the present study for first-ranked galaxies, log r(e) equal to about -0.26 M + constant is consistent with the relations found for fainter galaxies by Strom and Strom (1978) as well as Wirth (1982). The residuals in radius from the mean radius-magnitude relation for first-ranked galaxies do not correlate with the Bautz-Morgan (1970) type of the cluster.
Fish cell lines as a tool for the ecotoxicity assessment and ranking of engineered nanomaterials.
Bermejo-Nogales, A; Fernández-Cruz, M L; Navas, J M
2017-11-01
Risk assessment of engineered nanomaterials (ENMs) is being hindered by the sheer production volume of these materials. In this regard, the grouping and ranking of ENMs appears as a promising strategy. Here we sought to evaluate the usefulness of in vitro systems based on fish cell lines for ranking a set of ENMs on the basis of their cytotoxicity. We used the topminnow (Poeciliopsis lucida) liver cell line (PLHC-1) and the rainbow trout (Oncorhynchus mykiss) fibroblast-like gonadal cell line (RTG-2). ENMs were obtained from the EU Joint Research Centre repository. The size frequency distribution of ENM suspensions in cell culture media was characterized. Cytotoxicity was evaluated after 24 h of exposure. PLHC-1 cells exhibited higher sensitivity to the ENMs than RTG-2 cells. ZnO-NM was found to exert toxicity mainly by altering lysosome function and metabolic activity, while multi-walled carbon nanotubes (MWCNTs) caused plasma membrane disruption at high concentrations. The hazard ranking for toxicity (ZnO-NM > MWCNT ≥ CeO2-NM = SiO2-NM) was inversely related to the ranking in size detected in culture medium. Our findings reveal the suitability of fish cell lines for establishing hazard rankings of ENMs in the framework of integrated approaches to testing and assessment. Copyright © 2017 Elsevier Inc. All rights reserved.
Ishii, Masaru
2015-06-01
Recent advances in intravital bone imaging technology have enabled us to grasp the real cellular behaviors and functions in vivo, revolutionizing the field of drug discovery for novel therapeutics against intractable bone diseases. In this chapter, I introduce various updated information on pharmacological actions of several antibone resorptive agents, which could only be derived from advanced imaging techniques, and also discuss the future perspectives of this new trend in drug discovery.
Balancing novelty with confined chemical space in modern drug discovery.
Medina-Franco, José L; Martinez-Mayorga, Karina; Meurice, Nathalie
2014-02-01
The concept of chemical space has broad applications in drug discovery. In response to the needs of drug discovery campaigns, different approaches are followed to efficiently populate, mine and select relevant chemical spaces that overlap with biologically relevant chemical spaces. This paper reviews major trends in current drug discovery and their impact on the mining and population of chemical space. We also survey different approaches to develop screening libraries with confined chemical spaces balancing physicochemical properties. In this context, the confinement is guided by criteria that can be divided in two broad categories: i) library design focused on a relevant therapeutic target or disease and ii) library design focused on the chemistry or a desired molecular function. The design and development of chemical libraries should be associated with the specific purpose of the library and the project goals. The high complexity of drug discovery and the inherent imperfection of individual experimental and computational technologies prompt the integration of complementary library design and screening approaches to expedite the identification of new and better drugs. Library design approaches including diversity-oriented synthesis, biological-oriented synthesis or combinatorial library design, to name a few, and the design of focused libraries driven by target/disease, chemical structure or molecular function are more efficient if they are guided by multi-parameter optimization. In this context, consideration of pharmaceutically relevant properties is essential for balancing novelty with chemical space in drug discovery.
[Discovery of the target genes inhibited by formic acid in Candida shehatae].
Cai, Peng; Xiong, Xujie; Xu, Yong; Yong, Qiang; Zhu, Junjun; Shiyuan, Yu
2014-01-04
At the transcriptional level, the inhibitory effects of formic acid were investigated on Candida shehatae, a model yeast strain capable of fermenting xylose to ethanol. Thereby, the target genes were regulated by formic acid and the transcript profiles were discovered. On the basis of the transcriptome data of C. shehatae metabolizing glucose and xylose, the genes responsible for ethanol fermentation were chosen as candidates by the combined method of yeast metabolic pathway analysis and manual gene BLAST search. These candidates were then quantitatively detected by RQ-PCR technique to find the regulating genes under gradient doses of formic acid. By quantitative analysis of 42 candidate genes, we finally identified 10 and 5 genes as markedly down-regulated and up-regulated targets by formic acid, respectively. With regard to gene transcripts regulated by formic acid in C. shehatae, the markedly down-regulated genes rank in declining order as follows: xylitol dehydrogenase (XYL2), acetyl-CoA synthetase (ACS), ribose-5-phosphate isomerase (RKI), transaldolase (TAL), phosphogluconate dehydrogenase (GND1), transketolase (TKL), glucose-6-phosphate dehydrogenase (ZWF1), xylose reductase (XYL1), pyruvate dehydrogenase (PDH) and pyruvate decarboxylase (PDC); and the up-regulated genes rank in declining order as follows: fructose-bisphosphate aldolase (ALD), glucokinase (GLK), malate dehydrogenase (MDH), 6-phosphofructokinase (PFK) and alcohol dehydrogenase (ADH).
Dual function of MG53 in membrane repair and insulin signaling
Tan, Tao; Ko, Young-Gyu; Ma, Jianjie
2016-01-01
MG53 is a member of the TRIM protein family that acts as a key component of the cell membrane repair machinery. MG53 is also an E3 ligase that ubiquitinates insulin receptor substrate-1 and controls insulin signaling in skeletal muscle cells. Since its discovery in 2009, research efforts have been devoted to translating this basic discovery into clinical applications in human degenerative and metabolic diseases. This review article highlights the dual function of MG53 in cell membrane repair and insulin signaling, the mechanism that underlies the control of MG53 function, and the therapeutic value of targeting MG53 function in regenerative medicine. [BMB Reports 2016; 49(8): 414-423] PMID:27174502
Forecasting petroleum discoveries in sparsely drilled areas: Nigeria and the North Sea
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attanasi, E.D.; Root, D.H.
1988-10-01
Decline function methods for projecting future discoveries generally capture the crowding effects of wildcat wells on the discovery rate. However, these methods do not accommodate easily situations where exploration areas and horizons are expanding. In this paper, a method is presented that uses a mapping algorithm for separating these often countervailing influences. The method is applied to Nigeria and the North Sea. For an amount of future drilling equivalent to past drilling (825 wildcat wells), future discoveries (in resources found) for Nigeria are expected to decline by 68% per well but still amount to 8.5 billion barrels of oil equivalent (BOE). Similarly, for the total North Sea for an equivalent amount and mix among areas of past drilling (1322 wildcat wells), future discoveries are expected to amount to 17.9 billion BOE, whereas the average discovery rate per well is expected to decline by 71%.
Forecasting petroleum discoveries in sparsely drilled areas: Nigeria and the North Sea
Attanasi, E.D.; Root, D.H.
1988-01-01
Decline function methods for projecting future discoveries generally capture the crowding effects of wildcat wells on the discovery rate. However, these methods do not accommodate easily situations where exploration areas and horizons are expanding. In this paper, a method is presented that uses a mapping algorithm for separating these often countervailing influences. The method is applied to Nigeria and the North Sea. For an amount of future drilling equivalent to past drilling (825 wildcat wells), future discoveries (in resources found) for Nigeria are expected to decline by 68% per well but still amount to 8.5 billion barrels of oil equivalent (BOE). Similarly, for the total North Sea for an equivalent amount and mix among areas of past drilling (1322 wildcat wells), future discoveries are expected to amount to 17.9 billion BOE, whereas the average discovery rate per well is expected to decline by 71%. © 1988 International Association for Mathematical Geology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jain, M.K.; Narayan, R.; Han, O.
1992-01-30
The overall goal of this project is to find biological methods to remove carboxylic functionalities from low-rank coals under ambient conditions and to assess the properties of these modified coals towards coal liquefaction. The main objectives of this quarter were: (1) continuation of microbial consortia development, (2) evaluation of the isolated organisms for decarboxylation, (3) selection of best performing culture (known cultures vs. new isolates), and (4) coal decarboxylation using activated carbon as blanks. The project began on September 12, 1990.
2016-03-01
well as the Yahoo search engine and a classic SearchKing HIST algorithm. The co-PI immersed herself in the sociology literature for the relevant...Google matrix, PageRank as well as the Yahoo search engine and a classic SearchKing HIST algorithm. The co-PI immersed herself in the sociology...The PI studied all mathematical literature he can find related to the Google search engine, Google matrix, PageRank as well as the Yahoo search
Comparison of Two Methods Used to Model Shape Parameters of Pareto Distributions
Liu, C.; Charpentier, R.R.; Su, J.
2011-01-01
Two methods are compared for estimating the shape parameters of Pareto field-size (or pool-size) distributions for petroleum resource assessment. Both methods assume mature exploration in which most of the larger fields have been discovered. Both methods use the sizes of larger discovered fields to estimate the numbers and sizes of smaller fields: (1) the tail-truncated method uses a plot of field size versus size rank, and (2) the log-geometric method uses data binned in field-size classes and the ratios of adjacent bin counts. Simulation experiments were conducted using discovered oil and gas pool-size distributions from four petroleum systems in Alberta, Canada and using Pareto distributions generated by Monte Carlo simulation. The estimates of the shape parameters of the Pareto distributions, calculated by both the tail-truncated and log-geometric methods, generally stabilize where discovered pool numbers are greater than 100. However, with fewer than 100 discoveries, these estimates can vary greatly with each new discovery. The estimated shape parameters of the tail-truncated method are more stable and larger than those of the log-geometric method where the number of discovered pools is more than 100. Both methods, however, tend to underestimate the shape parameter. Monte Carlo simulation was also used to create sequences of discovered pool sizes by sampling from a Pareto distribution with a discovery process model using a defined exploration efficiency (in order to show how biased the sampling was in favor of larger fields being discovered first). A higher (more biased) exploration efficiency gives better estimates of the Pareto shape parameters. © 2011 International Association for Mathematical Geosciences.
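The idea of reading a Pareto shape parameter off a size-versus-rank plot can be illustrated with a generic log-log regression. The sketch below is not the authors' tail-truncated or log-geometric procedure; it only assumes that, for an ideal Pareto sample, log(rank) falls linearly with log(size) and that the negative slope approximates the shape parameter. The function names `pareto_shape_from_ranks` and `sample_pareto` are hypothetical.

```python
import math
import random


def pareto_shape_from_ranks(sizes):
    """Estimate a Pareto shape parameter from a log-log rank-size fit.

    Assumption: rank 1 is the largest size, and for Pareto data
    log(rank) = const - shape * log(size), so shape = -slope.
    """
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(s) for s in ordered]
    ys = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -(sxy / sxx)


def sample_pareto(shape, x_min, n, seed=0):
    """Draw n Pareto-distributed sizes by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / shape) for _ in range(n)]


if __name__ == "__main__":
    pools = sample_pareto(shape=1.2, x_min=1.0, n=500)
    print(round(pareto_shape_from_ranks(pools), 2))
```

Re-running the fit on growing prefixes of a simulated discovery sequence is one way to see how unstable such estimates can be when only a small number of pools has been found.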
Use of Semantic Technology to Create Curated Data Albums
NASA Technical Reports Server (NTRS)
Ramachandran, Rahul; Kulkarni, Ajinkya; Li, Xiang; Sainju, Roshan; Bakare, Rohan; Basyal, Sabin; Fox, Peter (Editor); Norack, Tom (Editor)
2014-01-01
One of the continuing challenges in any Earth science investigation is the discovery and access of useful science content from the increasingly large volumes of Earth science data and related information available online. Current Earth science data systems are designed with the assumption that researchers access data primarily by instrument or geophysical parameter. Those who know exactly the data sets they need can obtain the specific files using these systems. However, in cases where researchers are interested in studying an event of research interest, they must manually assemble a variety of relevant data sets by searching the different distributed data systems. Consequently, there is a need to design and build specialized search and discovery tools in Earth science that can filter through large volumes of distributed online data and information and only aggregate the relevant resources needed to support climatology and case studies. This paper presents a specialized search and discovery tool that automatically creates curated Data Albums. The tool was designed to enable key elements of the search process such as dynamic interaction and sense-making. The tool supports dynamic interaction via different modes of interactivity and visual presentation of information. The compilation of information and data into a Data Album is analogous to a shoebox within the sense-making framework. This tool automates most of the tedious information/data gathering tasks for researchers. Data curation by the tool is achieved via an ontology-based, relevancy ranking algorithm that filters out non-relevant information and data. The curation enables better search results as compared to the simple keyword searches provided by existing data systems in Earth science.
A Social Rank Explanation of How Money Influences Health
2014-01-01
Objective: Financial resources are a potent determinant of health, yet it remains unclear why this is the case. We aimed to identify whether the frequently observed association between absolute levels of monetary resources and health may occur because money acts as an indirect proxy for a person’s social rank. Method: To address this question we examined over 230,000 observations on 40,400 adults drawn from two representative national panel studies: the British Household Panel Survey and the English Longitudinal Study of Ageing. We identified each person’s absolute income/wealth and their objective ranked position of income/wealth within a social reference-group. Absolute and rank income/wealth variables were then used to predict a series of self-reported and objectively recorded health outcomes in cross-sectional and longitudinal analyses. Results: As anticipated, those with higher levels of absolute income/wealth were found to have better health than others, after adjustment for age, gender, education, marital status, and labor force status. When evaluated simultaneously, the ranked position of income/wealth but not absolute income/wealth predicted all health outcomes examined, including objective measures of allostatic load and obesity, the presence of long-standing illness, and ratings of health, physical functioning, role limitations, and pain. The health benefits of high rank were consistent in cross-sectional and longitudinal analyses and did not depend on the reference-group used to rank participants. Conclusions: This is the first study to demonstrate that social position rather than material conditions may explain the impact of money on human health. PMID:25133843
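To make the distinction between absolute income and ranked position concrete, the following minimal sketch computes a relative rank within a reference group. The abstract does not give the formula used in the study; the definition here (share of the other group members with a strictly lower value) is an assumption chosen for illustration, and `rank_position` is a hypothetical name.

```python
def rank_position(value, reference_group):
    """Relative rank of `value` within a reference group, from 0.0 to 1.0.

    Assumption: the group includes the person being ranked, and rank is the
    number of members with a strictly lower value divided by (group size - 1).
    """
    members = list(reference_group)
    if len(members) < 2:
        raise ValueError("reference group needs at least two members")
    lower = sum(1 for v in members if v < value)
    return lower / (len(members) - 1)


if __name__ == "__main__":
    incomes = [10, 20, 30, 40, 1000]
    for income in incomes:
        print(income, rank_position(income, incomes))
```

In this toy group the top earner has 25 times the income of the next person, yet the rank variable moves by the same step (0.25) as between any other neighbours, which is why the two predictors can be separated statistically.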
Quantification of heterogeneity observed in medical images.
Brooks, Frank J; Grigsby, Perry W
2013-03-02
There has been much recent interest in the quantification of visually evident heterogeneity within functional grayscale medical images, such as those obtained via magnetic resonance or positron emission tomography. In the case of images of cancerous tumors, variations in grayscale intensity imply variations in crucial tumor biology. Despite these considerable clinical implications, there is as yet no standardized method for measuring the heterogeneity observed via these imaging modalities. In this work, we motivate and derive a statistical measure of image heterogeneity. This statistic measures the distance-dependent average deviation from the smoothest intensity gradation feasible. We show how this statistic may be used to automatically rank images of in vivo human tumors in order of increasing heterogeneity. We test this method against the current practice of ranking images via expert visual inspection. We find that this statistic provides a means of heterogeneity quantification beyond that given by other statistics traditionally used for the same purpose. We demonstrate the effect of tumor shape upon our ranking method and find the method applicable to a wide variety of clinically relevant tumor images. We find that the automated heterogeneity rankings agree very closely with those performed visually by experts. These results indicate that our automated method may be used reliably to rank, in order of increasing heterogeneity, tumor images whether or not object shape is considered to contribute to that heterogeneity. Automated heterogeneity ranking yields objective results which are more consistent than visual rankings. Reducing variability in image interpretation will enable more researchers to better study potential clinical implications of observed tumor heterogeneity.
Ufarté, Lisa; Bozonnet, Sophie; Laville, Elisabeth; Cecchini, Davide A; Pizzut-Serin, Sandra; Jacquiod, Samuel; Demanèche, Sandrine; Simonet, Pascal; Franqueville, Laure; Veronese, Gabrielle Potocki
2016-01-01
Activity-based metagenomics is one of the most efficient approaches to boost the discovery of novel biocatalysts from the huge reservoir of uncultivated bacteria. In this chapter, we describe a highly generic procedure of metagenomic library construction and high-throughput screening for carbohydrate-active enzymes. Applicable to any bacterial ecosystem, it enables the swift identification of functional enzymes that are highly efficient, alone or acting in synergy, to break down polysaccharides and oligosaccharides.
NASA Astrophysics Data System (ADS)
Zeng, Lang; He, Yu; Povolotskyi, Michael; Liu, XiaoYan; Klimeck, Gerhard; Kubis, Tillmann
2013-06-01
In this work, the low rank approximation concept is extended to the non-equilibrium Green's function (NEGF) method to achieve a very efficient approximated algorithm for coherent and incoherent electron transport. This new method is applied to inelastic transport in various semiconductor nanodevices. Detailed benchmarks with exact NEGF solutions show (1) a very good agreement between approximated and exact NEGF results, (2) a significant reduction of the required memory, and (3) a large reduction of the computational time (a factor of speed up as high as 150 times is observed). A non-recursive solution of the inelastic NEGF transport equations of a 1000 nm long resistor on standard hardware illustrates nicely the capability of this new method.
Derivatives of random matrix characteristic polynomials with applications to elliptic curves
NASA Astrophysics Data System (ADS)
Snaith, N. C.
2005-12-01
The value distribution of derivatives of characteristic polynomials of matrices from SO(N) is calculated at the point 1, the symmetry point on the unit circle of the eigenvalues of these matrices. We consider subsets of matrices from SO(N) that are constrained to have at least n eigenvalues equal to 1 and investigate the first non-zero derivative of the characteristic polynomial at that point. The connection between the values of random matrix characteristic polynomials and values of L-functions in families has been well established. The motivation for this work is the expectation that through this connection with L-functions derived from families of elliptic curves, and using the Birch and Swinnerton-Dyer conjecture to relate values of the L-functions to the rank of elliptic curves, random matrix theory will be useful in probing important questions concerning these ranks.
Shenvi, Neil; van Aggelen, Helen; Yang, Yang; Yang, Weitao; Schwerdtfeger, Christine; Mazziotti, David
2013-08-07
Tensor hypercontraction is a method that allows the representation of a high-rank tensor as a product of lower-rank tensors. In this paper, we show how tensor hypercontraction can be applied to both the electron repulsion integral tensor and the two-particle excitation amplitudes used in the parametric 2-electron reduced density matrix (p2RDM) algorithm. Because only O(r) auxiliary functions are needed in both of these approximations, our overall algorithm can be shown to scale as O(r^4), where r is the number of single-particle basis functions. We apply our algorithm to several small molecules, hydrogen chains, and alkanes to demonstrate its low formal scaling and practical utility. Provided we use enough auxiliary functions, we obtain accuracy similar to that of the standard p2RDM algorithm, somewhere between that of CCSD and CCSD(T).
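For orientation, the tensor hypercontraction form commonly written for the electron repulsion integrals is sketched below. The abstract itself does not state the formula, so the symbols X, Z and the auxiliary indices P, Q are assumptions taken from the usual presentation of the method rather than from this paper.

```latex
% Fourth-order ERI tensor factorized into second-order tensors (assumed standard THC form).
% p, q, r, s: single-particle basis indices; P, Q: auxiliary indices.
(pq|rs) \;\approx\; \sum_{P,Q} X_{p}^{P}\, X_{q}^{P}\, Z^{PQ}\, X_{r}^{Q}\, X_{s}^{Q}
```

If the number of auxiliary indices grows linearly with the number of basis functions, X and Z each hold a quadratic number of entries, compared with the quartic storage of the full integral tensor.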
Nerve Transfers for Improved Hand Function Following Cervical Spinal Cord Injury
the cervical spine resulting in diminished or complete loss of arm and/or hand function. Cervical SCI patients consistently rank hand function as the...most desired function above bowel and bladder function, sexual function, standing, and pain control. The overall goal of the proposed study is to...evaluate the efficacy of nerve transfers to treat patients with cervical SCIs. Over the last decade, nerve transfers have been used with increasing
2001-01-31
function of Jini, UPnP, SLP, Bluetooth , and HAVi • Projected specific UML models for Jini, UPnP, and SLP • Developed a Rapide Model of Jini...is used by all JINI entities in directed -- discovery mode. It is part of the SCM_Discovery -- Module. Sends Unicast messages to SCMs on list of... SCMS to be discovered until all SCMS are found. -- Receives updates from SCM DB of discovered SCMs and -- removes SCMs accordingly -- NOTE
BioGraph: unsupervised biomedical knowledge discovery via automated hypothesis generation
2011-01-01
We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be. PMID:21696594
Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data.
Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard
2014-04-01
Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. 
Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices.
Filling the gap in functional trait databases: use of ecological hypotheses to replace missing data
Taugourdeau, Simon; Villerd, Jean; Plantureux, Sylvain; Huguenin-Elie, Olivier; Amiaud, Bernard
2014-01-01
Functional trait databases are powerful tools in ecology, though most of them contain large amounts of missing values. The goal of this study was to test the effect of imputation methods on the evaluation of trait values at species level and on the subsequent calculation of functional diversity indices at community level using functional trait databases. Two simple imputation methods (average and median), two methods based on ecological hypotheses, and one multiple imputation method were tested using a large plant trait database, together with the influence of the percentage of missing data and differences between functional traits. At community level, the complete-case approach and three functional diversity indices calculated from grassland plant communities were included. At the species level, one of the methods based on ecological hypothesis was for all traits more accurate than imputation with average or median values, but the multiple imputation method was superior for most of the traits. The method based on functional proximity between species was the best method for traits with an unbalanced distribution, while the method based on the existence of relationships between traits was the best for traits with a balanced distribution. The ranking of the grassland communities for their functional diversity indices was not robust with the complete-case approach, even for low percentages of missing data. With the imputation methods based on ecological hypotheses, functional diversity indices could be computed with a maximum of 30% of missing data, without affecting the ranking between grassland communities. The multiple imputation method performed well, but not better than single imputation based on ecological hypothesis and adapted to the distribution of the trait values for the functional identity and range of the communities. 
Ecological studies using functional trait databases have to deal with missing data using imputation methods corresponding to their specific needs and making the most out of the information available in the databases. Within this framework, this study indicates the possibilities and limits of single imputation methods based on ecological hypothesis and concludes that they could be useful when studying the ranking of communities for their functional diversity indices. PMID:24772273
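The two simple imputation methods named in the abstract (average and median) can be sketched as follows. This is only an illustration of those two baselines, not of the ecological-hypothesis or multiple-imputation methods, and the function name `impute_column` and the use of `None` for a missing trait value are assumptions.

```python
from statistics import mean, median


def impute_column(values, method="mean"):
    """Replace missing entries (None) in one trait column.

    Assumption: each column holds one trait measured across species, and the
    fill value is the mean or median of the observed entries in that column.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    leaf_area = [1.0, None, 3.0, 8.0]  # hypothetical trait values
    print(impute_column(leaf_area, "mean"))
    print(impute_column(leaf_area, "median"))
```

For a skewed column such as this one, the two fill values differ noticeably (4.0 versus 3.0), which hints at why the distribution of a trait matters when choosing an imputation method.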
Biological Awareness: Statements for Self-Discovery.
ERIC Educational Resources Information Center
Edington, D.W.; Cunningham, Lee
This guide to biological awareness through guided self-discovery is based on 51 single focus statements concerning the human body. For each statement there are explanations of the underlying physiological principles and suggested activities and discussion ideas to encourage understanding of the statement in terms of the human body's functions,…
From Mere Coincidences to Meaningful Discoveries
ERIC Educational Resources Information Center
Griffiths, Thomas L.; Tenenbaum, Joshua B.
2007-01-01
People's reactions to coincidences are often cited as an illustration of the irrationality of human reasoning about chance. We argue that coincidences may be better understood in terms of rational statistical inference, based on their functional role in processes of causal discovery and theory revision. We present a formal definition of…
The Cancer Target Discovery and Development (CTD2) Network aims to use functional genomics to accelerate the translation of high-throughput and high-content genomic and small-molecule data towards use in precision oncology.
Selection of entropy-measure parameters for knowledge discovery in heart rate variability data
2014-01-01
Background: Heart rate variability is the variation of the time interval between consecutive heartbeats. Entropy is a commonly used tool to describe the regularity of data sets. Entropy functions are defined using multiple parameters, the selection of which is controversial and depends on the intended purpose. This study describes the results of tests conducted to support parameter selection, towards the goal of enabling further biomarker discovery. Methods: This study deals with approximate, sample, fuzzy, and fuzzy measure entropies. All data were obtained from PhysioNet, a free-access, on-line archive of physiological signals, and represent various medical conditions. Five tests were defined and conducted to examine the influence of: varying the threshold value r (as multiples of the sample standard deviation σ, or the entropy-maximizing rChon), the data length N, the weighting factors n for fuzzy and fuzzy measure entropies, and the thresholds rF and rL for fuzzy measure entropy. The results were tested for normality using Lilliefors' composite goodness-of-fit test. Consequently, the p-value was calculated with either a two sample t-test or a Wilcoxon rank sum test. Results: The first test shows a cross-over of entropy values with regard to a change of r. Thus, a clear statement that a higher entropy corresponds to a high irregularity is not possible, but is rather an indicator of differences in regularity. N should be at least 200 data points for r = 0.2 σ and should even exceed a length of 1000 for r = rChon. The results for the weighting parameters n for the fuzzy membership function show different behavior when coupled with different r values, therefore the weighting parameters have been chosen independently for the different threshold values. The tests concerning rF and rL showed that there is no optimal choice, but r = rF = rL is reasonable with r = rChon or r = 0.2σ.
Conclusions: Some of the tests showed a dependency of the test significance on the data at hand. Nevertheless, as the medical conditions are unknown beforehand, compromises had to be made. Optimal parameter combinations are suggested for the methods considered. Yet, due to the high number of potential parameter combinations, further investigations of entropy for heart rate variability data will be necessary. PMID:25078574
Selection of entropy-measure parameters for knowledge discovery in heart rate variability data.
Mayer, Christopher C; Bachler, Martin; Hörtenhuber, Matthias; Stocker, Christof; Holzinger, Andreas; Wassertheurer, Siegfried
2014-01-01
Heart rate variability is the variation of the time interval between consecutive heartbeats. Entropy is a commonly used tool to describe the regularity of data sets. Entropy functions are defined using multiple parameters, the selection of which is controversial and depends on the intended purpose. This study describes the results of tests conducted to support parameter selection, towards the goal of enabling further biomarker discovery. This study deals with approximate, sample, fuzzy, and fuzzy measure entropies. All data were obtained from PhysioNet, a free-access, on-line archive of physiological signals, and represent various medical conditions. Five tests were defined and conducted to examine the influence of: varying the threshold value r (as multiples of the sample standard deviation σ, or the entropy-maximizing rChon), the data length N, the weighting factors n for fuzzy and fuzzy measure entropies, and the thresholds rF and rL for fuzzy measure entropy. The results were tested for normality using Lilliefors' composite goodness-of-fit test. Consequently, the p-value was calculated with either a two sample t-test or a Wilcoxon rank sum test. The first test shows a cross-over of entropy values with regard to a change of r. Thus, a clear statement that a higher entropy corresponds to a high irregularity is not possible, but is rather an indicator of differences in regularity. N should be at least 200 data points for r = 0.2 σ and should even exceed a length of 1000 for r = rChon. The results for the weighting parameters n for the fuzzy membership function show different behavior when coupled with different r values, therefore the weighting parameters have been chosen independently for the different threshold values. The tests concerning rF and rL showed that there is no optimal choice, but r = rF = rL is reasonable with r = rChon or r = 0.2σ. Some of the tests showed a dependency of the test significance on the data at hand. 
Nevertheless, as the medical conditions are unknown beforehand, compromises had to be made. Optimal parameter combinations are suggested for the methods considered. Yet, due to the high number of potential parameter combinations, further investigations of entropy for heart rate variability data will be necessary.
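The role of the threshold r and the data length N can be seen in a small sample entropy sketch. This follows a common textbook formulation (template length m, Chebyshev distance, tolerance r given as a multiple of the sample standard deviation) and is not taken from the paper; the default m = 2, the use of "less than or equal" for a match, and the name `sample_entropy` are assumptions.

```python
import math


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy of a sequence, with tolerance r = r_factor * sigma.

    Assumption: B counts pairs of length-m templates within tolerance r,
    A counts the same for length m + 1, and the result is -ln(A / B).
    """
    n = len(series)
    if n <= m + 1:
        raise ValueError("series too short for the chosen template length")
    mu = sum(series) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in series) / (n - 1))
    r = r_factor * sigma

    def count_matches(length):
        templates = n - m  # same template count for both lengths
        count = 0
        for i in range(templates):
            for j in range(i + 1, templates):
                if all(abs(series[i + k] - series[j + k]) <= r
                       for k in range(length)):
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # too few matches to form a finite estimate
    return -math.log(a / b)


if __name__ == "__main__":
    regular = [1.0, 2.0] * 50
    print(sample_entropy(regular))
```

A perfectly periodic series gives zero, because every length-m match also extends to length m + 1. With short series or a small r the match counts can drop to zero, which is one reason a minimum data length is recommended for a given threshold.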
Sankaran, Neeraja
2012-12-01
Scientific theories about the origin of life have historically been characterized by the chicken-and-egg problem of which essential aspect of life was the first to appear, replication or self-sustenance. By the 1950s the question was cast in molecular terms and DNA and proteins had come to represent the carriers of the two functions. Meanwhile, RNA, the other nucleic acid, had played a capricious role in origin theories. Because it contained building blocks very similar to DNA, biologists recognized early that RNA could store information in its linear sequences. With the discovery in the 1980s that RNA molecules were capable of biological catalysis, a function hitherto ascribed to proteins alone, RNA took on the role of the single entity that could act as both chicken and egg. Within a few years of the discovery of these catalytic RNAs (ribozymes) scientists had formulated an RNA World hypothesis that posited an early phase in the evolution of life where all key functions were performed by RNA molecules. This paper traces the history of the role of RNA in origin-of-life theories with a focus on how the discovery of ribozymes influenced the discourse. Copyright © 2012 Elsevier Ltd. All rights reserved.
Neoclassic drug discovery: the case for lead generation using phenotypic and functional approaches.
Lee, Jonathan A; Berg, Ellen L
2013-12-01
Innovation and new molecular entity production by the pharmaceutical industry has been below expectations. Surprisingly, more first-in-class small-molecule drugs approved by the U.S. Food and Drug Administration (FDA) between 1999 and 2008 were identified by functional phenotypic lead generation strategies reminiscent of pre-genomics pharmacology than contemporary molecular targeted strategies that encompass the vast majority of lead generation efforts. This observation, in conjunction with the difficulty in validating molecular targets for drug discovery, has diminished the impact of the "genomics revolution" and has led to a growing grassroots movement and now broader trend in pharma to reconsider the use of modern physiology-based or phenotypic drug discovery (PDD) strategies. This "From the Guest Editors" column provides an introduction and overview of the two-part special issues of Journal of Biomolecular Screening on PDD. Terminology and the business case for use of PDD are defined. Key issues such as assay performance, chemical optimization, target identification, and challenges to the organization and implementation of PDD are discussed. Possible solutions for these challenges and a new neoclassic vision for PDD that combines phenotypic and functional approaches with technology innovations resulting from the genomics-driven era of target-based drug discovery (TDD) are also described. Finally, an overview of the manuscripts in this special edition is provided.
Thaut, Michael H
2015-01-01
The discovery of rhythmic auditory-motor entrainment in clinical populations was a historical breakthrough in demonstrating for the first time a neurological mechanism linking music to retraining brain and behavioral functions. Early pilot studies from this research center were followed up by a systematic line of research studying rhythmic auditory stimulation on motor therapies for stroke, Parkinson's disease, traumatic brain injury, cerebral palsy, and other movement disorders. The comprehensive effects on improving multiple aspects of motor control established the first neuroscience-based clinical method in music, which became the bedrock for the later development of neurologic music therapy. The discovery of entrainment fundamentally shifted and extended the view of the therapeutic properties of music from a psychosocially dominated view to a view using the structural elements of music to retrain motor control, speech and language function, and cognitive functions such as attention and memory. © 2015 Elsevier B.V. All rights reserved.
Kollmann, Christopher S; Bai, Xiaopeng; Tsai, Ching-Hsuan; Yang, Hongfang; Lind, Kenneth E; Skinner, Steven R; Zhu, Zhengrong; Israel, David I; Cuozzo, John W; Morgan, Barry A; Yuki, Koichi; Xie, Can; Springer, Timothy A; Shimaoka, Motomu; Evindar, Ghotas
2014-04-01
The inhibition of protein-protein interactions remains a challenge for traditional small molecule drug discovery. Here we describe the use of DNA-encoded library technology for the discovery of small molecules that are potent inhibitors of the interaction between lymphocyte function-associated antigen 1 and its ligand intercellular adhesion molecule 1. A DNA-encoded library with a potential complexity of 4.1 billion compounds was exposed to the I-domain of the target protein and the bound ligands were affinity selected, yielding an enriched small-molecule hit family. Compounds representing this family were synthesized without their DNA encoding moiety and found to inhibit the lymphocyte function-associated antigen 1/intercellular adhesion molecule-1 interaction with submicromolar potency in both ELISA and cell adhesion assays. Re-synthesized compounds conjugated to DNA or a fluorophore were demonstrated to bind to cells expressing the target protein. Copyright © 2014 Elsevier Ltd. All rights reserved.
Riddick, N V; Czoty, P W; Gage, H D; Kaplan, J R; Nader, S H; Icenhower, M; Pierre, P J; Bennett, A; Garg, P K; Garg, S; Nader, M A
2009-02-18
Socially housed monkeys have been used as a model to study human diseases. The present study examined behavioral, physiological and neurochemical measures as predictors of social rank in 16 experimentally naïve, individually housed female cynomolgus monkeys (Macaca fascicularis). The two behavioral measures examined were novel object reactivity (NOR), as determined by latency to touch an opaque acrylic box placed in the home cage, and locomotor activity assessed in a novel open-field apparatus. Serum cortisol concentrations were evaluated three times per week for four consecutive weeks, and stress reactivity was assessed on one occasion by evaluating the cortisol response to adrenocorticotropic hormone (ACTH) following dexamethasone suppression. Measures of serotonin (5-HT) function included whole blood 5-HT (WBS) concentrations, cerebrospinal fluid (CSF) concentrations of the 5-HT metabolite 5-hydroxyindoleacetic acid (5-HIAA) and brain 5-HT transporter (SERT) availability obtained using positron emission tomography (PET). After baseline measures were obtained, monkeys were assigned to four social groups of four monkeys per group. The two measures that correlated with eventual social rank were CSF 5-HIAA concentrations, which were significantly higher in the animals who eventually became subordinate, and latency to touch the novel object, which was significantly lower in eventual subordinate monkeys. Measures of 5-HT function did not change as a consequence of social rank. These data suggest that levels of central 5-HIAA and measures of novel object reactivity may be trait markers that influence eventual social rank in female macaques.
Lötsch, Jörn; Lippmann, Catharina; Kringel, Dario; Ultsch, Alfred
2017-01-01
Genes causally involved in human insensitivity to pain provide a unique molecular source for studying the pathophysiology of pain and the development of novel analgesic drugs. The increasing availability of “big data” enables novel research approaches to chronic pain while also requiring novel techniques for data mining and knowledge discovery. We used machine learning to combine the knowledge about n = 20 genes causally involved in human hereditary insensitivity to pain with the knowledge about the functions of thousands of genes. An integrated computational analysis proposed that among the functions of this set of genes, the processes related to nervous system development and to ceramide and sphingosine signaling pathways are particularly important. This is in line with earlier suggestions to use these pathways as therapeutic targets in pain. Following identification of the biological processes characterizing hereditary insensitivity to pain, the biological processes were used for a similarity analysis with the functions of n = 4,834 database-queried drugs. Using emergent self-organizing maps, a cluster of n = 22 drugs was identified sharing important functional features with hereditary insensitivity to pain. Several members of this cluster had been implicated in pain in preclinical experiments. Thus, the present concept of machine-learned knowledge discovery for pain research provides biologically plausible results and seems to be suitable for drug discovery by identifying a narrow choice of repurposing candidates, demonstrating that contemporary machine-learned methods offer innovative approaches to knowledge discovery from available evidence. PMID:28848388
Perret, Didier
2012-01-01
Taking into consideration the gap between the decreasing interest of teenagers for careers in science, and the ever-increasing need for top-ranked scientists and engineers to maintain wealthy economies, it has become of the utmost importance for governments and public institutions to devise new initiatives that bring attractive insights and fascination of modern science to the general audience and in particular to pupils. This paper is targeted toward academic institutions engaged in the process of building incentive programmes in the broad field of molecular science. It describes the step-by-step creation of the Chimiscope, the discovery and experimentation platform inaugurated in 2011 by the University of Geneva, which proposes captivation and take-home messages on the intriguing world of molecules to visitors aged 7 to 107.
Personalization of Rule-based Web Services.
Choi, Okkyung; Han, Sang Yong
2008-04-04
Nowadays Web users have clearly expressed their wishes to receive personalized services directly. Personalization is the way to tailor services directly to the immediate requirements of the user. However, the current Web Services System does not provide any features supporting this such as consideration of personalization of services and intelligent matchmaking. In this research a flexible, personalized Rule-based Web Services System to address these problems and to enable efficient search, discovery and construction across general Web documents and Semantic Web documents in a Web Services System is proposed. This system utilizes matchmaking among service requesters', service providers' and users' preferences using a Rule-based Search Method, and subsequently ranks search results. A prototype of efficient Web Services search and construction for the suggested system is developed based on the current work.
Tulabandhula, Theja; Rudin, Cynthia
2014-06-01
Our goal is to design a prediction and decision system for real-time use during a professional car race. In designing a knowledge discovery process for racing, we faced several challenges that were overcome only when domain knowledge of racing was carefully infused within statistical modeling techniques. In this article, we describe how we leveraged expert knowledge of the domain to produce a real-time decision system for tire changes within a race. Our forecasts have the potential to impact how racing teams can optimize strategy by making tire-change decisions to benefit their rank position. Our work significantly expands previous research on sports analytics, as it is the only work on analytical methods for within-race prediction and decision making for professional car racing.
NASA Astrophysics Data System (ADS)
Paulraj, D.; Swamynathan, S.; Madhaiyan, M.
2012-11-01
Web Service composition has become indispensable as a single web service cannot satisfy complex functional requirements. Composition of services has received much interest to support business-to-business (B2B) or enterprise application integration. An important component of the service composition is the discovery of relevant services. In Semantic Web Services (SWS), service discovery is generally achieved by using service profile of Ontology Web Languages for Services (OWL-S). The profile of the service is a derived and concise description but not a functional part of the service. The information contained in the service profile is sufficient for atomic service discovery, but it is not sufficient for the discovery of composite semantic web services (CSWS). The purpose of this article is two-fold: first to prove that the process model is a better choice than the service profile for service discovery. Second, to facilitate the composition of inter-organisational CSWS by proposing a new composition method which uses process ontology. The proposed service composition approach uses an algorithm which performs a fine grained match at the level of atomic process rather than at the level of the entire service in a composite semantic web service. Many works carried out in this area have proposed solutions only for the composition of atomic services and this article proposes a solution for the composition of composite semantic web services.
Undergraduate Mathematics Students' Understanding of the Concept of Function
ERIC Educational Resources Information Center
Bardini, Caroline; Pierce, Robyn; Vincent, Jill; King, Deborah
2014-01-01
Concern has been expressed that many commencing undergraduate mathematics students have mastered skills without conceptual understanding. A pilot study carried out at a leading Australian university indicates that a significant number of students, with high tertiary entrance ranks, have very limited understanding of the concept of function,…
Ranking Highlights in Personal Videos by Analyzing Edited Videos.
Sun, Min; Farhadi, Ali; Chen, Tseng-Hung; Seitz, Steve
2016-11-01
We present a fully automatic system for ranking domain-specific highlights in unconstrained personal videos by analyzing online edited videos. A novel latent linear ranking model is proposed to handle noisy training data harvested online. Specifically, given a targeted domain such as "surfing," our system mines the YouTube database to find pairs of raw and their corresponding edited videos. Leveraging the assumption that an edited video is more likely to contain highlights than the trimmed parts of the raw video, we obtain pair-wise ranking constraints to train our model. The learning task is challenging due to the amount of noise and variation in the mined data. Hence, a latent loss function is incorporated to mitigate the issues caused by the noise. We efficiently learn the latent model on a large number of videos (about 870 min in total) using a novel EM-like procedure. Our latent ranking model outperforms its classification counterpart and is fairly competitive compared with a fully supervised ranking system that requires labels from Amazon Mechanical Turk. We further show that a state-of-the-art audio feature mel-frequency cepstral coefficients is inferior to a state-of-the-art visual feature. By combining both audio-visual features, we obtain the best performance in dog activity, surfing, skating, and viral video domains. Finally, we show that impressive highlights can be detected without additional human supervision for seven domains (i.e., skating, surfing, skiing, gymnastics, parkour, dog activity, and viral video) in unconstrained personal videos.
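The pair-wise ranking constraints mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain linear scoring function and a hinge loss; the latent variables and the EM-like procedure of the paper are not modelled here, and the names `score`, `pairwise_hinge_loss` and `sgd_step` as well as the two-dimensional toy features are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (highlight features, non-highlight features)


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking score (an assumption; the paper uses a latent model)."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_hinge_loss(weights: Sequence[float], pairs: List[Pair],
                        margin: float = 1.0) -> float:
    """Penalise every pair whose highlight does not outscore its partner by the margin."""
    return sum(max(0.0, margin - (score(weights, hi) - score(weights, lo)))
               for hi, lo in pairs)


def sgd_step(weights: Sequence[float], pairs: List[Pair],
             lr: float = 0.1, margin: float = 1.0) -> List[float]:
    """One subgradient step on the hinge loss over all violated pairs."""
    updated = list(weights)
    for hi, lo in pairs:
        if score(weights, hi) - score(weights, lo) < margin:
            for j in range(len(updated)):
                updated[j] += lr * (hi[j] - lo[j])
    return updated


# Toy data: one pair in which the first segment was kept in the edited video.
pairs = [([1.0, 0.0], [0.0, 1.0])]
w0 = [0.0, 0.0]
w1 = sgd_step(w0, pairs)
loss_before = pairwise_hinge_loss(w0, pairs)
loss_after = pairwise_hinge_loss(w1, pairs)
```

Each step moves the weights toward the features of the segment that should rank higher, so the loss on violated pairs decreases.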
Gupta, Radhey S
2016-07-01
Analyses of genome sequences, by some approaches, suggest that the widespread occurrence of horizontal gene transfers (HGTs) in prokaryotes disguises their evolutionary relationships and have led to questioning of the Darwinian model of evolution for prokaryotes. These inferences are critically examined in the light of comparative genome analysis, characteristic synapomorphies, phylogenetic trees and Darwin's views on examining evolutionary relationships. Genome sequences are enabling discovery of numerous molecular markers (synapomorphies) such as conserved signature indels (CSIs) and conserved signature proteins (CSPs), which are distinctive characteristics of different prokaryotic taxa. Based on these molecular markers, exhibiting high degree of specificity and predictive ability, numerous prokaryotic taxa of different ranks, currently identified based on the 16S rRNA gene trees, can now be reliably demarcated in molecular terms. Within all studied groups, multiple CSIs and CSPs have been identified for successive nested clades providing reliable information regarding their hierarchical relationships and these inferences are not affected by HGTs. These results strongly support Darwin's views on evolution and classification and supplement the current phylogenetic framework based on 16S rRNA in important respects. The identified molecular markers provide important means for developing novel diagnostics, therapeutics and for functional studies providing important insights regarding prokaryotic taxa. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Query-Adaptive Hash Code Ranking for Large-Scale Multi-View Visual Search.
Liu, Xianglong; Huang, Lei; Deng, Cheng; Lang, Bo; Tao, Dacheng
2016-10-01
Hash-based nearest neighbor search has become attractive in many applications. However, the quantization in hashing usually degenerates the discriminative power when using Hamming distance ranking. Besides, for large-scale visual search, existing hashing methods cannot directly support efficient search over data with multiple sources, even though the literature has shown that adaptively incorporating complementary information from diverse sources or views can significantly boost the search performance. To address these problems, this paper proposes a novel and generic approach to building multiple hash tables with multiple views and generating fine-grained ranking results at bitwise and tablewise levels. For each hash table, a query-adaptive bitwise weighting is introduced to alleviate the quantization loss by simultaneously exploiting the quality of hash functions and their complement for nearest neighbor search. From the tablewise aspect, multiple hash tables are built for different data views as a joint index, over which a query-specific rank fusion is proposed to rerank all results from the bitwise ranking by diffusing in a graph. Comprehensive experiments on image search over three well-known benchmarks show that the proposed method achieves up to 17.11% and 20.28% performance gains on single and multiple table search over the state-of-the-art methods.
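A minimal sketch of why bitwise weighting gives a finer ranking than plain Hamming distance. The weights below are hypothetical constants; in the paper they are derived per query from the quality of the hash functions, which this sketch does not reproduce. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple


def weighted_hamming(query: Sequence[int], code: Sequence[int],
                     weights: Sequence[float]) -> float:
    """Sum the weights of the bit positions where the two codes differ."""
    return sum(w for q, c, w in zip(query, code, weights) if q != c)


def rank_by_weighted_hamming(query: Sequence[int],
                             database: List[Sequence[int]],
                             weights: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (index, distance) pairs sorted by increasing weighted distance."""
    scored = [(i, weighted_hamming(query, code, weights))
              for i, code in enumerate(database)]
    return sorted(scored, key=lambda pair: pair[1])


database = [
    [1, 0, 1, 1],  # differs from the query in bit 3 only
    [0, 0, 1, 0],  # differs from the query in bit 0 only
    [1, 1, 0, 0],  # differs from the query in bits 1 and 2
]
query = [1, 0, 1, 0]
weights = [0.9, 0.2, 0.6, 0.1]  # hypothetical per-bit weights for this query

ranking = rank_by_weighted_hamming(query, database, weights)
order = [index for index, _ in ranking]
```

With unweighted Hamming distance the first two codes tie at distance 1. The weights break that tie and even place the third code ahead of the second, because its two differing bits carry less total weight than bit 0.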
Sung, Yao-Ting; Wu, Jeng-Shin
2018-04-17
Traditionally, the visual analogue scale (VAS) has been proposed to overcome the limitations of ordinal measures from Likert-type scales. However, the function of VASs to overcome the limitations of response styles to Likert-type scales has not yet been addressed. Previous research using ranking and paired comparisons to compensate for the response styles of Likert-type scales has suffered from limitations, such as that the total score of ipsative measures is a constant that cannot be analyzed by means of many common statistical techniques. In this study we propose a new scale, called the Visual Analogue Scale for Rating, Ranking, and Paired-Comparison (VAS-RRP), which can be used to collect rating, ranking, and paired-comparison data simultaneously, while avoiding the limitations of each of these data collection methods. The characteristics, use, and analytic method of VAS-RRPs, as well as how they overcome the disadvantages of Likert-type scales, ranking, and VASs, are discussed. On the basis of analyses of simulated and empirical data, this study showed that VAS-RRPs improved reliability, response style bias, and parameter recovery. Finally, we have also designed a VAS-RRP Generator for researchers' construction and administration of their own VAS-RRPs.
Langerhans cell precursors acquire RANK/CD265 in prenatal human skin
Schöppl, Alice; Botta, Albert; Prior, Marion; Akgün, Johnnie; Schuster, Christopher; Elbe-Bürger, Adelheid
2015-01-01
The skin is the first barrier against foreign pathogens and the prenatal formation of a strong network of various innate and adaptive cells is required to protect the newborn from perinatal infections. While many studies about the immune system in healthy and diseased adult human skin exist, our knowledge about the cutaneous prenatal/developing immune system and especially about the phenotype and function of antigen-presenting cells such as epidermal Langerhans cells (LCs) in human skin is still scarce. It has been shown previously that LCs in healthy adult human skin express receptor activator of NF-κB (RANK), an important molecule prolonging their survival. In this study, we investigated at which developmental stage LCs acquire this important molecule. Immunofluorescence double-labeling of cryostat sections revealed that LC precursors in prenatal human skin either do not yet [10–11 weeks of estimated gestational age (EGA)] or only faintly (13–15 weeks EGA) express RANK. LCs express RANK at levels comparable to adult LCs by the end of the second trimester. Comparable with adult skin, dermal antigen-presenting cells at no gestational age express this marker. These findings indicate that epidermal leukocytes gradually acquire RANK during gestation – a phenomenon previously observed also for other markers on LCs in prenatal human skin. PMID:25722033
Genomics and transcriptomics in drug discovery.
Dopazo, Joaquin
2014-02-01
The popularization of genomic high-throughput technologies is causing a revolution in biomedical research and, particularly, is transforming the field of drug discovery. Systems biology offers a framework to understand the extensive human genetic heterogeneity revealed by genomic sequencing in the context of the network of functional, regulatory and physical protein-drug interactions. Thus, approaches to find biomarkers and therapeutic targets will have to take into account the complex system nature of the relationships of the proteins with the disease. Pharmaceutical companies will have to reorient their drug discovery strategies considering the human genetic heterogeneity. Consequently, modeling and computational data analysis will have an increasingly important role in drug discovery. Copyright © 2013 Elsevier Ltd. All rights reserved.
Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field.
Wójcikowski, Maciej; Zielenkiewicz, Piotr; Siedlecki, Pawel
2015-01-01
There has been huge progress in the open cheminformatics field in both methods and software development. Unfortunately, there has been little effort to unite those methods and software into one package. We here describe the Open Drug Discovery Toolkit (ODDT), which aims to fulfill the need for comprehensive and open source drug discovery software. The Open Drug Discovery Toolkit was developed as a free and open source tool for both computer aided drug discovery (CADD) developers and researchers. ODDT reimplements many state-of-the-art methods, such as machine learning scoring functions (RF-Score and NNScore) and wraps other external software to ease the process of developing CADD pipelines. ODDT is an out-of-the-box solution designed to be easily customizable and extensible. Therefore, users are strongly encouraged to extend it and develop new methods. We here present three use cases for ODDT in common tasks in computer-aided drug discovery. Open Drug Discovery Toolkit is released on a permissive 3-clause BSD license for both academic and industrial use. ODDT's source code, additional examples and documentation are available on GitHub (https://github.com/oddt/oddt).
Mining dynamic noteworthy functions in software execution sequences.
Zhang, Bing; Huang, Guoyan; Wang, Yuqian; He, Haitao; Ren, Jiadong
2017-01-01
As the quality of crucial entities can directly affect that of software, their identification and protection become an important premise for effective software development, management, maintenance and testing, which thus contribute to improving the software quality and its attack-defending ability. Most analysis and evaluation on important entities like codes-based static structure analysis are on the destruction of the actual software running. In this paper, from the perspective of software execution process, we proposed an approach to mine dynamic noteworthy functions (DNFM) in software execution sequences. First, according to software decompiling and tracking stack changes, the execution traces composed of a series of function addresses were acquired. Then these traces were modeled as execution sequences and then simplified so as to get simplified sequences (SFS), followed by the extraction of patterns through pattern extraction (PE) algorithm from SFS. After that, evaluating indicators inner-importance and inter-importance were designed to measure the noteworthiness of functions in DNFM algorithm. Finally, these functions were sorted by their noteworthiness. Comparison and contrast were conducted on the experiment results from two traditional complex network-based node mining methods, namely PageRank and DegreeRank. The results show that the DNFM method can mine noteworthy functions in software effectively and precisely.
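For orientation, the two baseline methods named in the abstract above can be sketched on a toy call graph. This is not the DNFM algorithm. The graph and function names are invented, and "DegreeRank" is taken here to mean ranking by in-degree, which is an assumption about the paper's definition.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; every callee must also appear as a key."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread uniformly.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for node in nodes:
            for callee in graph[node]:
                new_rank[callee] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


def in_degree(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Count how many distinct call edges point at each function."""
    degree = {node: 0 for node in graph}
    for callees in graph.values():
        for callee in callees:
            degree[callee] += 1
    return degree


# Hypothetical call graph: caller -> list of callees.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["log"],
    "run": ["log", "compute"],
    "compute": ["log"],
    "log": [],
}

scores = pagerank(call_graph)
degrees = in_degree(call_graph)
top_by_pagerank = max(scores, key=scores.get)
```

Both baselines use only the static structure of the graph, whereas the approach in the abstract works from execution sequences.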
Early Probe and Drug Discovery in Academia: A Minireview.
Roy, Anuradha
2018-02-09
Drug discovery encompasses processes ranging from target selection and validation to the selection of a development candidate. While comprehensive drug discovery work flows are implemented predominantly in the big pharma domain, early discovery focus in academia serves to identify probe molecules that can serve as tools to study targets or pathways. Despite differences in the ultimate goals of the private and academic sectors, the same basic principles define the best practices in early discovery research. A successful early discovery program is built on strong target definition and validation using a diverse set of biochemical and cell-based assays with functional relevance to the biological system being studied. The chemicals identified as hits undergo extensive scaffold optimization and are characterized for their target specificity and off-target effects in in vitro and in animal models. While the active compounds from screening campaigns pass through highly stringent chemical and Absorption, Distribution, Metabolism, and Excretion (ADME) filters for lead identification, the probe discovery involves limited medicinal chemistry optimization. The goal of probe discovery is identification of a compound with sub-µM activity and reasonable selectivity in the context of the target being studied. The compounds identified from probe discovery can also serve as starting scaffolds for lead optimization studies.
Modeling & Informatics at Vertex Pharmaceuticals Incorporated: our philosophy for sustained impact
NASA Astrophysics Data System (ADS)
McGaughey, Georgia; Patrick Walters, W.
2017-03-01
Molecular modelers and informaticians have the unique opportunity to integrate cross-functional data using a myriad of tools, methods and visuals to generate information. Using their drug discovery expertise, information is transformed to knowledge that impacts drug discovery. These insights are often times formulated locally and then applied more broadly, which influence the discovery of new medicines. This is particularly true in an organization where the members are exposed to projects throughout an organization, such as in the case of the global Modeling & Informatics group at Vertex Pharmaceuticals. From its inception, Vertex has been a leader in the development and use of computational methods for drug discovery. In this paper, we describe the Modeling & Informatics group at Vertex and the underlying philosophy, which has driven this team to sustain impact on the discovery of first-in-class transformative medicines.
The ``Missing Compounds'' affair in functionality-driven material discovery
NASA Astrophysics Data System (ADS)
Zunger, Alex
2014-03-01
In the paradigm of ``data-driven discovery,'' underlying one of the leading streams of the Material Genome Initiative (MGI), one attempts to compute high-throughput style as many of the properties of as many of the N (about 10**5-10**6) compounds listed in databases of previously known compounds. One then inspects the ensuing Big Data, searching for useful trends. The alternative and complementary paradigm of ``functionality-directed search and optimization'' used here, searches instead for the n much smaller than N configurations and compositions that have the desired value of the target functionality. Examples include the use of genetic and other search methods that optimize the structure or identity of atoms on lattice sites, using atomistic electronic structure (such as first-principles) approaches in search of a given electronic property. This addresses a few of the bottlenecks that have faced the alternative, data-driven/high throughput/Big Data philosophy: (i) When the configuration space is theoretically of infinite size, building a complete data base as in data-driven discovery is impossible, yet searching for the optimum functionality is still a well-posed problem. (ii) The configuration space that we explore might include artificially grown, kinetically stabilized systems (such as 2D layer stacks; superlattices; colloidal nanostructures; Fullerenes) that are not listed in compound databases (used by data-driven approaches). (iii) A large fraction of chemically plausible compounds have not been experimentally synthesized, so in the data-driven approach these are often skipped. In our approach we search explicitly for such ``Missing Compounds''. It is likely that many interesting material properties will be found in cases (i)-(iii) that elude high throughput searches based on databases encapsulating existing knowledge.
I will illustrate (a) Functionality-driven discovery of topological insulators and valley-split quantum-computer semiconductors, as well as (b) Use of ``first principles thermodynamics'' to discern which of the previously ``missing compounds'' should, in fact, exist and in which structure. Synthesis efforts by Poeppelmeier group at NU realized 20 never-before-made half-Heusler compounds out of the 20 predicted ones, in our predicted space groups. This type of theory-led experimental search of designed materials with target functionalities may shorten the current process of discovery of interesting functional materials. Supported by DOE, Office of Science, Energy Frontier Research Center for Inverse Design
1984-05-01
By means of the concept of change-of-variance function we investigate the stability properties of the asymptotic variance of R-estimators. This allows us to construct the optimal V-robust R-estimator that minimizes the asymptotic variance at the model, under the side condition of a bounded change-of-variance function. Finally, we discuss the connection between this function and an influence function for two-sample rank tests introduced by Eplett (1980). (Author)
Bischoff, Florian A; Harrison, Robert J; Valeev, Edward F
2012-09-14
We present an approach to compute accurate correlation energies for atoms and molecules using an adaptive discontinuous spectral-element multiresolution representation for the two-electron wave function. Because of the exponential storage complexity of the spectral-element representation with the number of dimensions, a brute-force computation of two-electron (six-dimensional) wave functions with high precision was not practical. To overcome the key storage bottlenecks we utilized (1) a low-rank tensor approximation (specifically, the singular value decomposition) to compress the wave function, and (2) explicitly correlated R12-type terms in the wave function to regularize the Coulomb electron-electron singularities of the Hamiltonian. All operations necessary to solve the Schrödinger equation were expressed so that the reconstruction of the full-rank form of the wave function is never necessary. Numerical performance of the method was highlighted by computing the first-order Møller-Plesset wave function of a helium atom. The computed second-order Møller-Plesset energy is precise to ~2 microhartrees, which is at the precision limit of the existing general atomic-orbital-based approaches. Our approach does not assume special geometric symmetries, hence application to molecules is straightforward.
Maximising information recovery from rank-order codes
NASA Astrophysics Data System (ADS)
Sen, B.; Furber, S.
2007-04-01
The central nervous system encodes information in sequences of asynchronously generated voltage spikes, but the precise details of this encoding are not well understood. Thorpe proposed rank-order codes as an explanation of the observed speed of information processing in the human visual system. The work described in this paper is inspired by the performance of SpikeNET, a biologically inspired neural architecture using rank-order codes for information processing, and is based on the retinal model developed by VanRullen and Thorpe. This model mimics retinal information processing by passing an input image through a bank of Difference of Gaussian (DoG) filters and then encoding the resulting coefficients in rank-order. To test the effectiveness of this encoding in capturing the information content of an image, the rank-order representation is decoded to reconstruct an image that can be compared with the original. The reconstruction uses a look-up table to infer the filter coefficients from their rank in the encoded image. Since the DoG filters are approximately orthogonal functions, they are treated as their own inverses in the reconstruction process. We obtained a quantitative measure of the perceptually important information retained in the reconstructed image relative to the original using a slightly modified version of an objective metric proposed by Petrovic. It is observed that around 75% of the perceptually important information is retained in the reconstruction. In the present work we reconstruct the input using a pseudo-inverse of the DoG filter-bank with the aim of improving the reconstruction and thereby extracting more information from the rank-order encoded stimulus. We observe that there is an increase of 10 - 15% in the information retrieved from a reconstructed stimulus as a result of inverting the filter-bank.
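The look-up-table decoding described in the abstract above can be sketched on a short list of coefficients. The table values, the separate handling of signs, and the function names are assumptions made for illustration; the Difference of Gaussian filter bank and its pseudo-inverse are not modelled.

```python
from typing import List, Sequence, Tuple


def rank_order_encode(coefficients: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Return coefficient indices ordered by decreasing magnitude, with their signs.

    Only the firing order is kept; the magnitudes themselves are discarded.
    """
    order = sorted(range(len(coefficients)), key=lambda i: -abs(coefficients[i]))
    signs = [1 if coefficients[i] >= 0 else -1 for i in order]
    return order, signs


def rank_order_decode(order: Sequence[int], signs: Sequence[int],
                      lookup: Sequence[float]) -> List[float]:
    """Rebuild approximate coefficients: the k-th index in the order gets lookup[k]."""
    decoded = [0.0] * len(order)
    for rank, (index, sign) in enumerate(zip(order, signs)):
        decoded[index] = sign * lookup[rank]
    return decoded


coefficients = [0.2, -0.9, 0.5, 0.05]   # hypothetical filter responses
lookup = [1.0, 0.5, 0.25, 0.125]        # hypothetical typical magnitude per rank

order, signs = rank_order_encode(coefficients)
decoded = rank_order_decode(order, signs, lookup)
```

The decoded values preserve the ordering and signs of the original coefficients but not their exact magnitudes, which is why the abstract measures how much information the reconstruction retains.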
Valentini, Giorgio; Paccanaro, Alberto; Caniza, Horacio; Romero, Alfonso E; Re, Matteo
2014-06-01
In the context of "network medicine", gene prioritization methods represent one of the main tools to discover candidate disease genes by exploiting the large amount of data covering different types of functional relationships between genes. Several works proposed to integrate multiple sources of data to improve disease gene prioritization, but to our knowledge no systematic study has focused on the quantitative evaluation of the impact of network integration on gene prioritization. In this paper, we aim at providing an extensive analysis of gene-disease associations not limited to genetic disorders, and a systematic comparison of different network integration methods for gene prioritization. We collected nine different functional networks representing different functional relationships between genes, and we combined them through both unweighted and weighted network integration methods. We then prioritized genes with respect to each of the considered 708 medical subject headings (MeSH) diseases by applying classical guilt-by-association, random walk and random walk with restart algorithms, and the recently proposed kernelized score functions. The results obtained with classical random walk algorithms and the best single network achieved an average area under the curve (AUC) across the 708 MeSH diseases of about 0.82, while kernelized score functions and network integration boosted the average AUC to about 0.89. Weighted integration, by exploiting the different "informativeness" embedded in different functional networks, outperforms unweighted integration at the 0.01 significance level, according to the Wilcoxon signed rank sum test. For each MeSH disease we provide the top-ranked unannotated candidate genes, available for further bio-medical investigation. Network integration is necessary to boost the performance of gene prioritization methods. Moreover, the methods based on kernelized score functions can further enhance disease gene ranking results by adopting both local and global learning strategies that exploit the overall topology of the network. Copyright © 2014 The Authors. Published by Elsevier B.V. All rights reserved.
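As an illustration of one of the classical baselines named in this abstract, the following is a minimal sketch of random walk with restart on a toy gene network. The network, gene labels, seed set, and restart probability are hypothetical, and the kernelized score functions and network integration steps of the paper are not reproduced here.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability.

    adj     : symmetric adjacency matrix as a list of lists
    seeds   : set of node indices known to be associated with the disease
    restart : probability of jumping back to the seed set at each step
    """
    n = len(adj)
    # Column-normalise the adjacency matrix so that each column sums to 1.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    w = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical five-gene chain G0 - G1 - G2 - G3 - G4, with G0 as the known disease gene.
genes = ["G0", "G1", "G2", "G3", "G4"]
adj = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
seeds = {0}
scores = random_walk_with_restart(adj, seeds)
# Rank the unannotated genes from most to least promising.
candidates = sorted((i for i in range(len(genes)) if i not in seeds),
                    key=lambda i: -scores[i])
```

In this toy chain the candidate scores fall off with distance from the seed gene, which is the guilt-by-association behaviour the ranking relies on.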
Overcoming HERG affinity in the discovery of the CCR5 antagonist maraviroc.
Price, David A; Armour, Duncan; de Groot, Marcel; Leishman, Derek; Napier, Carolyn; Perros, Manos; Stammen, Blanda L; Wood, Anthony
2006-09-01
The discovery of maraviroc 17 is described with particular reference to the generation of high selectivity over affinity for the HERG potassium channel. This was achieved through the use of a high throughput binding assay for the HERG channel that is known to show an excellent correlation with functional effects.
Some Applications of Fourier's Great Discovery for Beginners
ERIC Educational Resources Information Center
Kraftmakher, Yaakov
2012-01-01
Nearly two centuries ago, Fourier discovered that any periodic function of period T can be represented as a sum of sine waveforms of frequencies equal to an integer times the fundamental frequency ω = 2π/T (Fourier series). It is impossible to overestimate the importance of Fourier's discovery, and all physics or engineering students…
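A short worked example may help make the statement concrete. The sketch below sums the first few sine harmonics of a unit square wave, a standard textbook case chosen here for illustration and not taken from the article itself; the function name and parameters are assumptions.

```python
import math


def square_wave_partial_sum(t, period, n_terms):
    """Partial Fourier sum of a unit square wave with the given period.

    Uses the standard series f(t) = (4/pi) * sum over odd k of sin(k*omega*t)/k,
    where omega = 2*pi/period is the fundamental angular frequency.
    """
    omega = 2 * math.pi / period
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * omega * t) / (2 * k + 1) for k in range(n_terms)
    )


# At a quarter period the square wave equals 1; more terms approach that value.
one_term = square_wave_partial_sum(0.25, 1.0, 1)
many_terms = square_wave_partial_sum(0.25, 1.0, 2000)
```

With a single term the sum overshoots the plateau, and adding odd harmonics brings it progressively closer to the true value of 1.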
Foltz, Ian N; Gunasekaran, Kannan; King, Chadwick T
2016-03-01
Since the late 1990s, the use of transgenic animal platforms has transformed the discovery of fully human therapeutic monoclonal antibodies. The first approved therapy derived from a transgenic platform, the epidermal growth factor receptor antagonist panitumumab to treat advanced colorectal cancer, was developed using XenoMouse® technology. Since its approval in 2006, the science of discovering and developing therapeutic monoclonal antibodies derived from the XenoMouse® platform has advanced considerably. The emerging array of antibody therapeutics developed using transgenic technologies is expected to include antibodies and antibody fragments with novel mechanisms of action and extreme potencies. In addition to these impressive functional properties, these antibodies will be designed to have superior biophysical properties that enable highly efficient large-scale manufacturing methods. Achieving these new heights in antibody drug discovery will ultimately bring better medicines to patients. Here, we review best practices for the discovery and bio-optimization of monoclonal antibodies that fit functional design goals and meet high manufacturing standards. © 2016 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Binary encoding of multiplexed images in mixed noise.
Lalush, David S
2008-09-01
Binary coding of multiplexed signals and images has been studied in the context of spectroscopy with models of either purely constant or purely proportional noise, and has been shown to result in improved noise performance under certain conditions. We consider the case of mixed noise in an imaging system consisting of multiple individually-controllable sources (X-ray or near-infrared, for example) shining on a single detector. We develop a mathematical model for the noise in such a system and show that the noise is dependent on the properties of the binary coding matrix and on the average number of sources used for each code. Each binary matrix has a characteristic linear relationship between the ratio of proportional-to-constant noise and the noise level in the decoded image. We introduce a criterion for noise level, which is minimized via a genetic algorithm search. The search procedure results in the discovery of matrices that outperform the Hadamard S-matrices at certain levels of mixed noise. Simulation of a seven-source radiography system demonstrates that the noise model predicts trends and rank order of performance in regions of nonuniform images and in a simple tomosynthesis reconstruction. We conclude that the model developed provides a simple framework for analysis, discovery, and optimization of binary coding patterns used in multiplexed imaging systems.
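For reference, the Hadamard S-matrix baseline mentioned in this abstract can be sketched as follows for a seven-source system. This is a noise-free illustration under stated assumptions: the S-matrix is built from a Sylvester Hadamard matrix, the source intensities are made-up numbers, and neither the mixed-noise model nor the genetic algorithm search from the paper is reproduced.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of size `order` (a power of two), Sylvester construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def s_matrix(n):
    """Binary S-matrix of order n = 2**k - 1.

    Drop the first row and column of the Hadamard matrix, then map +1 -> 0, -1 -> 1.
    A 1 in row r, column c means source c is switched on during measurement r.
    """
    h = sylvester_hadamard(n + 1)
    return [[(1 - x) // 2 for x in row[1:]] for row in h[1:]]


def s_matrix_inverse(s):
    """Closed-form inverse: S^-1 = 2/(n+1) * (2*S^T - J), with J the all-ones matrix."""
    n = len(s)
    return [[2.0 / (n + 1) * (2 * s[j][i] - 1) for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical per-source signals for a seven-source system.
true_signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]
S = s_matrix(7)
measurements = matvec(S, true_signal)               # multiplexed detector readings
decoded = matvec(s_matrix_inverse(S), measurements)  # demultiplexed estimate
```

Each coded measurement switches on four of the seven sources, and the closed-form inverse recovers the individual source signals exactly when no noise is present.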
Gantner, Melisa E; Peroni, Roxana N; Morales, Juan F; Villalba, María L; Ruiz, María E; Talevi, Alan
2017-08-28
Breast Cancer Resistance Protein (BCRP) is an ATP-dependent efflux transporter linked to the multidrug resistance phenomenon in many diseases such as epilepsy and cancer and a potential source of drug interactions. For these reasons, the early identification of substrates and nonsubstrates of this transporter during the drug discovery stage is of great interest. We have developed a computational nonlinear model ensemble based on conformation-independent molecular descriptors using a combined strategy of genetic algorithms, J48 decision tree classifiers, and data fusion. The best model ensemble consists of averaging the rankings of the 12 decision trees that showed the best performance on the training set; it also demonstrated good performance on the test set. It was experimentally validated using the ex vivo everted rat intestinal sac model. Five anticonvulsant drugs classified as nonsubstrates for BCRP by the model ensemble were experimentally evaluated, and none of them proved to be a BCRP substrate under the experimental conditions used, thus confirming the predictive ability of the model ensemble. The model ensemble reported here is a potentially valuable tool to be used as an in silico ADME filter in computer-aided drug discovery campaigns intended to overcome BCRP-mediated multidrug resistance issues and to prevent drug-drug interactions.
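The data-fusion step described here, averaging the rankings produced by several classifiers, can be sketched as below. The scores are invented, three models stand in for the twelve decision trees, and the descriptors, J48 training, and genetic algorithm selection from the paper are not reproduced.

```python
def ranks(scores):
    """Convert scores to ranks, where rank 1 is the highest score.

    Ties are broken by input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    r = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def average_rank(score_lists):
    """Fuse several models by averaging the rank each one gives to every compound."""
    all_ranks = [ranks(s) for s in score_lists]
    n = len(score_lists[0])
    return [sum(r[i] for r in all_ranks) / len(all_ranks) for i in range(n)]


# Hypothetical substrate scores from three models for four compounds.
model_scores = [
    [0.9, 0.2, 0.6, 0.4],
    [0.8, 0.1, 0.3, 0.7],
    [0.5, 0.3, 0.9, 0.4],
]
fused = average_rank(model_scores)
# Compounds ordered from most to least likely substrate under the fused ranking.
ordering = sorted(range(len(fused)), key=lambda i: fused[i])
```

Averaging ranks rather than raw scores keeps a model with a wider score range from dominating the ensemble.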
Drug Target Mining and Analysis of the Chinese Tree Shrew for Pharmacological Testing
Liu, Jie; Lee, Wen-hui; Zhang, Yun
2014-01-01
The discovery of new drugs requires the development of improved animal models for drug testing. The Chinese tree shrew is considered to be a realistic candidate model. To assess the potential of the Chinese tree shrew for pharmacological testing, we performed drug target prediction and analysis on genomic and transcriptomic scales. Using our pipeline, 3,482 proteins were predicted to be drug targets. Of these predicted targets, 446 and 1,049 proteins with the highest rank and total scores, respectively, included homologs of targets for cancer chemotherapy, depression, age-related decline and cardiovascular disease. Based on comparative analyses, more than half of the drug target proteins identified from the tree shrew genome were shown to have higher similarity to human targets than the corresponding mouse proteins. Target validation also demonstrated that the constitutive expression of the proteinase-activated receptors of tree shrew platelets is similar to that of human platelets but differs from that of mouse platelets. We developed an effective pipeline and search strategy for drug target prediction and the evaluation of model-based target identification for drug testing. This work provides useful information for future studies of the Chinese tree shrew as a source of novel targets for drug discovery research. PMID:25105297
Goehring, Jenny L.; Neff, Donna L.; Baudhuin, Jacquelyn L.; Hughes, Michelle L.
2014-01-01
The first objective of this study was to determine whether adaptive pitch-ranking and electrode-discrimination tasks with cochlear-implant (CI) recipients produce similar results for perceiving intermediate “virtual-channel” pitch percepts using current steering. Previous studies have not examined both behavioral tasks in the same subjects with current steering. A second objective was to determine whether a physiological metric of spatial separation using the electrically evoked compound action potential spread-of-excitation (ECAP SOE) function could predict performance in the behavioral tasks. The metric was the separation index (Σ), defined as the difference in normalized amplitudes between two adjacent ECAP SOE functions, summed across all masker electrodes. Eleven CII or 90K Advanced Bionics (Valencia, CA) recipients were tested using pairs of electrodes from the basal, middle, and apical portions of the electrode array. The behavioral results, expressed as d′, showed no significant differences across tasks. There was also no significant effect of electrode region for either task. ECAP Σ was not significantly correlated with pitch ranking or electrode discrimination for any of the electrode regions. Therefore, the ECAP separation index is not sensitive enough to predict perceptual resolution of virtual channels. PMID:25480063
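The separation index defined in this abstract can be written out as a small calculation. The sketch below rests on two assumptions the abstract does not state: each spread-of-excitation function is normalized to its own peak, and the summed difference is taken as an absolute value. The amplitude values are invented for illustration.

```python
def normalise(soe):
    """Scale a spread-of-excitation function so its peak amplitude is 1 (assumed)."""
    peak = max(soe)
    return [a / peak for a in soe]


def separation_index(soe_a, soe_b):
    """Sum, over masker electrodes, of the absolute difference (assumed) between
    the normalized amplitudes of two ECAP SOE functions."""
    na, nb = normalise(soe_a), normalise(soe_b)
    return sum(abs(x - y) for x, y in zip(na, nb))


# Hypothetical ECAP amplitudes (arbitrary units) across five masker electrodes
# for two adjacent probe electrodes.
probe_1 = [10.0, 40.0, 80.0, 40.0, 10.0]
probe_2 = [5.0, 20.0, 50.0, 100.0, 50.0]
sigma = separation_index(probe_1, probe_2)
```

Identical excitation patterns give an index of zero, and patterns that overlap less give larger values.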
Role of RANKL (TNFSF11)-Dependent Osteopetrosis in the Dental Phenotype of Msx2 Null Mutant Mice
Castaneda, Beatriz; Simon, Yohann; Ferbus, Didier; Robert, Benoit; Chesneau, Julie; Mueller, Christopher
2013-01-01
The MSX2 homeoprotein is implicated in all aspects of craniofacial skeletal development. During postnatal growth, MSX2 is expressed in all cells involved in mineralized tissue formation and plays a role in their differentiation and function. Msx2 null (Msx2 −/−) mice display complex craniofacial skeleton abnormalities with bone and tooth defects. A moderate form of osteopetrotic phenotype is observed, along with decreased expression of RANKL (TNFSF11), the main osteoclast-differentiating factor. In order to elucidate the role of such an osteopetrosis in the Msx2 −/− mouse dental phenotype, a bone resorption rescue was performed by mating Msx2 −/− mice with a transgenic mouse line overexpressing Rank (Tnfrsf11a). Msx2 −/− RankTg mice had significant improvement in the molar phenotype, while incisor epithelium defects were exacerbated in the enamel area, with formation of massive osteolytic tumors. Although compensation for RANKL loss of function could have potential as a therapy for osteopetrosis, in Msx2 −/− mice this approach, via RANK overexpression in monocyte-derived lineages, amplified latent epithelial tumor development in the peculiar continuously growing incisor. PMID:24278237
Comparing the loss of functional independence of older adults in the U.S. and China.
Fong, Joelle H; Feng, Jun
2018-01-01
Functional loss among older adults is known to follow a hierarchical sequence, but little is known about whether such sequences differ across socio-cultural contexts. The aim of this study is to construct activities of daily living (ADL) scales for oldest-old adults in the United States and China so as to compare their functional loss sequences. We use data from the Asset and Health Dynamics of the Oldest Old (n=1607) and Chinese Longitudinal Healthy Longevity Survey (n=5570) for years 1998-2008. ADL items are calibrated within a scale using the Rasch measurement model. Rasch scores are averaged across survey waves to identify the ADL loss sequence for each study population. We also assess scale stability over measurement periods. Factor analyses confirm that the ADL items in each study population can be combined meaningfully to form a hierarchical sequence. Internal consistency assessed by Cronbach's alpha is high (0.81 to 0.95). We find that bathing is the first activity that both older Americans and Chinese have difficulty with, while eating is the last activity. There are, however, differences in the rank order for toileting (ranked more challenging in the Chinese sample) and dressing (ranked more challenging in the U.S. sample). Item orderings are stable over time. The results highlight the relative importance of bathing in the functional loss sequence for older adults, regardless of socio-cultural context. Health interventions are needed to address deficits in the bathroom environment, especially in developing countries like China. Copyright © 2017 Elsevier B.V. All rights reserved.
Pokharel, Yuba Raj; Saarela, Jani; Szwajda, Agnieszka; Rupp, Christian; Rokka, Anne; Lal Kumar Karna, Shibendra; Teittinen, Kaisa; Corthals, Garry; Kallioniemi, Olli; Wennerberg, Krister; Aittokallio, Tero; Westermarck, Jukka
2015-12-01
High content protein interaction screens have revolutionized our understanding of protein complex assembly. However, one of the major challenges in translation of high content protein interaction data is identification of those interactions that are functionally relevant for a particular biological question. To address this challenge, we developed a relevance ranking platform (RRP), which consists of modular functional and bioinformatic filters to provide a relevance rank among the interactome proteins. We demonstrate the versatility of RRP to enable a systematic prioritization of the most relevant interaction partners from high content data, highlighted by the analysis of cancer relevant protein interactions for oncoproteins Pin1 and PME-1. We validated the importance of selected interactions by demonstration of PTOV1 and CSKN2B as novel regulators of Pin1 target c-Jun phosphorylation and reveal previously unknown interacting proteins that may mediate PME-1 effects via PP2A-inhibition. The RRP framework is modular and can be modified to answer versatile research problems depending on the nature of the biological question under study. Based on comparison of RRP to other existing filtering tools, the presented data indicate that RRP offers added value especially for the analysis of interacting proteins for which sufficient prior knowledge is not available. Finally, we encourage the use of RRP in combination with either SAINT or CRAPome computational tools for selecting the candidate interactors that fulfill both important requirements: functional relevance and high-confidence interaction detection. © 2015 by The American Society for Biochemistry and Molecular Biology, Inc.
[Rank distributions in community ecology from the statistical viewpoint].
Maksimov, V N
2004-01-01
Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.
2012-08-01
small data noise and model error, the discrete Hessian can be approximated by a low-rank matrix. This in turn enables fast solution of an appropriately...implication of the compactness of the Hessian is that for small data noise and model error, the discrete Hessian can be approximated by a low-rank matrix. This...probability distribution is given by the inverse of the Hessian of the negative log likelihood function. For Gaussian data noise and model error, this