Sample records for rank-based statistical methods

  1. Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

    PubMed

    Chan, Kwun Chuen Gary; Qin, Jing

    2015-10-01

    Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information are available from backward recurrence times and are frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposal statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  2. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as paired t-test, Wilcoxon signed rank test, and significance analysis of microarray (SAM) under certain non-normal distributions. The asymptotic distribution of the test statistic, and the p-value function are discussed. The application of proposed method is shown using a real-life data set. © The Author(s) 2011.

  3. RankProd 2.0: a refactored bioconductor package for detecting differentially expressed features in molecular profiling datasets.

    PubMed

    Del Carratore, Francesco; Jankevics, Andris; Eisinga, Rob; Heskes, Tom; Hong, Fangxin; Breitling, Rainer

    2017-09-01

    The Rank Product (RP) is a statistical technique widely used to detect differentially expressed features in molecular profiling experiments such as transcriptomics, metabolomics and proteomics studies. An implementation of the RP and the closely related Rank Sum (RS) statistics has been available in the RankProd Bioconductor package for several years. However, several recent advances in the understanding of the statistical foundations of the method have made a complete refactoring of the existing package desirable. We implemented a completely refactored version of the RankProd package, which provides a more principled implementation of the statistics for unpaired datasets. Moreover, the permutation-based P -value estimation methods have been replaced by exact methods, providing faster and more accurate results. RankProd 2.0 is available at Bioconductor ( https://www.bioconductor.org/packages/devel/bioc/html/RankProd.html ) and as part of the mzMatch pipeline ( http://www.mzmatch.sourceforge.net ). rainer.breitling@manchester.ac.uk. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  4. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset.

    PubMed

    Mallik, Saurav; Maulik, Ujjwal

    2015-10-01

    Gene ranking is an important problem in bioinformatics. Here, we propose a new framework for ranking biomolecules (viz., miRNAs, transcription-factors/TFs and genes) in a multi-informative uterine leiomyoma dataset having both gene expression and methylation data using (statistical) eigenvector centrality based approach. At first, genes that are both differentially expressed and methylated, are identified using Limma statistical test. A network, comprising these genes, corresponding TFs from TRANSFAC and ITFP databases, and targeter miRNAs from miRWalk database, is then built. The biomolecules are then ranked based on eigenvector centrality. Our proposed method provides better average accuracy in hub gene and non-hub gene classifications than other methods. Furthermore, pre-ranked Gene set enrichment analysis is applied on the pathway database as well as GO-term databases of Molecular Signatures Database with providing a pre-ranked gene-list based on different centrality values for comparing among the ranking methods. Finally, top novel potential gene-markers for the uterine leiomyoma are provided. Copyright © 2015 Elsevier Inc. All rights reserved.

  5. Time Series Analysis Based on Running Mann Whitney Z Statistics

    USDA-ARS?s Scientific Manuscript database

    A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...

  6. The exact probability distribution of the rank product statistics for replicated experiments.

    PubMed

    Eisinga, Rob; Breitling, Rainer; Heskes, Tom

    2013-03-18

    The rank product method is a widely accepted technique for detecting differentially regulated genes in replicated microarray experiments. To approximate the sampling distribution of the rank product statistic, the original publication proposed a permutation approach, whereas recently an alternative approximation based on the continuous gamma distribution was suggested. However, both approximations are imperfect for estimating small tail probabilities. In this paper we relate the rank product statistic to number theory and provide a derivation of its exact probability distribution and the true tail probabilities. Copyright © 2013 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

  7. Reproducible detection of disease-associated markers from gene expression data.

    PubMed

    Omae, Katsuhiro; Komori, Osamu; Eguchi, Shinto

    2016-08-18

    Detection of disease-associated markers plays a crucial role in gene screening for biological studies. Two-sample test statistics, such as the t-statistic, are widely used to rank genes based on gene expression data. However, the resultant gene ranking is often not reproducible among different data sets. Such irreproducibility may be caused by disease heterogeneity. When we divided data into two subsets, we found that the signs of the two t-statistics were often reversed. Focusing on such instability, we proposed a sign-sum statistic that counts the signs of the t-statistics for all possible subsets. The proposed method excludes genes affected by heterogeneity, thereby improving the reproducibility of gene ranking. We compared the sign-sum statistic with the t-statistic by a theoretical evaluation of the upper confidence limit. Through simulations and applications to real data sets, we show that the sign-sum statistic exhibits superior performance. We derive the sign-sum statistic for getting a robust gene ranking. The sign-sum statistic gives more reproducible ranking than the t-statistic. Using simulated data sets we show that the sign-sum statistic excludes hetero-type genes well. Also for the real data sets, the sign-sum statistic performs well in a viewpoint of ranking reproducibility.

  8. Toward two-dimensional search engines

    NASA Astrophysics Data System (ADS)

    Ermann, L.; Chepelianskii, A. D.; Shepelyansky, D. L.

    2012-07-01

    We study the statistical properties of various directed networks using ranking of their nodes based on the dominant vectors of the Google matrix known as PageRank and CheiRank. On average PageRank orders nodes proportionally to a number of ingoing links, while CheiRank orders nodes proportionally to a number of outgoing links. In this way, the ranking of nodes becomes two dimensional which paves the way for the development of two-dimensional search engines of a new type. Statistical properties of information flow on the PageRank-CheiRank plane are analyzed for networks of British, French and Italian universities, Wikipedia, Linux Kernel, gene regulation and other networks. A special emphasis is done for British universities networks using the large database publicly available in the UK. Methods of spam links control are also analyzed.

  9. Neural network approaches versus statistical methods in classification of multisource remote sensing data

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon A.; Swain, Philip H.; Ersoy, Okan K.

    1990-01-01

    Neural network learning procedures and statistical classificaiton methods are applied and compared empirically in classification of multisource remote sensing and geographic data. Statistical multisource classification by means of a method based on Bayesian classification theory is also investigated and modified. The modifications permit control of the influence of the data sources involved in the classification process. Reliability measures are introduced to rank the quality of the data sources. The data sources are then weighted according to these rankings in the statistical multisource classification. Four data sources are used in experiments: Landsat MSS data and three forms of topographic data (elevation, slope, and aspect). Experimental results show that two different approaches have unique advantages and disadvantages in this classification application.

  10. Corpus-based Statistical Screening for Phrase Identification

    PubMed Central

    Kim, Won; Wilbur, W. John

    2000-01-01

    Purpose: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material. Method: The approach is to develop six different scoring methods that are based on different aspects of phrase occurrence. The emphasis here is not on lexical information or syntactic structure but rather on the statistical properties of word pairs and triples that can be obtained from a large database. Measurements: The Unified Medical Language System (UMLS) incorporates a large list of humanly acceptable phrases in the medical field as a part of its structure. The authors use this list of phrases as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings. Result: The authors find of six different scoring methods that each proves effective in identifying UMLS quality phrases in a large subset of MEDLINE. These methods are applicable both to word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases. Conclusion: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text. PMID:10984469

  11. SparRec: An effective matrix completion framework of missing data imputation for GWAS

    NASA Astrophysics Data System (ADS)

    Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen

    2016-10-01

    Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.

  12. Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.

    2003-03-01

    Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.

  13. The choice of statistical methods for comparisons of dosimetric data in radiotherapy.

    PubMed

    Chaikh, Abdulhamid; Giraud, Jean-Yves; Perrin, Emmanuel; Bresciani, Jean-Pierre; Balosso, Jacques

    2014-09-18

    Novel irradiation techniques are continuously introduced in radiotherapy to optimize the accuracy, the security and the clinical outcome of treatments. These changes could raise the question of discontinuity in dosimetric presentation and the subsequent need for practice adjustments in case of significant modifications. This study proposes a comprehensive approach to compare different techniques and tests whether their respective dose calculation algorithms give rise to statistically significant differences in the treatment doses for the patient. Statistical investigation principles are presented in the framework of a clinical example based on 62 fields of radiotherapy for lung cancer. The delivered doses in monitor units were calculated using three different dose calculation methods: the reference method accounts the dose without tissues density corrections using Pencil Beam Convolution (PBC) algorithm, whereas new methods calculate the dose with tissues density correction for 1D and 3D using Modified Batho (MB) method and Equivalent Tissue air ratio (ETAR) method, respectively. The normality of the data and the homogeneity of variance between groups were tested using Shapiro-Wilks and Levene test, respectively, then non-parametric statistical tests were performed. Specifically, the dose means estimated by the different calculation methods were compared using Friedman's test and Wilcoxon signed-rank test. In addition, the correlation between the doses calculated by the three methods was assessed using Spearman's rank and Kendall's rank tests. The Friedman's test showed a significant effect on the calculation method for the delivered dose of lung cancer patients (p <0.001). The density correction methods yielded to lower doses as compared to PBC by on average (-5 ± 4.4 SD) for MB and (-4.7 ± 5 SD) for ETAR. Post-hoc Wilcoxon signed-rank test of paired comparisons indicated that the delivered dose was significantly reduced using density-corrected methods as compared to the reference method. Spearman's and Kendall's rank tests indicated a positive correlation between the doses calculated with the different methods. This paper illustrates and justifies the use of statistical tests and graphical representations for dosimetric comparisons in radiotherapy. The statistical analysis shows the significance of dose differences resulting from two or more techniques in radiotherapy.

  14. Support vector methods for survival analysis: a comparison between ranking and regression approaches.

    PubMed

    Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K

    2011-10-01

    To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches fur survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significant different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.

  15. RankExplorer: Visualization of Ranking Changes in Large Time Series Data.

    PubMed

    Shi, Conglei; Cui, Weiwei; Liu, Shixia; Xu, Panpan; Chen, Wei; Qu, Huamin

    2012-12-01

    For many applications involving time series data, people are often interested in the changes of item values over time as well as their ranking changes. For example, people search many words via search engines like Google and Bing every day. Analysts are interested in both the absolute searching number for each word as well as their relative rankings. Both sets of statistics may change over time. For very large time series data with thousands of items, how to visually present ranking changes is an interesting challenge. In this paper, we propose RankExplorer, a novel visualization method based on ThemeRiver to reveal the ranking changes. Our method consists of four major components: 1) a segmentation method which partitions a large set of time series curves into a manageable number of ranking categories; 2) an extended ThemeRiver view with embedded color bars and changing glyphs to show the evolution of aggregation values related to each ranking category over time as well as the content changes in each ranking category; 3) a trend curve to show the degree of ranking changes over time; 4) rich user interactions to support interactive exploration of ranking changes. We have applied our method to some real time series data and the case studies demonstrate that our method can reveal the underlying patterns related to ranking changes which might otherwise be obscured in traditional visualizations.

  16. Rank-based permutation approaches for non-parametric factorial designs.

    PubMed

    Umlauft, Maria; Konietschke, Frank; Pauly, Markus

    2017-11-01

    Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.

  17. Evaluation of bearing capacity of piles from cone penetration test data.

    DOT National Transportation Integrated Search

    2007-12-01

    A statistical analysis and ranking criteria were used to compare the CPT methods and the conventional alpha design method. Based on the results, the de Ruiter/Beringen and LCPC methods showed the best capability in predicting the measured load carryi...

  18. Quantification of heterogeneity observed in medical images.

    PubMed

    Brooks, Frank J; Grigsby, Perry W

    2013-03-02

    There has been much recent interest in the quantification of visually evident heterogeneity within functional grayscale medical images, such as those obtained via magnetic resonance or positron emission tomography. In the case of images of cancerous tumors, variations in grayscale intensity imply variations in crucial tumor biology. Despite these considerable clinical implications, there is as yet no standardized method for measuring the heterogeneity observed via these imaging modalities. In this work, we motivate and derive a statistical measure of image heterogeneity. This statistic measures the distance-dependent average deviation from the smoothest intensity gradation feasible. We show how this statistic may be used to automatically rank images of in vivo human tumors in order of increasing heterogeneity. We test this method against the current practice of ranking images via expert visual inspection. We find that this statistic provides a means of heterogeneity quantification beyond that given by other statistics traditionally used for the same purpose. We demonstrate the effect of tumor shape upon our ranking method and find the method applicable to a wide variety of clinically relevant tumor images. We find that the automated heterogeneity rankings agree very closely with those performed visually by experts. These results indicate that our automated method may be used reliably to rank, in order of increasing heterogeneity, tumor images whether or not object shape is considered to contribute to that heterogeneity. Automated heterogeneity ranking yields objective results which are more consistent than visual rankings. Reducing variability in image interpretation will enable more researchers to better study potential clinical implications of observed tumor heterogeneity.

  19. Statistical Optimality in Multipartite Ranking and Ordinal Regression.

    PubMed

    Uematsu, Kazuki; Lee, Yoonkyung

    2015-05-01

    Statistical optimality in multipartite ranking is investigated as an extension of bipartite ranking. We consider the optimality of ranking algorithms through minimization of the theoretical risk which combines pairwise ranking errors of ordinal categories with differential ranking costs. The extension shows that for a certain class of convex loss functions including exponential loss, the optimal ranking function can be represented as a ratio of weighted conditional probability of upper categories to lower categories, where the weights are given by the misranking costs. This result also bridges traditional ranking methods such as proportional odds model in statistics with various ranking algorithms in machine learning. Further, the analysis of multipartite ranking with different costs provides a new perspective on non-smooth list-wise ranking measures such as the discounted cumulative gain and preference learning. We illustrate our findings with simulation study and real data analysis.

  20. Permutation-based inference for the AUC: A unified approach for continuous and discontinuous data.

    PubMed

    Pauly, Markus; Asendorf, Thomas; Konietschke, Frank

    2016-11-01

    We investigate rank-based studentized permutation methods for the nonparametric Behrens-Fisher problem, that is, inference methods for the area under the ROC curve. We hereby prove that the studentized permutation distribution of the Brunner-Munzel rank statistic is asymptotically standard normal, even under the alternative. Thus, incidentally providing the hitherto missing theoretical foundation for the Neubert and Brunner studentized permutation test. In particular, we do not only show its consistency, but also that confidence intervals for the underlying treatment effects can be computed by inverting this permutation test. In addition, we derive permutation-based range-preserving confidence intervals. Extensive simulation studies show that the permutation-based confidence intervals appear to maintain the preassigned coverage probability quite accurately (even for rather small sample sizes). For a convenient application of the proposed methods, a freely available software package for the statistical software R has been developed. A real data example illustrates the application. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. AUPress: A Comparison of an Open Access University Press with Traditional Presses

    ERIC Educational Resources Information Center

    McGreal, Rory; Chen, Nian-Shing

    2011-01-01

    This study is a comparison of AUPress with three other traditional (non-open access) Canadian university presses. The analysis is based on the rankings that are correlated with book sales on Amazon.com and Amazon.ca. Statistical methods include the sampling of the sales ranking of randomly selected books from each press. The results of one-way…

  2. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performances as a result of using semantic relationships is encouraging. PMID:18693792

  3. Wilcoxon's signed-rank statistic: what null hypothesis and why it matters.

    PubMed

    Li, Heng; Johnson, Terri

    2014-01-01

    In statistical literature, the term 'signed-rank test' (or 'Wilcoxon signed-rank test') has been used to refer to two distinct tests: a test for symmetry of distribution and a test for the median of a symmetric distribution, sharing a common test statistic. To avoid potential ambiguity, we propose to refer to those two tests by different names, as 'test for symmetry based on signed-rank statistic' and 'test for median based on signed-rank statistic', respectively. The utility of such terminological differentiation should become evident through our discussion of how those tests connect and contrast with sign test and one-sample t-test. Published 2014. This article is a U.S. Government work and is in the public domain in the USA. Published 2014. This article is a U.S. Government work and is in the public domain in the USA.

  4. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.

  5. rpsftm: An R Package for Rank Preserving Structural Failure Time Models

    PubMed Central

    Allison, Annabel; White, Ian R; Bond, Simon

    2018-01-01

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ, is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z(ψ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm. PMID:29564164

  6. rpsftm: An R Package for Rank Preserving Structural Failure Time Models.

    PubMed

    Allison, Annabel; White, Ian R; Bond, Simon

    2017-12-04

    Treatment switching in a randomised controlled trial occurs when participants change from their randomised treatment to the other trial treatment during the study. Failure to account for treatment switching in the analysis (i.e. by performing a standard intention-to-treat analysis) can lead to biased estimates of treatment efficacy. The rank preserving structural failure time model (RPSFTM) is a method used to adjust for treatment switching in trials with survival outcomes. The RPSFTM is due to Robins and Tsiatis (1991) and has been developed by White et al. (1997, 1999). The method is randomisation based and uses only the randomised treatment group, observed event times, and treatment history in order to estimate a causal treatment effect. The treatment effect, ψ , is estimated by balancing counter-factual event times (that would be observed if no treatment were received) between treatment groups. G-estimation is used to find the value of ψ such that a test statistic Z ( ψ ) = 0. This is usually the test statistic used in the intention-to-treat analysis, for example, the log rank test statistic. We present an R package that implements the method of rpsftm.

  7. 76 FR 22122 - Section 8 Housing Choice Voucher Program-Demonstration Project of Small Area Fair Market Rents in...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-04-20

    ... rent and the core-based statistical area (CBSA) rent as applied to the 40th percentile FMR for that..., calculated on the basis of the core-based statistical area (CBSA) or the metropolitan Statistical Area (MSA... will be ranked according to each of the statistics specified above, and then a weighted average ranking...

  8. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal

    2017-01-01

    Epigenetic Biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework of identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics dataset. Firstly, we determine the genes that have both expression as well as methylation values, and follow normal distribution. Similarly, we identify the genes which consist of both expression and methylation values, but do not follow normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed andmethylated. Similarly, we utilize Limma package for performing non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markerswith biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as othermethods based on classificationperformances obtained using several well-known classifiers.

  9. A cautionary note on the rank product statistic.

    PubMed

    Koziol, James A

    2016-06-01

    The rank product method introduced by Breitling R et al. [2004, FEBS Letters 573, 83-92] has rapidly generated popularity in practical settings, in particular, detecting differential expression of genes in microarray experiments. The purpose of this note is to point out a particular property of the rank product method, namely, its differential sensitivity to over- and underexpression. It turns out that overexpression is less likely to be detected than underexpression with the rank product statistic. We have conducted both empirical and exact power studies that demonstrate this phenomenon, and summarize these findings in this note. © 2016 Federation of European Biochemical Societies.

  10. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement achieved in the models implementation performances as a result of using semantic relationships is encouraging.

  11. Text mining by Tsallis entropy

    NASA Astrophysics Data System (ADS)

    Jamaati, Maryam; Mehri, Ali

    2018-01-01

    Long-range correlations between the elements of natural languages enable them to convey very complex information. Complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word ranking metric in order to extract keywords of a single document. We carry out an experimental evaluation, which shows capability of the presented method in keyword extraction. We find that, Tsallis entropy has reliable word ranking performance, at the same level of the best previous ranking methods.

  12. An Improved Rank Correlation Effect Size Statistic for Single-Case Designs: Baseline Corrected Tau.

    PubMed

    Tarlow, Kevin R

    2017-07-01

    Measuring treatment effects when an individual's pretreatment performance is improving poses a challenge for single-case experimental designs. It may be difficult to determine whether improvement is due to the treatment or due to the preexisting baseline trend. Tau- U is a popular single-case effect size statistic that purports to control for baseline trend. However, despite its strengths, Tau- U has substantial limitations: Its values are inflated and not bound between -1 and +1, it cannot be visually graphed, and its relatively weak method of trend control leads to unacceptable levels of Type I error wherein ineffective treatments appear effective. An improved effect size statistic based on rank correlation and robust regression, Baseline Corrected Tau, is proposed and field-tested with both published and simulated single-case time series. A web-based calculator for Baseline Corrected Tau is also introduced for use by single-case investigators.

  13. Analysis of high-throughput biological data using their rank values.

    PubMed

    Dembélé, Doulaye

    2018-01-01

    High-throughput biological technologies are routinely used to generate gene expression profiling or cytogenetics data. To achieve high performance, methods available in the literature become more specialized and often require high computational resources. Here, we propose a new versatile method based on the data-ordering rank values. We use linear algebra, the Perron-Frobenius theorem and also extend a method presented earlier for searching differentially expressed genes for the detection of recurrent copy number aberration. A result derived from the proposed method is a one-sample Student's t-test based on rank values. The proposed method is to our knowledge the only that applies to gene expression profiling and to cytogenetics data sets. This new method is fast, deterministic, and requires a low computational load. Probabilities are associated with genes to allow a statistically significant subset selection in the data set. Stability scores are also introduced as quality parameters. The performance and comparative analyses were carried out using real data sets. The proposed method can be accessed through an R package available from the CRAN (Comprehensive R Archive Network) website: https://cran.r-project.org/web/packages/fcros .

  14. A fast algorithm for determining bounds and accurate approximate p-values of the rank product statistic for replicate experiments.

    PubMed

    Heskes, Tom; Eisinga, Rob; Breitling, Rainer

    2014-11-21

    The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is accurate calculation of the p-value of the rank product statistic to adequately address multiple testing. Both exact calculation and permutation and gamma approximations have been proposed to determine molecule-level significance. These current approaches have serious drawbacks as they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution. We derive strict lower and upper bounds to the exact p-value along with an accurate approximation that can be used to assess the significance of the rank product statistic in a computationally fast manner. The bounds and the proposed approximation are shown to provide far better accuracy over existing approximate methods in determining tail probabilities, with the slightly conservative upper bound protecting against false positives. We illustrate the proposed method in the context of a recently published analysis on transcriptomic profiling performed in blood. We provide a method to determine upper bounds and accurate approximate p-values of the rank product statistic. The proposed algorithm provides an order of magnitude increase in throughput as compared with current approaches and offers the opportunity to explore new application domains with even larger multiple testing issue. The R code is published in one of the Additional files and is available at http://www.ru.nl/publish/pages/726696/rankprodbounds.zip .

  15. Verification of learner’s differences by team-based learning in biochemistry classes

    PubMed Central

    2017-01-01

    Purpose We tested the effect of team-based learning (TBL) on medical education through the second-year premedical students’ TBL scores in biochemistry classes over 5 years. Methods We analyzed the results based on test scores before and after the students’ debate. The groups of students for statistical analysis were divided as follows: group 1 comprised the top-ranked students, group 3 comprised the low-ranked students, and group 2 comprised the medium-ranked students. Therefore, group T comprised 382 students (the total number of students in group 1, 2, and 3). To calibrate the difficulty of the test, original scores were converted into standardized scores. We determined the differences of the tests using Student t-test, and the relationship between scores before, and after the TBL using linear regression tests. Results Although there was a decrease in the lowest score, group T and 3 showed a significant increase in both original and standardized scores; there was also an increase in the standardized score of group 3. There was a positive correlation between the pre- and the post-debate scores in group T, and 2. And the beta values of the pre-debate scores and “the changes between the pre- and post-debate scores” were statistically significant in both original and standardized scores. Conclusion TBL is one of the educational methods for helping students improve their grades, particularly those of low-ranked students. PMID:29207457

  16. Ranked solutions to a class of combinatorial optimizations - with applications in mass spectrometry based peptide sequencing

    NASA Astrophysics Data System (ADS)

    Doerr, Timothy; Alves, Gelio; Yu, Yi-Kuo

    2006-03-01

    Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time. This suggests a way to efficiently find approximate solutions - - find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the fininte number of high- ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks - - peptide sequencing using tandem mass spectrometry data.

  17. Robust Covariate-Adjusted Log-Rank Statistics and Corresponding Sample Size Formula for Recurrent Events Data

    PubMed Central

    Song, Rui; Kosorok, Michael R.; Cai, Jianwen

    2009-01-01

    Summary Recurrent events data are frequently encountered in clinical trials. This article develops robust covariate-adjusted log-rank statistics applied to recurrent events data with arbitrary numbers of events under independent censoring and the corresponding sample size formula. The proposed log-rank tests are robust with respect to different data-generating processes and are adjusted for predictive covariates. It reduces to the Kong and Slud (1997, Biometrika 84, 847–862) setting in the case of a single event. The sample size formula is derived based on the asymptotic normality of the covariate-adjusted log-rank statistics under certain local alternatives and a working model for baseline covariates in the recurrent event data context. When the effect size is small and the baseline covariates do not contain significant information about event times, it reduces to the same form as that of Schoenfeld (1983, Biometrics 39, 499–503) for cases of a single event or independent event times within a subject. We carry out simulations to study the control of type I error and the comparison of powers between several methods in finite samples. The proposed sample size formula is illustrated using data from an rhDNase study. PMID:18162107

  18. New insights into old methods for identifying causal rare variants.

    PubMed

    Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi

    2011-11-29

    The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.

  19. Illustrating, Quantifying, and Correcting for Bias in Post-hoc Analysis of Gene-Based Rare Variant Tests of Association

    PubMed Central

    Grinde, Kelsey E.; Arbet, Jaron; Green, Alden; O'Connell, Michael; Valcarcel, Alessandra; Westra, Jason; Tintle, Nathan

    2017-01-01

    To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as “winner's curse.” We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10−6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures. PMID:28959274

  20. Deviation-based spam-filtering method via stochastic approach

    NASA Astrophysics Data System (ADS)

    Lee, Daekyung; Lee, Mi Jin; Kim, Beom Jun

    2018-03-01

    In the presence of a huge number of possible purchase choices, ranks or ratings of items by others often play very important roles for a buyer to make a final purchase decision. Perfectly objective rating is an impossible task to achieve, and we often use an average rating built on how previous buyers estimated the quality of the product. The problem of using a simple average rating is that it can easily be polluted by careless users whose evaluation of products cannot be trusted, and by malicious spammers who try to bias the rating result on purpose. In this letter we suggest how trustworthiness of individual users can be systematically and quantitatively reflected to build a more reliable rating system. We compute the suitably defined reliability of each user based on the user's rating pattern for all products she evaluated. We call our proposed method as the deviation-based ranking, since the statistical significance of each user's rating pattern with respect to the average rating pattern is the key ingredient. We find that our deviation-based ranking method outperforms existing methods in filtering out careless random evaluators as well as malicious spammers.

  1. Analysis of censored data.

    PubMed

    Lucijanic, Marko; Petrovecki, Mladen

    2012-01-01

    Analyzing events over time is often complicated by incomplete, or censored, observations. Special non-parametric statistical methods were developed to overcome difficulties in summarizing and comparing censored data. Life-table (actuarial) method and Kaplan-Meier method are described with an explanation of survival curves. For the didactic purpose authors prepared a workbook based on most widely used Kaplan-Meier method. It should help the reader understand how Kaplan-Meier method is conceptualized and how it can be used to obtain statistics and survival curves needed to completely describe a sample of patients. Log-rank test and hazard ratio are also discussed.

  2. Structural MRI-based detection of Alzheimer's disease using feature ranking and classification error.

    PubMed

    Beheshti, Iman; Demirel, Hasan; Farokhian, Farnaz; Yang, Chunlan; Matsuda, Hiroshi

    2016-12-01

    This paper presents an automatic computer-aided diagnosis (CAD) system based on feature ranking for detection of Alzheimer's disease (AD) using structural magnetic resonance imaging (sMRI) data. The proposed CAD system is composed of four systematic stages. First, global and local differences in the gray matter (GM) of AD patients compared to the GM of healthy controls (HCs) are analyzed using a voxel-based morphometry technique. The aim is to identify significant local differences in the volume of GM as volumes of interests (VOIs). Second, the voxel intensity values of the VOIs are extracted as raw features. Third, the raw features are ranked using a seven-feature ranking method, namely, statistical dependency (SD), mutual information (MI), information gain (IG), Pearson's correlation coefficient (PCC), t-test score (TS), Fisher's criterion (FC), and the Gini index (GI). The features with higher scores are more discriminative. To determine the number of top features, the estimated classification error based on training set made up of the AD and HC groups is calculated, with the vector size that minimized this error selected as the top discriminative feature. Fourth, the classification is performed using a support vector machine (SVM). In addition, a data fusion approach among feature ranking methods is introduced to improve the classification performance. The proposed method is evaluated using a data-set from ADNI (130 AD and 130 HC) with 10-fold cross-validation. The classification accuracy of the proposed automatic system for the diagnosis of AD is up to 92.48% using the sMRI data. An automatic CAD system for the classification of AD based on feature-ranking method and classification errors is proposed. In this regard, seven-feature ranking methods (i.e., SD, MI, IG, PCC, TS, FC, and GI) are evaluated. The optimal size of top discriminative features is determined by the classification error estimation in the training phase. The experimental results indicate that the performance of the proposed system is comparative to that of state-of-the-art classification models. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  3. Reconsidering the use of rankings in the valuation of health states: a model for estimating cardinal values from ordinal data

    PubMed Central

    Salomon, Joshua A

    2003-01-01

    Background In survey studies on health-state valuations, ordinal ranking exercises often are used as precursors to other elicitation methods such as the time trade-off (TTO) or standard gamble, but the ranking data have not been used in deriving cardinal valuations. This study reconsiders the role of ordinal ranks in valuing health and introduces a new approach to estimate interval-scaled valuations based on aggregate ranking data. Methods Analyses were undertaken on data from a previously published general population survey study in the United Kingdom that included rankings and TTO values for hypothetical states described using the EQ-5D classification system. The EQ-5D includes five domains (mobility, self-care, usual activities, pain/discomfort and anxiety/depression) with three possible levels on each. Rank data were analysed using a random utility model, operationalized through conditional logit regression. In the statistical model, probabilities of observed rankings were related to the latent utilities of different health states, modeled as a linear function of EQ-5D domain scores, as in previously reported EQ-5D valuation functions. Predicted valuations based on the conditional logit model were compared to observed TTO values for the 42 states in the study and to predictions based on a model estimated directly from the TTO values. Models were evaluated using the intraclass correlation coefficient (ICC) between predictions and mean observations, and the root mean squared error of predictions at the individual level. Results Agreement between predicted valuations from the rank model and observed TTO values was very high, with an ICC of 0.97, only marginally lower than for predictions based on the model estimated directly from TTO values (ICC = 0.99). Individual-level errors were also comparable in the two models, with root mean squared errors of 0.503 and 0.496 for the rank-based and TTO-based predictions, respectively. Conclusions Modeling health-state valuations based on ordinal ranks can provide results that are similar to those obtained from more widely analyzed valuation techniques such as the TTO. The information content in aggregate ranking data is not currently exploited to full advantage. The possibility of estimating cardinal valuations from ordinal ranks could also simplify future data collection dramatically and facilitate wider empirical study of health-state valuations in diverse settings and population groups. PMID:14687419

  4. An R package for analyzing and modeling ranking data

    PubMed Central

    2013-01-01

    Background In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Results Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and significance difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as “internal/external”), and the second dimension can be interpreted as their overall variance of (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance. Conclusions In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying a thought multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations. PMID:23672645

  5. An R package for analyzing and modeling ranking data.

    PubMed

    Lee, Paul H; Yu, Philip L H

    2013-05-14

    In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and significance difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as "internal/external"), and the second dimension can be interpreted as their overall variance of (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying a thought multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations.

  6. System and method for progressive band selection for hyperspectral images

    NASA Technical Reports Server (NTRS)

    Fisher, Kevin (Inventor)

    2013-01-01

    Disclosed herein are systems, methods, and non-transitory computer-readable storage media for progressive band selection for hyperspectral images. A system having module configured to control a processor to practice the method calculates a virtual dimensionality of a hyperspectral image having multiple bands to determine a quantity Q of how many bands are needed for a threshold level of information, ranks each band based on a statistical measure, selects Q bands from the multiple bands to generate a subset of bands based on the virtual dimensionality, and generates a reduced image based on the subset of bands. This approach can create reduced datasets of full hyperspectral images tailored for individual applications. The system uses a metric specific to a target application to rank the image bands, and then selects the most useful bands. The number of bands selected can be specified manually or calculated from the hyperspectral image's virtual dimensionality.

  7. The Wilcoxon signed rank test for paired comparisons of clustered data.

    PubMed

    Rosner, Bernard; Glynn, Robert J; Lee, Mei-Ling T

    2006-03-01

    The Wilcoxon signed rank test is a frequently used nonparametric test for paired data (e.g., consisting of pre- and posttreatment measurements) based on independent units of analysis. This test cannot be used for paired comparisons arising from clustered data (e.g., if paired comparisons are available for each of two eyes of an individual). To incorporate clustering, a generalization of the randomization test formulation for the signed rank test is proposed, where the unit of randomization is at the cluster level (e.g., person), while the individual paired units of analysis are at the subunit within cluster level (e.g., eye within person). An adjusted variance estimate of the signed rank test statistic is then derived, which can be used for either balanced (same number of subunits per cluster) or unbalanced (different number of subunits per cluster) data, with an exchangeable correlation structure, with or without tied values. The resulting test statistic is shown to be asymptotically normal as the number of clusters becomes large, if the cluster size is bounded. Simulation studies are performed based on simulating correlated ranked data from a signed log-normal distribution. These studies indicate appropriate type I error for data sets with > or =20 clusters and a superior power profile compared with either the ordinary signed rank test based on the average cluster difference score or the multivariate signed rank test of Puri and Sen. Finally, the methods are illustrated with two data sets, (i) an ophthalmologic data set involving a comparison of electroretinogram (ERG) data in retinitis pigmentosa (RP) patients before and after undergoing an experimental surgical procedure, and (ii) a nutritional data set based on a randomized prospective study of nutritional supplements in RP patients where vitamin E intake outside of study capsules is compared before and after randomization to monitor compliance with nutritional protocols.

  8. High-dimensional statistical inference: From vector to matrix

    NASA Astrophysics Data System (ADS)

    Zhang, Anru

    Statistical inference for sparse signals or low-rank matrices in high-dimensional settings is of significant interest in a range of contemporary applications. It has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. In this thesis, we consider several problems in including sparse signal recovery (compressed sensing under restricted isometry) and low-rank matrix recovery (matrix recovery via rank-one projections and structured matrix completion). The first part of the thesis discusses compressed sensing and affine rank minimization in both noiseless and noisy cases and establishes sharp restricted isometry conditions for sparse signal and low-rank matrix recovery. The analysis relies on a key technical tool which represents points in a polytope by convex combinations of sparse vectors. The technique is elementary while leads to sharp results. It is shown that, in compressed sensing, delta kA < 1/3, deltak A+ thetak,kA < 1, or deltatkA < √( t - 1)/t for any given constant t ≥ 4/3 guarantee the exact recovery of all k sparse signals in the noiseless case through the constrained ℓ1 minimization, and similarly in affine rank minimization delta rM < 1/3, deltar M + thetar, rM < 1, or deltatrM< √( t - 1)/t ensure the exact reconstruction of all matrices with rank at most r in the noiseless case via the constrained nuclear norm minimization. Moreover, for any epsilon > 0, delta kA < 1/3 + epsilon, deltak A + thetak,kA < 1 + epsilon, or deltatkA< √(t - 1) / t + epsilon are not sufficient to guarantee the exact recovery of all k-sparse signals for large k. Similar result also holds for matrix recovery. In addition, the conditions delta kA<1/3, deltak A+ thetak,kA<1, delta tkA < √(t - 1)/t and deltarM<1/3, delta rM+ thetar,rM<1, delta trM< √(t - 1)/ t are also shown to be sufficient respectively for stable recovery of approximately sparse signals and low-rank matrices in the noisy case. For the second part of the thesis, we introduce a rank-one projection model for low-rank matrix recovery and propose a constrained nuclear norm minimization method for stable recovery of low-rank matrices in the noisy case. The procedure is adaptive to the rank and robust against small perturbations. Both upper and lower bounds for the estimation accuracy under the Frobenius norm loss are obtained. The proposed estimator is shown to be rate-optimal under certain conditions. The estimator is easy to implement via convex programming and performs well numerically. The techniques and main results developed in the chapter also have implications to other related statistical problems. An application to estimation of spiked covariance matrices from one-dimensional random projections is considered. The results demonstrate that it is still possible to accurately estimate the covariance matrix of a high-dimensional distribution based only on one-dimensional projections. For the third part of the thesis, we consider another setting of low-rank matrix completion. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  9. Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

    PubMed Central

    Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

    2011-01-01

    Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957

  10. Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks

    NASA Astrophysics Data System (ADS)

    Frahm, Klaus M.; Shepelyansky, Dima L.

    2014-04-01

    We use the methods of quantum chaos and Random Matrix Theory for analysis of statistical fluctuations of PageRank probabilities in directed networks. In this approach the effective energy levels are given by a logarithm of PageRank probability at a given node. After the standard energy level unfolding procedure we establish that the nearest spacing distribution of PageRank probabilities is described by the Poisson law typical for integrable quantum systems. Our studies are done for the Twitter network and three networks of Wikipedia editions in English, French and German. We argue that due to absence of level repulsion the PageRank order of nearby nodes can be easily interchanged. The obtained Poisson law implies that the nearby PageRank probabilities fluctuate as random independent variables.

  11. Escape the Black Hole of Lecturing: Put Collaborative Ranking Tasks on Your Event Horizon

    NASA Astrophysics Data System (ADS)

    Hudgins, D. W.; Prather, E. E.; Grayson, D. J.

    2005-05-01

    At the University of Arizona, we have been developing and testing a new type of introductory astronomy curriculum material called Ranking Tasks. Ranking Tasks are a form of conceptual exercise that presents students with four to six physical situations, usually by pictures or diagrams, and asks students to rank order the situations based on some resulting effect. Our study developed design guidelines for Ranking Tasks based on learning theory and classroom pilot studies. Our research questions were: Do in-class collaborative Ranking Task exercises result in student conceptual gains when used in conjunction with traditional lecture-based instruction? And are these gains sufficient to justify implementing them into the astronomy classroom? We conducted a single-group repeated measures experiment across eight core introductory astronomy topics with 250 students at the University of Arizona in the Fall of 2004. The study found that traditional lecture-based instruction alone produced statistically significant gains - raising test scores to 61% post-lecture from 32% on the pretest. While significant, we find these gains to be unsatisfactory from a teaching and learning perspective. The study data shows that adding a collaborative learning component to the class structured around Ranking Task exercises helped students achieve statistically significant gains - with post-Ranking Task scores over the eight astronomy topic rising to 77%. Interestingly, we found that the normalized gain from the Ranking Tasks was equal to the entire previous gain from traditional instruction. Further analysis of the data revealed that Ranking Tasks equally benefited both genders; they also equally benefited both high and low-scoring median groups based on their pretest scores. Based on these results, we conclude that adding collaborative Ranking Task exercises to traditional lecture-based instruction can significantly improve student conceptual understanding of core topics in astronomy.

  12. The Stability of Rankings Derived from Composite Indicators: Analysis of the "IL Sole 24 Ore" Quality of Life Report

    ERIC Educational Resources Information Center

    Lun, G.; Holzer, D.; Tappeiner, G.; Tappeiner, U.

    2006-01-01

    The calculation of composite indicators and the derivation of respective rankings is a common method used to benchmark countries or regions. However, although the statistical robustness of these rankings is often criticised, they often still spark off heated political debate. Here, we assess the sensitivity of the province ranking published by the…

  13. Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

    PubMed

    Fu, Liya; Wang, You-Gan

    2011-02-15

    Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.

  14. Pearson's chi-square test and rank correlation inferences for clustered data.

    PubMed

    Shih, Joanna H; Fay, Michael P

    2017-09-01

    Pearson's chi-square test has been widely used in testing for association between two categorical responses. Spearman rank correlation and Kendall's tau are often used for measuring and testing association between two continuous or ordered categorical responses. However, the established statistical properties of these tests are only valid when each pair of responses are independent, where each sampling unit has only one pair of responses. When each sampling unit consists of a cluster of paired responses, the assumption of independent pairs is violated. In this article, we apply the within-cluster resampling technique to U-statistics to form new tests and rank-based correlation estimators for possibly tied clustered data. We develop large sample properties of the new proposed tests and estimators and evaluate their performance by simulations. The proposed methods are applied to a data set collected from a PET/CT imaging study for illustration. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.

  15. A comparison of different statistical methods analyzing hypoglycemia data using bootstrap simulations.

    PubMed

    Jiang, Honghua; Ni, Xiao; Huster, William; Heilmann, Cory

    2015-01-01

    Hypoglycemia has long been recognized as a major barrier to achieving normoglycemia with intensive diabetic therapies. It is a common safety concern for the diabetes patients. Therefore, it is important to apply appropriate statistical methods when analyzing hypoglycemia data. Here, we carried out bootstrap simulations to investigate the performance of the four commonly used statistical models (Poisson, negative binomial, analysis of covariance [ANCOVA], and rank ANCOVA) based on the data from a diabetes clinical trial. Zero-inflated Poisson (ZIP) model and zero-inflated negative binomial (ZINB) model were also evaluated. Simulation results showed that Poisson model inflated type I error, while negative binomial model was overly conservative. However, after adjusting for dispersion, both Poisson and negative binomial models yielded slightly inflated type I errors, which were close to the nominal level and reasonable power. Reasonable control of type I error was associated with ANCOVA model. Rank ANCOVA model was associated with the greatest power and with reasonable control of type I error. Inflated type I error was observed with ZIP and ZINB models.

  16. Ranking metrics in gene set enrichment analysis: do they matter?

    PubMed

    Zyla, Joanna; Marczyk, Michal; Weiner, January; Polanska, Joanna

    2017-05-12

    There exist many methods for describing the complex relation between changes of gene expression in molecular pathways or gene ontologies under different experimental conditions. Among them, Gene Set Enrichment Analysis seems to be one of the most commonly used (over 10,000 citations). An important parameter, which could affect the final result, is the choice of a metric for the ranking of genes. Applying a default ranking metric may lead to poor results. In this work 28 benchmark data sets were used to evaluate the sensitivity and false positive rate of gene set analysis for 16 different ranking metrics including new proposals. Furthermore, the robustness of the chosen methods to sample size was tested. Using k-means clustering algorithm a group of four metrics with the highest performance in terms of overall sensitivity, overall false positive rate and computational load was established i.e. absolute value of Moderated Welch Test statistic, Minimum Significant Difference, absolute value of Signal-To-Noise ratio and Baumgartner-Weiss-Schindler test statistic. In case of false positive rate estimation, all selected ranking metrics were robust with respect to sample size. In case of sensitivity, the absolute value of Moderated Welch Test statistic and absolute value of Signal-To-Noise ratio gave stable results, while Baumgartner-Weiss-Schindler and Minimum Significant Difference showed better results for larger sample size. Finally, the Gene Set Enrichment Analysis method with all tested ranking metrics was parallelised and implemented in MATLAB, and is available at https://github.com/ZAEDPolSl/MrGSEA . Choosing a ranking metric in Gene Set Enrichment Analysis has critical impact on results of pathway enrichment analysis. The absolute value of Moderated Welch Test has the best overall sensitivity and Minimum Significant Difference has the best overall specificity of gene set analysis. When the number of non-normally distributed genes is high, using Baumgartner-Weiss-Schindler test statistic gives better outcomes. Also, it finds more enriched pathways than other tested metrics, which may induce new biological discoveries.

  17. Identification of reliable gridded reference data for statistical downscaling methods in Alberta

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Gupta, A.

    2017-12-01

    Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on a reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables which are main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), inverse distance method (Alberta Townships), and numerical climate model (North American Regional Reanalysis) and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of climate products at AHCCD stations, we ranked the reliability of these publically available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system was developed to allow users to easily select the most reliable reference climate data at each target point based on the elevation of grid cell. By constructing the best combination of reference data for the study domain, the accurate and reliable statistically downscaled climate projections could be significantly improved.

  18. Groupwise registration of MR brain images with tumors.

    PubMed

    Tang, Zhenyu; Wu, Yihong; Fan, Yong

    2017-08-04

    A novel groupwise image registration framework is developed for registering MR brain images with tumors. Our method iteratively estimates a normal-appearance counterpart for each tumor image to be registered and constructs a directed graph (digraph) of normal-appearance images to guide the groupwise image registration. Particularly, our method maps each tumor image to its normal appearance counterpart by identifying and inpainting brain tumor regions with intensity information estimated using a low-rank plus sparse matrix decomposition based image representation technique. The estimated normal-appearance images are groupwisely registered to a group center image guided by a digraph of images so that the total length of 'image registration paths' to be the minimum, and then the original tumor images are warped to the group center image using the resulting deformation fields. We have evaluated our method based on both simulated and real MR brain tumor images. The registration results were evaluated with overlap measures of corresponding brain regions and average entropy of image intensity information, and Wilcoxon signed rank tests were adopted to compare different methods with respect to their regional overlap measures. Compared with a groupwise image registration method that is applied to normal-appearance images estimated using the traditional low-rank plus sparse matrix decomposition based image inpainting, our method achieved higher image registration accuracy with statistical significance (p  =  7.02  ×  10 -9 ).

  19. Assessment of the relationship between the output of the educational systems and the assumed effective factors in Medical Education written in Data Banks and Ranking of Iran Medical Faculties book

    PubMed Central

    Mishmast Nehy, GhA

    2015-01-01

    Developing and expanding the universities and increasing the admission of medical students did resolve the physician shortage, but it brought down the educational quality in return. To face this problem, the administrates needed to promote the quality of education which in turn needed accurate up to date information about conditions in different universities. Information about these issues was collected by the Medical Education Council Secretariat and finally published as the Data Bank and Ranking of the Medical Faculties. Method: Although nowadays ranking is more qualitative rather than quantitative, the above ranking was done by a statistical method. In this research, the intended statistic population consisted of the data included in the database and the ranking of all 38 medical faculties. To perform this research, the ranking of faculties in the comprehensive entrance exam which indicated the input of educational system was considered the index at first, and later, the ranking of the faculties in the effective factors in education, was arranged according to the regulation of the input system; then outputs of the educational system were adjusted according to the input system and finally a comprehensive table of all the educational information was provided. Then, the relationship of various factors in education with outputs of educational system were discussed. Result: The correlations of each and all factors, which have an effective part on education were considered separately, collectively, and together, based on the information of the above book. No connection was detected within the factors, which affected the education and the output in different universities. The only relation notable was the admission degree and the outcomes of the national basic science exams. Since no meaningful connection was found within the present parameters, it seemed to be wrong to follow the path that the other sections of the world have taken in choosing the ranking factors. PMID:28316660

  20. Assessment of the relationship between the output of the educational systems and the assumed effective factors in Medical Education written in Data Banks and Ranking of Iran Medical Faculties book.

    PubMed

    Mishmast Nehy, GhA

    2015-01-01

    Developing and expanding the universities and increasing the admission of medical students did resolve the physician shortage, but it brought down the educational quality in return. To face this problem, the administrates needed to promote the quality of education which in turn needed accurate up to date information about conditions in different universities. Information about these issues was collected by the Medical Education Council Secretariat and finally published as the Data Bank and Ranking of the Medical Faculties. Method: Although nowadays ranking is more qualitative rather than quantitative, the above ranking was done by a statistical method. In this research, the intended statistic population consisted of the data included in the database and the ranking of all 38 medical faculties. To perform this research, the ranking of faculties in the comprehensive entrance exam which indicated the input of educational system was considered the index at first, and later, the ranking of the faculties in the effective factors in education, was arranged according to the regulation of the input system; then outputs of the educational system were adjusted according to the input system and finally a comprehensive table of all the educational information was provided. Then, the relationship of various factors in education with outputs of educational system were discussed. Result: The correlations of each and all factors, which have an effective part on education were considered separately, collectively, and together, based on the information of the above book. No connection was detected within the factors, which affected the education and the output in different universities. The only relation notable was the admission degree and the outcomes of the national basic science exams. Since no meaningful connection was found within the present parameters, it seemed to be wrong to follow the path that the other sections of the world have taken in choosing the ranking factors.

  1. An Automated Energy Detection Algorithm Based on Consecutive Mean Excision

    DTIC Science & Technology

    2018-01-01

    present in the RF spectrum. 15. SUBJECT TERMS RF spectrum, detection threshold algorithm, consecutive mean excision, rank order filter , statistical...Median 4 3.1.9 Rank Order Filter (ROF) 4 3.1.10 Crest Factor (CF) 5 3.2 Statistical Summary 6 4. Algorithm 7 5. Conclusion 8 6. References 9...energy detection algorithm based on morphological filter processing with a semi- disk structure. Adelphi (MD): Army Research Laboratory (US); 2018 Jan

  2. A network-based method to evaluate quality of reproducibility of differential expression in cancer genomics studies

    PubMed Central

    Geng, Haijiang; Li, Zhihui; Li, Jiabing; Lu, Tao; Yan, Fangrong

    2015-01-01

    BACKGROUND Personalized cancer treatments depend on the determination of a patient's genetic status according to known genetic profiles for which targeted treatments exist. Such genetic profiles must be scientifically validated before they is applied to general patient population. Reproducibility of findings that support such genetic profiles is a fundamental challenge in validation studies. The percentage of overlapping genes (POG) criterion and derivative methods produce unstable and misleading results. Furthermore, in a complex disease, comparisons between different tumor subtypes can produce high POG scores that do not capture the consistencies in the functions. RESULTS We focused on the quality rather than the quantity of the overlapping genes. We defined the rank value of each gene according to importance or quality by PageRank on basis of a particular topological structure. Then, we used the p-value of the rank-sum of the overlapping genes (PRSOG) to evaluate the quality of reproducibility. Though the POG scores were low in different studies of the same disease, the PRSOG was statistically significant, which suggests that sets of differentially expressed genes might be highly reproducible. CONCLUSIONS Evaluations of eight datasets from breast cancer, lung cancer and four other disorders indicate that quality-based PRSOG method performs better than a quantity-based method. Our analysis of the components of the sets of overlapping genes supports the utility of the PRSOG method. PMID:26556852

  3. Socially transmitted diffusion of a novel behaviour from subordinate chimpanzees

    PubMed Central

    Watson, Stuart K; Reamer, Lisa A; Mareno, Mary Catherine; Vale, Gillian; Harrison, Rachel A; Lambeth, Susan P; Schapiro, Steven J; Whiten, Andrew

    2017-01-01

    Chimpanzees (Pan troglodytes) demonstrate much cultural diversity in the wild, yet a majority of novel behaviours do not become group-wide traditions. Since many such novel behaviours are introduced by low-ranking individuals, a bias toward copying dominant individuals (‘rank-bias’) has been proposed as an explanation for their limited diffusion. Previous experimental work showed that chimpanzees (Pan troglodytes) preferentially copy dominant over low-rank models. We investigated whether low ranking individuals may nevertheless successfully seed a beneficial behaviour as a tradition if there are no ‘competing’ models. In each of four captive groups, either a single high-rank (HR, n=2) or a low-rank (LR, n=2) chimpanzee model was trained on one method of opening a two-action puzzle-box, before demonstrating the trained method in a group context. This was followed by eight hours of group-wide, open-access to the puzzle-box. Successful manipulations and observers of each manipulation were recorded. Barnard’s exact tests showed that individuals in the LR groups used the seeded method as their first-choice option at significantly above chance levels, whereas those in the HR groups did not. Furthermore, individuals in the LR condition used the seeded method on their first attempt significantly more often than those in the HR condition. A network-based diffusion analysis revealed that the best supported statistical models were those in which social transmission occurred only in groups with subordinate models. Finally, we report an innovation by a subordinate individual that built cumulatively on existing methods of opening the puzzle-box and was subsequently copied by a dominant observer. These findings illustrate that chimpanzees are motivated to copy rewarding novel behaviours that are demonstrated by subordinate individuals and that, in some cases, social transmission may be constrained by high-rank demonstrators. PMID:28171684

  4. What is a species? A new universal method to measure differentiation and assess the taxonomic rank of allopatric populations, using continuous variables

    PubMed Central

    Donegan, Thomas M.

    2018-01-01

    Abstract Existing models for assigning species, subspecies, or no taxonomic rank to populations which are geographically separated from one another were analyzed. This was done by subjecting over 3,000 pairwise comparisons of vocal or biometric data based on birds to a variety of statistical tests that have been proposed as measures of differentiation. One current model which aims to test diagnosability (Isler et al. 1998) is highly conservative, applying a hard cut-off, which excludes from consideration differentiation below diagnosis. It also includes non-overlap as a requirement, a measure which penalizes increases to sample size. The “species scoring” model of Tobias et al. (2010) involves less drastic cut-offs, but unlike Isler et al. (1998), does not control adequately for sample size and attributes scores in many cases to differentiation which is not statistically significant. Four different models of assessing effect sizes were analyzed: using both pooled and unpooled standard deviations and controlling for sample size using t-distributions or omitting to do so. Pooled standard deviations produced more conservative effect sizes when uncontrolled for sample size but less conservative effect sizes when so controlled. Pooled models require assumptions to be made that are typically elusive or unsupported for taxonomic studies. Modifications to improving these frameworks are proposed, including: (i) introducing statistical significance as a gateway to attributing any weighting to findings of differentiation; (ii) abandoning non-overlap as a test; (iii) recalibrating Tobias et al. (2010) scores based on effect sizes controlled for sample size using t-distributions. A new universal method is proposed for measuring differentiation in taxonomy using continuous variables and a formula is proposed for ranking allopatric populations. This is based first on calculating effect sizes using unpooled standard deviations, controlled for sample size using t-distributions, for a series of different variables. All non-significant results are excluded by scoring them as zero. Distance between any two populations is calculated using Euclidian summation of non-zeroed effect size scores. If the score of an allopatric pair exceeds that of a related sympatric pair, then the allopatric population can be ranked as species and, if not, then at most subspecies rank should be assigned. A spreadsheet has been programmed and is being made available which allows this and other tests of differentiation and rank studied in this paper to be rapidly analyzed. PMID:29780266

  5. Likelihoods for fixed rank nomination networks

    PubMed Central

    HOFF, PETER; FOSDICK, BAILEY; VOLFOVSKY, ALEX; STOVEL, KATHERINE

    2014-01-01

    Many studies that gather social network data use survey methods that lead to censored, missing, or otherwise incomplete information. For example, the popular fixed rank nomination (FRN) scheme, often used in studies of schools and businesses, asks study participants to nominate and rank at most a small number of contacts or friends, leaving the existence of other relations uncertain. However, most statistical models are formulated in terms of completely observed binary networks. Statistical analyses of FRN data with such models ignore the censored and ranked nature of the data and could potentially result in misleading statistical inference. To investigate this possibility, we compare Bayesian parameter estimates obtained from a likelihood for complete binary networks with those obtained from likelihoods that are derived from the FRN scheme, and therefore accommodate the ranked and censored nature of the data. We show analytically and via simulation that the binary likelihood can provide misleading inference, particularly for certain model parameters that relate network ties to characteristics of individuals and pairs of individuals. We also compare these different likelihoods in a data analysis of several adolescent social networks. For some of these networks, the parameter estimates from the binary and FRN likelihoods lead to different conclusions, indicating the importance of analyzing FRN data with a method that accounts for the FRN survey design. PMID:25110586

  6. The New "LJ" Index

    ERIC Educational Resources Information Center

    Lance, Keith Curry; Lyons, Ray

    2008-01-01

    As published critics of Hennen's American Public Library Ratings (HAPLR), the authors propose a new ranking system that focuses more transparently on ranking libraries based on their performance. These annual rankings are intended to contribute to self-evaluation and peer comparison, prompt questions about the statistics and how to improve them,…

  7. A Statistical Ontology-Based Approach to Ranking for Multiword Search

    ERIC Educational Resources Information Center

    Kim, Jinwoo

    2013-01-01

    Keyword search is a prominent data retrieval method for the Web, largely because the simple and efficient nature of keyword processing allows a large amount of information to be searched with fast response. However, keyword search approaches do not formally capture the clear meaning of a keyword query and fail to address the semantic relationships…

  8. Accounting for Uncertainty in Decision Analytic Models Using Rank Preserving Structural Failure Time Modeling: Application to Parametric Survival Models.

    PubMed

    Bennett, Iain; Paracha, Noman; Abrams, Keith; Ray, Joshua

    2018-01-01

    Rank Preserving Structural Failure Time models are one of the most commonly used statistical methods to adjust for treatment switching in oncology clinical trials. The method is often applied in a decision analytic model without appropriately accounting for additional uncertainty when determining the allocation of health care resources. The aim of the study is to describe novel approaches to adequately account for uncertainty when using a Rank Preserving Structural Failure Time model in a decision analytic model. Using two examples, we tested and compared the performance of the novel Test-based method with the resampling bootstrap method and with the conventional approach of no adjustment. In the first example, we simulated life expectancy using a simple decision analytic model based on a hypothetical oncology trial with treatment switching. In the second example, we applied the adjustment method on published data when no individual patient data were available. Mean estimates of overall and incremental life expectancy were similar across methods. However, the bootstrapped and test-based estimates consistently produced greater estimates of uncertainty compared with the estimate without any adjustment applied. Similar results were observed when using the test based approach on a published data showing that failing to adjust for uncertainty led to smaller confidence intervals. Both the bootstrapping and test-based approaches provide a solution to appropriately incorporate uncertainty, with the benefit that the latter can implemented by researchers in the absence of individual patient data. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  9. Application of meta-analysis methods for identifying proteomic expression level differences.

    PubMed

    Amess, Bob; Kluge, Wolfgang; Schwarz, Emanuel; Haenisch, Frieder; Alsaif, Murtada; Yolken, Robert H; Leweke, F Markus; Guest, Paul C; Bahn, Sabine

    2013-07-01

    We present new statistical approaches for identification of proteins with expression levels that are significantly changed when applying meta-analysis to two or more independent experiments. We showed that the Euclidean distance measure has reduced risk of false positives compared to the rank product method. Our Ψ-ranking method has advantages over the traditional fold-change approach by incorporating both the fold-change direction as well as the p-value. In addition, the second novel method, Π-ranking, considers the ratio of the fold-change and thus integrates all three parameters. We further improved the latter by introducing our third technique, Σ-ranking, which combines all three parameters in a balanced nonparametric approach. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles

    PubMed Central

    2011-01-01

    Background Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Results Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. Conclusions In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches. PMID:21342534

  11. Dynamic programming re-ranking for PPI interactor and pair extraction in full-text articles.

    PubMed

    Tsai, Richard Tzong-Han; Lai, Po-Ting

    2011-02-23

    Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be facilitated by employing text-mining systems to identify genes which play the interactor role in PPIs and to map these genes to unique database identifiers (interactor normalization task or INT) and then to return a list of interaction pairs for each article (interaction pair task or IPT). These two tasks are evaluated in terms of the area under curve of the interpolated precision/recall (AUC iP/R) score because the order of identifiers in the output list is important for ease of curation. Our INT system developed for the BioCreAtIvE II.5 INT challenge achieved a promising AUC iP/R of 43.5% by using a support vector machine (SVM)-based ranking procedure. Using our new re-ranking algorithm, we have been able to improve system performance (AUC iP/R) by 1.84%. Our experimental results also show that with the re-ranked INT results, our unsupervised IPT system can achieve a competitive AUC iP/R of 23.86%, which outperforms the best BC II.5 INT system by 1.64%. Compared to using only SVM ranked INT results, using re-ranked INT results boosts AUC iP/R by 7.84%. Statistical significance t-test results show that our INT/IPT system with re-ranking outperforms that without re-ranking by a statistically significant difference. In this paper, we present a new re-ranking algorithm that considers co-occurrence among identifiers in an article to improve INT and IPT ranking results. Combining the re-ranked INT results with an unsupervised approach to find associations among interactors, the proposed method can boost the IPT performance. We also implement score computation using dynamic programming, which is faster and more efficient than traditional approaches.

  12. Cross ranking of cities and regions: population versus income

    NASA Astrophysics Data System (ADS)

    Cerqueti, Roy; Ausloos, Marcel

    2015-07-01

    This paper explores the relationship between the inner economical structure of communities and their population distribution through a rank-rank analysis of official data, along statistical physics ideas within two techniques. The data is taken on Italian cities. The analysis is performed both at a global (national) and at a more local (regional) level in order to distinguish ‘macro’ and ‘micro’ aspects. First, the rank-size rule is found not to be a standard power law, as in many other studies, but a doubly decreasing power law. Next, the Kendall τ and the Spearman ρ rank correlation coefficients which measure pair concordance and the correlation between fluctuations in two rankings, respectively,—as a correlation function does in thermodynamics, are calculated for finding rank correlation (if any) between demography and wealth. Results show non only global disparities for the whole (country) set, but also (regional) disparities, when comparing the number of cities in regions, the number of inhabitants in cities and that in regions, as well as when comparing the aggregated tax income of the cities and that of regions. Different outliers are pointed out and justified. Interestingly, two classes of cities in the country and two classes of regions in the country are found. ‘Common sense’ social, political, and economic considerations sustain the findings. More importantly, the methods show that they allow to distinguish communities, very clearly, when specific criteria are numerically sound. A specific modeling for the findings is presented, i.e. for the doubly decreasing power law and the two phase system, based on statistics theory, e.g. urn filling. The model ideas can be expected to hold when similar rank relationship features are observed in fields. It is emphasized that the analysis makes more sense than one through a Pearson Π value-value correlation analysis

  13. Effect of different mixing methods on the bacterial microleakage of calcium-enriched mixture cement.

    PubMed

    Shahi, Shahriar; Jeddi Khajeh, Soniya; Rahimi, Saeed; Yavari, Hamid R; Jafari, Farnaz; Samiei, Mohammad; Ghasemi, Negin; Milani, Amin S

    2016-10-01

    Calcium-enriched mixture (CEM) cement is used in the field of endodontics. It is similar to mineral trioxide aggregate in its main ingredients. The present study investigated the effect of different mixing methods on the bacterial microleakage of CEM cement. A total of 55 human single-rooted human permanent teeth were decoronated so that 14-mm-long samples were obtained and obturated with AH26 sealer and gutta-percha using lateral condensation technique. Three millimeters of the root end were cut off and randomly divided into 3 groups of 15 each (3 mixing methods of amalgamator, ultrasonic and conventional) and 2 negative and positive control groups (each containing 5 samples). BHI (brain-heart infusion agar) suspension containing Enterococcus faecalis was used for bacterial leakage assessment. Statistical analysis was carried out using descriptive statistics, Kaplan-Meier survival analysis with censored data and log rank test. Statistical significance was set at P<0.05. The survival means for conventional, amalgamator and ultrasonic methods were 62.13±12.44, 68.87±12.79 and 77.53±12.52 days, respectively. The log rank test showed no significant differences between the groups. Based on the results of the present study it can be concluded that different mixing methods had no significant effect on the bacterial microleakage of CEM cement.

  14. THE SCREENING AND RANKING ALGORITHM FOR CHANGE-POINTS DETECTION IN MULTIPLE SAMPLES

    PubMed Central

    Song, Chi; Min, Xiaoyi; Zhang, Heping

    2016-01-01

    The chromosome copy number variation (CNV) is the deviation of genomic regions from their normal copy number states, which may associate with many human diseases. Current genetic studies usually collect hundreds to thousands of samples to study the association between CNV and diseases. CNVs can be called by detecting the change-points in mean for sequences of array-based intensity measurements. Although multiple samples are of interest, the majority of the available CNV calling methods are single sample based. Only a few multiple sample methods have been proposed using scan statistics that are computationally intensive and designed toward either common or rare change-points detection. In this paper, we propose a novel multiple sample method by adaptively combining the scan statistic of the screening and ranking algorithm (SaRa), which is computationally efficient and is able to detect both common and rare change-points. We prove that asymptotically this method can find the true change-points with almost certainty and show in theory that multiple sample methods are superior to single sample methods when shared change-points are of interest. Additionally, we report extensive simulation studies to examine the performance of our proposed method. Finally, using our proposed method as well as two competing approaches, we attempt to detect CNVs in the data from the Primary Open-Angle Glaucoma Genes and Environment study, and conclude that our method is faster and requires less information while our ability to detect the CNVs is comparable or better. PMID:28090239

  15. Exact p-values for pairwise comparison of Friedman rank sums, with application to comparing classifiers.

    PubMed

    Eisinga, Rob; Heskes, Tom; Pelzer, Ben; Te Grotenhuis, Manfred

    2017-01-25

    The Friedman rank sum test is a widely-used nonparametric method in computational biology. In addition to examining the overall null hypothesis of no significant difference among any of the rank sums, it is typically of interest to conduct pairwise comparison tests. Current approaches to such tests rely on large-sample approximations, due to the numerical complexity of computing the exact distribution. These approximate methods lead to inaccurate estimates in the tail of the distribution, which is most relevant for p-value calculation. We propose an efficient, combinatorial exact approach for calculating the probability mass distribution of the rank sum difference statistic for pairwise comparison of Friedman rank sums, and compare exact results with recommended asymptotic approximations. Whereas the chi-squared approximation performs inferiorly to exact computation overall, others, particularly the normal, perform well, except for the extreme tail. Hence exact calculation offers an improvement when small p-values occur following multiple testing correction. Exact inference also enhances the identification of significant differences whenever the observed values are close to the approximate critical value. We illustrate the proposed method in the context of biological machine learning, were Friedman rank sum difference tests are commonly used for the comparison of classifiers over multiple datasets. We provide a computationally fast method to determine the exact p-value of the absolute rank sum difference of a pair of Friedman rank sums, making asymptotic tests obsolete. Calculation of exact p-values is easy to implement in statistical software and the implementation in R is provided in one of the Additional files and is also available at http://www.ru.nl/publish/pages/726696/friedmanrsd.zip .

  16. The Biomarker-Surrogacy Evaluation Schema: a review of the biomarker-surrogate literature and a proposal for a criterion-based, quantitative, multidimensional hierarchical levels of evidence schema for evaluating the status of biomarkers as surrogate endpoints.

    PubMed

    Lassere, Marissa N

    2008-06-01

    There are clear advantages to using biomarkers and surrogate endpoints, but concerns about clinical and statistical validity and systematic methods to evaluate these aspects hinder their efficient application. Section 2 is a systematic, historical review of the biomarker-surrogate endpoint literature with special reference to the nomenclature, the systems of classification and statistical methods developed for their evaluation. In Section 3 an explicit, criterion-based, quantitative, multidimensional hierarchical levels of evidence schema - Biomarker-Surrogacy Evaluation Schema - is proposed to evaluate and co-ordinate the multiple dimensions (biological, epidemiological, statistical, clinical trial and risk-benefit evidence) of the biomarker clinical endpoint relationships. The schema systematically evaluates and ranks the surrogacy status of biomarkers and surrogate endpoints using defined levels of evidence. The schema incorporates the three independent domains: Study Design, Target Outcome and Statistical Evaluation. Each domain has items ranked from zero to five. An additional category called Penalties incorporates additional considerations of biological plausibility, risk-benefit and generalizability. The total score (0-15) determines the level of evidence, with Level 1 the strongest and Level 5 the weakest. The term ;surrogate' is restricted to markers attaining Levels 1 or 2 only. Surrogacy status of markers can then be directly compared within and across different areas of medicine to guide individual, trial-based or drug-development decisions. This schema would facilitate communication between clinical, researcher, regulatory, industry and consumer participants necessary for evaluation of the biomarker-surrogate-clinical endpoint relationship in their different settings.

  17. Deaths: Leading Causes for 2015.

    PubMed

    Heron, Melonie

    2017-11-01

    Objectives-This report presents final 2015 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements "Deaths: Final Data for 2015," the National Center for Health Statistics' annual report of final mortality statistics. Methods-Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2015. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. Results-In 2015, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Accidents (unintentional injuries); Cerebrovascular diseases; Alzheimer's disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). They accounted for 74% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2015 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

  18. The Marketing of Canadian University Rankings: A Misadventure Now 24 Years Old

    ERIC Educational Resources Information Center

    Cramer, Kenneth M.; Page, Stewart; Burrows, Vanessa; Lamoureux, Chastine; Mackay, Sarah; Pedri, Victoria; Pschibul, Rebecca

    2016-01-01

    Based on analyses of Maclean's ranking data pertaining to Canadian universities published over the last 24 years, we present a summary of statistical findings of annual ranking exercises, as well as discussion about their current status and the effects upon student welfare. Some illustrative tables are also presented. Using correlational and…

  19. Rank Dynamics of Word Usage at Multiple Scales

    NASA Astrophysics Data System (ADS)

    Morales, José A.; Colman, Ewan; Sánchez, Sergio; Sánchez-Puig, Fernanda; Pineda, Carlos; Iñiguez, Gerardo; Cocho, Germinal; Flores, Jorge; Gershenson, Carlos

    2018-05-01

    The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore whether word use is similar across languages, and if so, whether these generic features appear at different scales of language structure. Here we use the Google Books N-grams dataset to analyze the temporal evolution of word usage in several languages. We apply measures proposed recently to study rank dynamics, such as the diversity of N-grams in a given rank, the probability that an N-gram changes rank between successive time intervals, the rank entropy, and the rank complexity. Using different methods, results show that there are generic properties for different languages at different scales, such as a core of words necessary to minimally understand a language. We also propose a null model to explore the relevance of linguistic structure across multiple scales, concluding that N-gram statistics cannot be reduced to word statistics. We expect our results to be useful in improving text prediction algorithms, as well as in shedding light on the large-scale features of language use, beyond linguistic and cultural differences across human populations.

  20. Ranked solutions to a class of combinatorial optimizations—with applications in mass spectrometry based peptide sequencing and a variant of directed paths in random media

    NASA Astrophysics Data System (ADS)

    Doerr, Timothy P.; Alves, Gelio; Yu, Yi-Kuo

    2005-08-01

    Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time using the transfer matrix technique or, equivalently, the dynamic programming approach. This suggests a way to efficiently find approximate solutions-find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks-peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.

  1. Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

    PubMed

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V; Robles, Montserrat; Aparici, F; Martí-Bonmatí, L; García-Gómez, Juan M

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation.

  2. A hybrid method in combining treatment effects from matched and unmatched studies.

    PubMed

    Byun, Jinyoung; Lai, Dejian; Luo, Sheng; Risser, Jan; Tung, Betty; Hardy, Robert J

    2013-12-10

    The most common data structures in the biomedical studies have been matched or unmatched designs. Data structures resulting from a hybrid of the two may create challenges for statistical inferences. The question may arise whether to use parametric or nonparametric methods on the hybrid data structure. The Early Treatment for Retinopathy of Prematurity study was a multicenter clinical trial sponsored by the National Eye Institute. The design produced data requiring a statistical method of a hybrid nature. An infant in this multicenter randomized clinical trial had high-risk prethreshold retinopathy of prematurity that was eligible for treatment in one or both eyes at entry into the trial. During follow-up, recognition visual acuity was accessed for both eyes. Data from both eyes (matched) and from only one eye (unmatched) were eligible to be used in the trial. The new hybrid nonparametric method is a meta-analysis based on combining the Hodges-Lehmann estimates of treatment effects from the Wilcoxon signed rank and rank sum tests. To compare the new method, we used the classic meta-analysis with the t-test method to combine estimates of treatment effects from the paired and two sample t-tests. We used simulations to calculate the empirical size and power of the test statistics, as well as the bias, mean square and confidence interval width of the corresponding estimators. The proposed method provides an effective tool to evaluate data from clinical trials and similar comparative studies. Copyright © 2013 John Wiley & Sons, Ltd.

  3. Fluorescence Excitation Spectroscopy for Phytoplankton Species Classification Using an All-Pairs Method: Characterization of a System with Unexpectedly Low Rank.

    PubMed

    Rekully, Cameron M; Faulkner, Stefan T; Lachenmyer, Eric M; Cunningham, Brady R; Shaw, Timothy J; Richardson, Tammi L; Myrick, Michael L

    2018-03-01

    An all-pairs method is used to analyze phytoplankton fluorescence excitation spectra. An initial set of nine phytoplankton species is analyzed in pairwise fashion to select two optical filter sets, and then the two filter sets are used to explore variations among a total of 31 species in a single-cell fluorescence imaging photometer. Results are presented in terms of pair analyses; we report that 411 of the 465 possible pairings of the larger group of 31 species can be distinguished using the initial nine-species-based selection of optical filters. A bootstrap analysis based on the larger data set shows that the distribution of possible pair separation results based on a randomly selected nine-species initial calibration set is strongly peaked in the 410-415 pair separation range, consistent with our experimental result. Further, the result for filter selection using all 31 species is also 411 pair separations; The set of phytoplankton fluorescence excitation spectra is intuitively high in rank due to the number and variety of pigments that contribute to the spectrum. However, the results in this report are consistent with an effective rank as determined by a variety of heuristic and statistical methods in the range of 2-3. These results are reviewed in consideration of how consistent the filter selections are from model to model for the data presented here. We discuss the common observation that rank is generally found to be relatively low even in many seemingly complex circumstances, so that it may be productive to assume a low rank from the beginning. If a low-rank hypothesis is valid, then relatively few samples are needed to explore an experimental space. Under very restricted circumstances for uniformly distributed samples, the minimum number for an initial analysis might be as low as 8-11 random samples for 1-3 factors.

  4. [Study on commercial specification of atractylodes based on Delphi method].

    PubMed

    Wang, Hao; Chen, Li-Xiao; Huang, Lu-Qi; Zhang, Tian-Tian; Li, Ying; Zheng, Yu-Guang

    2016-03-01

    This research adopts "Delphi method" to evaluate atractylodes traditional traits and rank correlation. By using methods of mathematical statistics the relationship of the traditional identification indicators and atractylodes goods rank correlation was analyzed, It is found that the main characteristics affectingatractylodes commodity specifications and grades of main characters wereoil points of transaction,color of transaction,color of surface,grain of transaction,texture of transaction andspoilage. The study points out that the original "seventy-six kinds of medicinal materials commodity specification standards of atractylodes differentiate commodity specification" is not in conformity with the actual market situation, we need to formulate corresponding atractylodes medicinal products specifications and grades.This study combined with experimental results "Delphi method" and the market actual situation, proposed the new draft atractylodes commodity specifications and grades, as the new atractylodes commodity specifications and grades standards. It provides a reference and theoretical basis. Copyright© by the Chinese Pharmaceutical Association.

  5. 75 FR 72611 - Assessments, Large Bank Pricing

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-11-24

    ... the worst risk ranking and are included in the statistical analysis. Appendix 1 to the NPR describes the statistical analysis in detail. \\12\\ The percentage approximated by factors is based on the statistical model for that particual year. Actual weights assigned to each scorecard measure are largely based...

  6. A comparison of ensemble post-processing approaches that preserve correlation structures

    NASA Astrophysics Data System (ADS)

    Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane

    2016-04-01

    Despite the fact that ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and therefore are known to improve by calibration or statistical post-processing. For instance the ensemble model output statistics (EMOS) method, also known as non-homogeneous regression approach (Gneiting et al., 2005) is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable of taking into account spatial, inter-variable and temporal dependence structures. Recently many research efforts have been invested in designing post-processing methods that resolve this drawback but also in verification methods that enable the detection of dependence structures. New verification methods are applied on two classes of post-processing methods, both generating physically coherent ensembles. A first class uses the ensemble copula coupling (ECC) that starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre of Weather Forecasts (ECMWF) and compared using multivariate verification tools including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld, and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., {133}, 1098-1118. Scheuerer and Hamill, 2015. Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev. {143},1321-1334. Schefzik, Thorarinsdottir, Gneiting. Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science {28},616-640, 2013. Thorarinsdottir, M. Scheuerer, and C. Heinz, 2015. Assessing the calibration of high-dimensional ensemble forecasts using rank histograms, arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141: 807-818.

  7. Health Information on Internet: Quality, Importance, and Popularity of Persian Health Websites

    PubMed Central

    Samadbeik, Mahnaz; Ahmadi, Maryam; Mohammadi, Ali; Mohseni Saravi, Beniamin

    2014-01-01

    Background: The Internet has provided great opportunities for disseminating both accurate and inaccurate health information. Therefore, the quality of information is considered as a widespread concern affecting the human life. Despite the increasingly substantial growth in the number of users, Persian health websites and the proportion of internet-using patients, little is known about the quality of Persian medical and health websites. Objectives: The current study aimed to first assess the quality, popularity and importance of websites providing Persian health-related information, and second to evaluate the correlation of the popularity and importance ranking with quality score on the Internet. Materials and Methods: The sample websites were identified by entering the health-related keywords into four most popular search engines of Iranian users based on the Alexa ranking at the time of study. Each selected website was assessed using three qualified tools including the Bomba and Land Index, Google PageRank and the Alexa ranking. Results: The evaluated sites characteristics (ownership structure, database, scope and objective) really did not have an effect on the Alexa traffic global rank, Alexa traffic rank in Iran, Google PageRank and Bomba total score. Most websites (78.9 percent, n = 56) were in the moderate category (8 ≤ x ≤ 11.99) based on their quality levels. There was no statistically significant association between Google PageRank with Bomba index variables and Alexa traffic global rank (P > 0.05). Conclusions: The Persian health websites had better Bomba quality scores in availability and usability guidelines as compared to other guidelines. The Google PageRank did not properly reflect the real quality of evaluated websites and Internet users seeking online health information should not merely rely on it for any kind of prejudgment regarding Persian health websites. However, they can use Iran Alexa rank as a primary filtering tool of these websites. Therefore, designing search engines dedicated to explore accredited Persian health-related Web sites can be an effective method to access high-quality Persian health websites. PMID:24910795

  8. Modeling Area-Level Health Rankings.

    PubMed

    Courtemanche, Charles; Soneji, Samir; Tchernis, Rusty

    2015-10-01

    Rank county health using a Bayesian factor analysis model. Secondary county data from the National Center for Health Statistics (through 2007) and Behavioral Risk Factor Surveillance System (through 2009). Our model builds on the existing county health rankings (CHRs) by using data-derived weights to compute ranks from mortality and morbidity variables, and by quantifying uncertainty based on population, spatial correlation, and missing data. We apply our model to Wisconsin, which has comprehensive data, and Texas, which has substantial missing information. The data were downloaded from www.countyhealthrankings.org. Our estimated rankings are more similar to the CHRs for Wisconsin than Texas, as the data-derived factor weights are closer to the assigned weights for Wisconsin. The correlations between the CHRs and our ranks are 0.89 for Wisconsin and 0.65 for Texas. Uncertainty is especially severe for Texas given the state's substantial missing data. The reliability of comprehensive CHRs varies from state to state. We advise focusing on the counties that remain among the least healthy after incorporating alternate weighting methods and accounting for uncertainty. Our results also highlight the need for broader geographic coverage in health data. © Health Research and Educational Trust.

  9. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

    NASA Astrophysics Data System (ADS)

    Pernot, Pascal; Savin, Andreas

    2018-06-01

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.

  10. Comparison of multianalyte proficiency test results by sum of ranking differences, principal component analysis, and hierarchical cluster analysis.

    PubMed

    Škrbić, Biljana; Héberger, Károly; Durišić-Mladenović, Nataša

    2013-10-01

    Sum of ranking differences (SRD) was applied for comparing multianalyte results obtained by several analytical methods used in one or in different laboratories, i.e., for ranking the overall performances of the methods (or laboratories) in simultaneous determination of the same set of analytes. The data sets for testing of the SRD applicability contained the results reported during one of the proficiency tests (PTs) organized by EU Reference Laboratory for Polycyclic Aromatic Hydrocarbons (EU-RL-PAH). In this way, the SRD was also tested as a discriminant method alternative to existing average performance scores used to compare mutlianalyte PT results. SRD should be used along with the z scores--the most commonly used PT performance statistics. SRD was further developed to handle the same rankings (ties) among laboratories. Two benchmark concentration series were selected as reference: (a) the assigned PAH concentrations (determined precisely beforehand by the EU-RL-PAH) and (b) the averages of all individual PAH concentrations determined by each laboratory. Ranking relative to the assigned values and also to the average (or median) values pointed to the laboratories with the most extreme results, as well as revealed groups of laboratories with similar overall performances. SRD reveals differences between methods or laboratories even if classical test(s) cannot. The ranking was validated using comparison of ranks by random numbers (a randomization test) and using seven folds cross-validation, which highlighted the similarities among the (methods used in) laboratories. Principal component analysis and hierarchical cluster analysis justified the findings based on SRD ranking/grouping. If the PAH-concentrations are row-scaled, (i.e., z scores are analyzed as input for ranking) SRD can still be used for checking the normality of errors. Moreover, cross-validation of SRD on z scores groups the laboratories similarly. The SRD technique is general in nature, i.e., it can be applied to any experimental problem in which multianalyte results obtained either by several analytical procedures, analysts, instruments, or laboratories need to be compared.

  11. A stable systemic risk ranking in China's banking sector: Based on principal component analysis

    NASA Astrophysics Data System (ADS)

    Fang, Libing; Xiao, Binqing; Yu, Honghai; You, Qixing

    2018-02-01

    In this paper, we compare five popular systemic risk rankings, and apply principal component analysis (PCA) model to provide a stable systemic risk ranking for the Chinese banking sector. Our empirical results indicate that five methods suggest vastly different systemic risk rankings for the same bank, while the combined systemic risk measure based on PCA provides a reliable ranking. Furthermore, according to factor loadings of the first component, PCA combined ranking is mainly based on fundamentals instead of market price data. We clearly find that price-based rankings are not as practical a method as fundamentals-based ones. This PCA combined ranking directly shows systemic risk contributions of each bank for banking supervision purpose and reminds banks to prevent and cope with the financial crisis in advance.

  12. Evaluation of a hydrophilic interaction liquid chromatography design space for sugars and sugar alcohols.

    PubMed

    Hetrick, Evan M; Kramer, Timothy T; Risley, Donald S

    2017-03-17

    Based on a column-screening exercise, a column ranking system was developed for sample mixtures containing any combination of 26 sugar and sugar alcohol analytes using 16 polar stationary phases in the HILIC mode with acetonitrile/water or acetone/water mobile phases. Each analyte was evaluated on the HILIC columns with gradient elution and the subsequent chromatography data was compiled into a statistical software package where any subset of the analytes can be selected and the columns are then ranked by the greatest separation. Since these analytes lack chromophores, aerosol-based detectors, including an evaporative light scattering detector (ELSD) and a charged aerosol detector (CAD) were employed for qualitative and quantitative detection. Example qualitative applications are provided to illustrate the practicality and efficiency of this HILIC column ranking. Furthermore, the design-space approach was used as a starting point for a quantitative method for the trace analysis of glucose in trehalose samples in a complex matrix. Knowledge gained from evaluating the design-space led to rapid development of a capable method as demonstrated through validation of the following parameters: specificity, accuracy, precision, linearity, limit of quantitation, limit of detection, and range. Copyright © 2017 Elsevier B.V. All rights reserved.

  13. Rank-based pooling for deep convolutional neural networks.

    PubMed

    Shi, Zenglin; Ye, Yangdong; Wu, Yunpeng

    2016-11-01

    Pooling is a key mechanism in deep convolutional neural networks (CNNs) which helps to achieve translation invariance. Numerous studies, both empirically and theoretically, show that pooling consistently boosts the performance of the CNNs. The conventional pooling methods are operated on activation values. In this work, we alternatively propose rank-based pooling. It is derived from the observations that ranking list is invariant under changes of activation values in a pooling region, and thus rank-based pooling operation may achieve more robust performance. In addition, the reasonable usage of rank can avoid the scale problems encountered by value-based methods. The novel pooling mechanism can be regarded as an instance of weighted pooling where a weighted sum of activations is used to generate the pooling output. This pooling mechanism can also be realized as rank-based average pooling (RAP), rank-based weighted pooling (RWP) and rank-based stochastic pooling (RSP) according to different weighting strategies. As another major contribution, we present a novel criterion to analyze the discriminant ability of various pooling methods, which is heavily under-researched in machine learning and computer vision community. Experimental results on several image benchmarks show that rank-based pooling outperforms the existing pooling methods in classification performance. We further demonstrate better performance on CIFAR datasets by integrating RSP into Network-in-Network. Copyright © 2016 Elsevier Ltd. All rights reserved.

  14. Automated Glioblastoma Segmentation Based on a Multiparametric Structured Unsupervised Classification

    PubMed Central

    Juan-Albarracín, Javier; Fuster-Garcia, Elies; Manjón, José V.; Robles, Montserrat; Aparici, F.; Martí-Bonmatí, L.; García-Gómez, Juan M.

    2015-01-01

    Automatic brain tumour segmentation has become a key component for the future of brain tumour treatment. Currently, most of brain tumour segmentation approaches arise from the supervised learning standpoint, which requires a labelled training dataset from which to infer the models of the classes. The performance of these models is directly determined by the size and quality of the training corpus, whose retrieval becomes a tedious and time-consuming task. On the other hand, unsupervised approaches avoid these limitations but often do not reach comparable results than the supervised methods. In this sense, we propose an automated unsupervised method for brain tumour segmentation based on anatomical Magnetic Resonance (MR) images. Four unsupervised classification algorithms, grouped by their structured or non-structured condition, were evaluated within our pipeline. Considering the non-structured algorithms, we evaluated K-means, Fuzzy K-means and Gaussian Mixture Model (GMM), whereas as structured classification algorithms we evaluated Gaussian Hidden Markov Random Field (GHMRF). An automated postprocess based on a statistical approach supported by tissue probability maps is proposed to automatically identify the tumour classes after the segmentations. We evaluated our brain tumour segmentation method with the public BRAin Tumor Segmentation (BRATS) 2013 Test and Leaderboard datasets. Our approach based on the GMM model improves the results obtained by most of the supervised methods evaluated with the Leaderboard set and reaches the second position in the ranking. Our variant based on the GHMRF achieves the first position in the Test ranking of the unsupervised approaches and the seventh position in the general Test ranking, which confirms the method as a viable alternative for brain tumour segmentation. PMID:25978453

  15. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.

  16. A Comparison of Didactic and Inquiry Teaching Methods in a Rural Community College Earth Science Course

    NASA Astrophysics Data System (ADS)

    Beam, Margery Elizabeth

    The combination of increasing enrollment and the importance of providing transfer students a solid foundation in science calls for science faculty to evaluate teaching methods in rural community colleges. The purpose of this study was to examine and compare the effectiveness of two teaching methods, inquiry teaching methods and didactic teaching methods, applied in a rural community college earth science course. Two groups of students were taught the same content via inquiry and didactic teaching methods. Analysis of quantitative data included a non-parametric ranking statistical testing method in which the difference between the rankings and the median of the post-test scores was analyzed for significance. Results indicated there was not a significant statistical difference between the teaching methods for the group of students participating in the research. The practical and educational significance of this study provides valuable perspectives on teaching methods and student learning styles in rural community colleges.

  17. A Developed Meta-model for Selection of Cotton Fabrics Using Design of Experiments and TOPSIS Method

    NASA Astrophysics Data System (ADS)

    Chakraborty, Shankar; Chatterjee, Prasenjit

    2017-12-01

    Selection of cotton fabrics for providing optimal clothing comfort is often considered as a multi-criteria decision making problem consisting of an array of candidate alternatives to be evaluated based of several conflicting properties. In this paper, design of experiments and technique for order preference by similarity to ideal solution (TOPSIS) are integrated so as to develop regression meta-models for identifying the most suitable cotton fabrics with respect to the computed TOPSIS scores. The applicability of the adopted method is demonstrated using two real time examples. These developed models can also identify the statistically significant fabric properties and their interactions affecting the measured TOPSIS scores and final selection decisions. There exists good degree of congruence between the ranking patterns as derived using these meta-models and the existing methods for cotton fabric ranking and subsequent selection.

  18. Data-Driven Benchmarking of Building Energy Efficiency Utilizing Statistical Frontier Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kavousian, A; Rajagopal, R

    2014-01-01

    Frontier methods quantify the energy efficiency of buildings by forming an efficient frontier (best-practice technology) and by comparing all buildings against that frontier. Because energy consumption fluctuates over time, the efficiency scores are stochastic random variables. Existing applications of frontier methods in energy efficiency either treat efficiency scores as deterministic values or estimate their uncertainty by resampling from one set of measurements. Availability of smart meter data (repeated measurements of energy consumption of buildings) enables using actual data to estimate the uncertainty in efficiency scores. Additionally, existing applications assume a linear form for an efficient frontier; i.e.,they assume that themore » best-practice technology scales up and down proportionally with building characteristics. However, previous research shows that buildings are nonlinear systems. This paper proposes a statistical method called stochastic energy efficiency frontier (SEEF) to estimate a bias-corrected efficiency score and its confidence intervals from measured data. The paper proposes an algorithm to specify the functional form of the frontier, identify the probability distribution of the efficiency score of each building using measured data, and rank buildings based on their energy efficiency. To illustrate the power of SEEF, this paper presents the results from applying SEEF on a smart meter data set of 307 residential buildings in the United States. SEEF efficiency scores are used to rank individual buildings based on energy efficiency, to compare subpopulations of buildings, and to identify irregular behavior of buildings across different time-of-use periods. SEEF is an improvement to the energy-intensity method (comparing kWh/sq.ft.): whereas SEEF identifies efficient buildings across the entire spectrum of building sizes, the energy-intensity method showed bias toward smaller buildings. The results of this research are expected to assist researchers and practitioners compare and rank (i.e.,benchmark) buildings more robustly and over a wider range of building types and sizes. Eventually, doing so is expected to result in improved resource allocation in energy-efficiency programs.« less

  19. Desirability-based methods of multiobjective optimization and ranking for global QSAR studies. Filtering safe and potent drug candidates from combinatorial libraries.

    PubMed

    Cruz-Monteagudo, Maykel; Borges, Fernanda; Cordeiro, M Natália D S; Cagide Fajin, J Luis; Morell, Carlos; Ruiz, Reinaldo Molina; Cañizares-Carmenate, Yudith; Dominguez, Elena Rosa

    2008-01-01

    Up to now, very few applications of multiobjective optimization (MOOP) techniques to quantitative structure-activity relationship (QSAR) studies have been reported in the literature. However, none of them report the optimization of objectives related directly to the final pharmaceutical profile of a drug. In this paper, a MOOP method based on Derringer's desirability function that allows conducting global QSAR studies, simultaneously considering the potency, bioavailability, and safety of a set of drug candidates, is introduced. The results of the desirability-based MOOP (the levels of the predictor variables concurrently producing the best possible compromise between the properties determining an optimal drug candidate) are used for the implementation of a ranking method that is also based on the application of desirability functions. This method allows ranking drug candidates with unknown pharmaceutical properties from combinatorial libraries according to the degree of similarity with the previously determined optimal candidate. Application of this method will make it possible to filter the most promising drug candidates of a library (the best-ranked candidates), which should have the best pharmaceutical profile (the best compromise between potency, safety and bioavailability). In addition, a validation method of the ranking process, as well as a quantitative measure of the quality of a ranking, the ranking quality index (Psi), is proposed. The usefulness of the desirability-based methods of MOOP and ranking is demonstrated by its application to a library of 95 fluoroquinolones, reporting their gram-negative antibacterial activity and mammalian cell cytotoxicity. Finally, the combined use of the desirability-based methods of MOOP and ranking proposed here seems to be a valuable tool for rational drug discovery and development.

  20. On the ranking of chemicals based on their PBT characteristics: comparison of different ranking methodologies using selected POPs as an illustrative example.

    PubMed

    Sailaukhanuly, Yerbolat; Zhakupbekova, Arai; Amutova, Farida; Carlsen, Lars

    2013-01-01

    Knowledge of the environmental behavior of chemicals is a fundamental part of the risk assessment process. The present paper discusses various methods of ranking of a series of persistent organic pollutants (POPs) according to the persistence, bioaccumulation and toxicity (PBT) characteristics. Traditionally ranking has been done as an absolute (total) ranking applying various multicriteria data analysis methods like simple additive ranking (SAR) or various utility functions (UFs) based rankings. An attractive alternative to these ranking methodologies appears to be partial order ranking (POR). The present paper compares different ranking methods like SAR, UF and POR. Significant discrepancies between the rankings are noted and it is concluded that partial order ranking, as a method without any pre-assumptions concerning possible relation between the single parameters, appears as the most attractive ranking methodology. In addition to the initial ranking partial order methodology offers a wide variety of analytical tools to elucidate the interplay between the objects to be ranked and the ranking parameters. In the present study is included an analysis of the relative importance of the single P, B and T parameters. Copyright © 2012 Elsevier Ltd. All rights reserved.

  1. Information categorization approach to literary authorship disputes

    NASA Astrophysics Data System (ADS)

    Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

    2003-11-01

    Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.

  2. Implication of correlations among some common stability statistics - a Monte Carlo simulations.

    PubMed

    Piepho, H P

    1995-03-01

    Stability analysis of multilocation trials is often based on a mixed two-way model. Two stability measures in frequent use are the environmental variance (S i (2) )and the ecovalence (W i). Under the two-way model the rank orders of the expected values of these two statistics are identical for a given set of genotypes. By contrast, empirical rank correlations among these measures are consistently low. This suggests that the two-way mixed model may not be appropriate for describing real data. To check this hypothesis, a Monte Carlo simulation was conducted. It revealed that the low empirical rank correlation amongS i (2) and W i is most likely due to sampling errors. It is concluded that the observed low rank correlation does not invalidate the two-way model. The paper also discusses tests for homogeneity of S i (2) as well as implications of the two-way model for the classification of stability statistics.

  3. A novel three-stage distance-based consensus ranking method

    NASA Astrophysics Data System (ADS)

    Aghayi, Nazila; Tavana, Madjid

    2018-05-01

    In this study, we propose a three-stage weighted sum method for identifying the group ranks of alternatives. In the first stage, a rank matrix, similar to the cross-efficiency matrix, is obtained by computing the individual rank position of each alternative based on importance weights. In the second stage, a secondary goal is defined to limit the vector of weights since the vector of weights obtained in the first stage is not unique. Finally, in the third stage, the group rank position of alternatives is obtained based on a distance of individual rank positions. The third stage determines a consensus solution for the group so that the ranks obtained have a minimum distance from the ranks acquired by each alternative in the previous stage. A numerical example is presented to demonstrate the applicability and exhibit the efficacy of the proposed method and algorithms.

  4. A Gaussian-based rank approximation for subspace clustering

    NASA Astrophysics Data System (ADS)

    Xu, Fei; Peng, Chong; Hu, Yunhong; He, Guoping

    2018-04-01

    Low-rank representation (LRR) has been shown successful in seeking low-rank structures of data relationships in a union of subspaces. Generally, LRR and LRR-based variants need to solve the nuclear norm-based minimization problems. Beyond the success of such methods, it has been widely noted that the nuclear norm may not be a good rank approximation because it simply adds all singular values of a matrix together and thus large singular values may dominant the weight. This results in far from satisfactory rank approximation and may degrade the performance of lowrank models based on the nuclear norm. In this paper, we propose a novel nonconvex rank approximation based on the Gaussian distribution function, which has demanding properties to be a better rank approximation than the nuclear norm. Then a low-rank model is proposed based on the new rank approximation with application to motion segmentation. Experimental results have shown significant improvements and verified the effectiveness of our method.

  5. A ranking index for quality assessment of forensic DNA profiles forensic DNA profiles

    PubMed Central

    2010-01-01

    Background Assessment of DNA profile quality is vital in forensic DNA analysis, both in order to determine the evidentiary value of DNA results and to compare the performance of different DNA analysis protocols. Generally the quality assessment is performed through manual examination of the DNA profiles based on empirical knowledge, or by comparing the intensities (allelic peak heights) of the capillary electrophoresis electropherograms. Results We recently developed a ranking index for unbiased and quantitative quality assessment of forensic DNA profiles, the forensic DNA profile index (FI) (Hedman et al. Improved forensic DNA analysis through the use of alternative DNA polymerases and statistical modeling of DNA profiles, Biotechniques 47 (2009) 951-958). FI uses electropherogram data to combine the intensities of the allelic peaks with the balances within and between loci, using Principal Components Analysis. Here we present the construction of FI. We explain the mathematical and statistical methodologies used and present details about the applied data reduction method. Thereby we show how to adapt the ranking index for any Short Tandem Repeat-based forensic DNA typing system through validation against a manual grading scale and calibration against a specific set of DNA profiles. Conclusions The developed tool provides unbiased quality assessment of forensic DNA profiles. It can be applied for any DNA profiling system based on Short Tandem Repeat markers. Apart from crime related DNA analysis, FI can therefore be used as a quality tool in paternal or familial testing as well as in disaster victim identification. PMID:21062433

  6. Permutational distribution of the log-rank statistic under random censorship with applications to carcinogenicity assays.

    PubMed

    Heimann, G; Neuhaus, G

    1998-03-01

    In the random censorship model, the log-rank test is often used for comparing a control group with different dose groups. If the number of tumors is small, so-called exact methods are often applied for computing critical values from a permutational distribution. Two of these exact methods are discussed and shown to be incorrect. The correct permutational distribution is derived and studied with respect to its behavior under unequal censoring in the light of recent results proving that the permutational version and the unconditional version of the log-rank test are asymptotically equivalent even under unequal censoring. The log-rank test is studied by simulations of a realistic scenario from a bioassay with small numbers of tumors.

  7. A gender-based comparison of academic rank and scholarly productivity in academic neurological surgery.

    PubMed

    Tomei, Krystal L; Nahass, Meghan M; Husain, Qasim; Agarwal, Nitin; Patel, Smruti K; Svider, Peter F; Eloy, Jean Anderson; Liu, James K

    2014-07-01

    The number of women pursuing training opportunities in neurological surgery has increased, although they are still underrepresented at senior positions relative to junior academic ranks. Research productivity is an important component of the academic advancement process. We sought to use the h-index, a bibliometric previously analyzed among neurological surgeons, to evaluate whether there are gender differences in academic rank and research productivity among academic neurological surgeons. The h-index was calculated for 1052 academic neurological surgeons from 84 institutions, and organized by gender and academic rank. Overall men had statistically higher research productivity (mean 13.3) than their female colleagues (mean 9.5), as measured by the h-index, in the overall sample (p<0.0007). When separating by academic rank, there were no statistical differences (p>0.05) in h-index at the assistant professor (mean 7.2 male, 6.3 female), associate professor (11.2 male, 10.8 female), and professor (20.0 male, 18.0 female) levels based on gender. There was insufficient data to determine significance at the chairperson rank, as there was only one female chairperson. Although overall gender differences in scholarly productivity were detected, these differences did not reach statistical significance upon controlling for academic rank. Women were grossly underrepresented at the level of chairpersons in this sample of 1052 academic neurological surgeons, likely a result of the low proportion of females in this specialty. Future studies may be needed to investigate gender-specific research trends for neurosurgical residents, a cohort that in recent years has seen increased representation by women. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. CNN-based ranking for biomedical entity normalization.

    PubMed

    Li, Haodi; Chen, Qingcai; Tang, Buzhou; Wang, Xiaolong; Xu, Hua; Wang, Baohua; Huang, Dong

    2017-10-03

    Most state-of-the-art biomedical entity normalization systems, such as rule-based systems, merely rely on morphological information of entity mentions, but rarely consider their semantic information. In this paper, we introduce a novel convolutional neural network (CNN) architecture that regards biomedical entity normalization as a ranking problem and benefits from semantic information of biomedical entities. The CNN-based ranking method first generates candidates using handcrafted rules, and then ranks the candidates according to their semantic information modeled by CNN as well as their morphological information. Experiments on two benchmark datasets for biomedical entity normalization show that our proposed CNN-based ranking method outperforms traditional rule-based method with state-of-the-art performance. We propose a CNN architecture that regards biomedical entity normalization as a ranking problem. Comparison results show that semantic information is beneficial to biomedical entity normalization and can be well combined with morphological information in our CNN architecture for further improvement.

  9. Efficient l1 -norm-based low-rank matrix approximations for large-scale problems using alternating rectified gradient method.

    PubMed

    Kim, Eunwoo; Lee, Minsik; Choi, Chong-Ho; Kwak, Nojun; Oh, Songhwai

    2015-02-01

    Low-rank matrix approximation plays an important role in the area of computer vision and image processing. Most of the conventional low-rank matrix approximation methods are based on the l2 -norm (Frobenius norm) with principal component analysis (PCA) being the most popular among them. However, this can give a poor approximation for data contaminated by outliers (including missing data), because the l2 -norm exaggerates the negative effect of outliers. Recently, to overcome this problem, various methods based on the l1 -norm, such as robust PCA methods, have been proposed for low-rank matrix approximation. Despite the robustness of the methods, they require heavy computational effort and substantial memory for high-dimensional data, which is impractical for real-world problems. In this paper, we propose two efficient low-rank factorization methods based on the l1 -norm that find proper projection and coefficient matrices using the alternating rectified gradient method. The proposed methods are applied to a number of low-rank matrix approximation problems to demonstrate their efficiency and robustness. The experimental results show that our proposals are efficient in both execution time and reconstruction performance unlike other state-of-the-art methods.

  10. Solutions of interval type-2 fuzzy polynomials using a new ranking method

    NASA Astrophysics Data System (ADS)

    Rahman, Nurhakimah Ab.; Abdullah, Lazim; Ghani, Ahmad Termimi Ab.; Ahmad, Noor'Ani

    2015-10-01

    A few years ago, a ranking method have been introduced in the fuzzy polynomial equations. Concept of the ranking method is proposed to find actual roots of fuzzy polynomials (if exists). Fuzzy polynomials are transformed to system of crisp polynomials, performed by using ranking method based on three parameters namely, Value, Ambiguity and Fuzziness. However, it was found that solutions based on these three parameters are quite inefficient to produce answers. Therefore in this study a new ranking method have been developed with the aim to overcome the inherent weakness. The new ranking method which have four parameters are then applied in the interval type-2 fuzzy polynomials, covering the interval type-2 of fuzzy polynomial equation, dual fuzzy polynomial equations and system of fuzzy polynomials. The efficiency of the new ranking method then numerically considered in the triangular fuzzy numbers and the trapezoidal fuzzy numbers. Finally, the approximate solutions produced from the numerical examples indicate that the new ranking method successfully produced actual roots for the interval type-2 fuzzy polynomials.

  11. Selecting long-term care facilities with high use of acute hospitalisations: issues and options

    PubMed Central

    2014-01-01

    Background This paper considers approaches to the question “Which long-term care facilities have residents with high use of acute hospitalisations?” It compares four methods of identifying long-term care facilities with high use of acute hospitalisations by demonstrating four selection methods, identifies key factors to be resolved when deciding which methods to employ, and discusses their appropriateness for different research questions. Methods OPAL was a census-type survey of aged care facilities and residents in Auckland, New Zealand, in 2008. It collected information about facility management and resident demographics, needs and care. Survey records (149 aged care facilities, 6271 residents) were linked to hospital and mortality records routinely assembled by health authorities. The main ranking endpoint was acute hospitalisations for diagnoses that were classified as potentially avoidable. Facilities were ranked using 1) simple event counts per person, 2) event rates per year of resident follow-up, 3) statistical model of rates using four predictors, and 4) change in ranks between methods 2) and 3). A generalized mixed model was used for Method 3 to handle the clustered nature of the data. Results 3048 potentially avoidable hospitalisations were observed during 22 months’ follow-up. The same “top ten” facilities were selected by Methods 1 and 2. The statistical model (Method 3), predicting rates from resident and facility characteristics, ranked facilities differently than these two simple methods. The change-in-ranks method identified a very different set of “top ten” facilities. All methods showed a continuum of use, with no clear distinction between facilities with higher use. Conclusion Choice of selection method should depend upon the purpose of selection. To monitor performance during a period of change, a recent simple rate, count per resident, or even count per bed, may suffice. To find high–use facilities regardless of resident needs, recent history of admissions is highly predictive. To target a few high-use facilities that have high rates after considering facility and resident characteristics, model residuals or a large increase in rank may be preferable. PMID:25052433

  12. ELICIT: An alternative imprecise weight elicitation technique for use in multi-criteria decision analysis for healthcare

    PubMed Central

    Diaby, Vakaramoko; Sanogo, Vassiki; Moussa, Kouame Richard

    2015-01-01

    Objective In this paper, the readers are introduced to ELICIT, an imprecise weight elicitation technique for multicriteria decision analysis for healthcare. Methods The application of ELICIT consists of two steps: the rank ordering of evaluation criteria based on decision-makers’ (DMs) preferences using the principal component analysis; and the estimation of criteria weights and their descriptive statistics using the variable interdependent analysis and the Monte Carlo method. The application of ELICIT is illustrated with a hypothetical case study involving the elicitation of weights for five criteria used to select the best device for eye surgery. Results The criteria were ranked from 1–5, based on a strict preference relationship established by the DMs. For each criterion, the deterministic weight was estimated as well as the standard deviation and 95% credibility interval. Conclusions ELICIT is appropriate in situations where only ordinal DMs’ preferences are available to elicit decision criteria weights. PMID:26361235

  13. [Rank distributions in community ecology from the statistical viewpoint].

    PubMed

    Maksimov, V N

    2004-01-01

    Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.

  14. Automatic vetting of planet candidates from ground based surveys: Machine learning with NGTS

    NASA Astrophysics Data System (ADS)

    Armstrong, David J.; Günther, Maximilian N.; McCormac, James; Smith, Alexis M. S.; Bayliss, Daniel; Bouchy, François; Burleigh, Matthew R.; Casewell, Sarah; Eigmüller, Philipp; Gillen, Edward; Goad, Michael R.; Hodgkin, Simon T.; Jenkins, James S.; Louden, Tom; Metrailler, Lionel; Pollacco, Don; Poppenhaeger, Katja; Queloz, Didier; Raynard, Liam; Rauer, Heike; Udry, Stéphane; Walker, Simon R.; Watson, Christopher A.; West, Richard G.; Wheatley, Peter J.

    2018-05-01

    State of the art exoplanet transit surveys are producing ever increasing quantities of data. To make the best use of this resource, in detecting interesting planetary systems or in determining accurate planetary population statistics, requires new automated methods. Here we describe a machine learning algorithm that forms an integral part of the pipeline for the NGTS transit survey, demonstrating the efficacy of machine learning in selecting planetary candidates from multi-night ground based survey data. Our method uses a combination of random forests and self-organising-maps to rank planetary candidates, achieving an AUC score of 97.6% in ranking 12368 injected planets against 27496 false positives in the NGTS data. We build on past examples by using injected transit signals to form a training set, a necessary development for applying similar methods to upcoming surveys. We also make the autovet code used to implement the algorithm publicly accessible. autovet is designed to perform machine learned vetting of planetary candidates, and can utilise a variety of methods. The apparent robustness of machine learning techniques, whether on space-based or the qualitatively different ground-based data, highlights their importance to future surveys such as TESS and PLATO and the need to better understand their advantages and pitfalls in an exoplanetary context.

  15. Health information on internet: quality, importance, and popularity of persian health websites.

    PubMed

    Samadbeik, Mahnaz; Ahmadi, Maryam; Mohammadi, Ali; Mohseni Saravi, Beniamin

    2014-04-01

    The Internet has provided great opportunities for disseminating both accurate and inaccurate health information. Therefore, the quality of information is considered as a widespread concern affecting the human life. Despite the increasingly substantial growth in the number of users, Persian health websites and the proportion of internet-using patients, little is known about the quality of Persian medical and health websites. The current study aimed to first assess the quality, popularity and importance of websites providing Persian health-related information, and second to evaluate the correlation of the popularity and importance ranking with quality score on the Internet. The sample websites were identified by entering the health-related keywords into four most popular search engines of Iranian users based on the Alexa ranking at the time of study. Each selected website was assessed using three qualified tools including the Bomba and Land Index, Google PageRank and the Alexa ranking. The evaluated sites characteristics (ownership structure, database, scope and objective) really did not have an effect on the Alexa traffic global rank, Alexa traffic rank in Iran, Google PageRank and Bomba total score. Most websites (78.9 percent, n = 56) were in the moderate category (8 ≤ x ≤ 11.99) based on their quality levels. There was no statistically significant association between Google PageRank with Bomba index variables and Alexa traffic global rank (P > 0.05). The Persian health websites had better Bomba quality scores in availability and usability guidelines as compared to other guidelines. The Google PageRank did not properly reflect the real quality of evaluated websites and Internet users seeking online health information should not merely rely on it for any kind of prejudgment regarding Persian health websites. However, they can use Iran Alexa rank as a primary filtering tool of these websites. Therefore, designing search engines dedicated to explore accredited Persian health-related Web sites can be an effective method to access high-quality Persian health websites.

  16. Gas chimney detection based on improving the performance of combined multilayer perceptron and support vector classifier

    NASA Astrophysics Data System (ADS)

    Hashemi, H.; Tax, D. M. J.; Duin, R. P. W.; Javaherian, A.; de Groot, P.

    2008-11-01

    Seismic object detection is a relatively new field in which 3-D bodies are visualized and spatial relationships between objects of different origins are studied in order to extract geologic information. In this paper, we propose a method for finding an optimal classifier with the help of a statistical feature ranking technique and combining different classifiers. The method, which has general applicability, is demonstrated here on a gas chimney detection problem. First, we evaluate a set of input seismic attributes extracted at locations labeled by a human expert using regularized discriminant analysis (RDA). In order to find the RDA score for each seismic attribute, forward and backward search strategies are used. Subsequently, two non-linear classifiers: multilayer perceptron (MLP) and support vector classifier (SVC) are run on the ranked seismic attributes. Finally, to capitalize on the intrinsic differences between both classifiers, the MLP and SVC results are combined using logical rules of maximum, minimum and mean. The proposed method optimizes the ranked feature space size and yields the lowest classification error in the final combined result. We will show that the logical minimum reveals gas chimneys that exhibit both the softness of MLP and the resolution of SVC classifiers.

  17. SibRank: Signed bipartite network analysis for neighbor-based collaborative ranking

    NASA Astrophysics Data System (ADS)

    Shams, Bita; Haratizadeh, Saman

    2016-09-01

    Collaborative ranking is an emerging field of recommender systems that utilizes users' preference data rather than rating values. Unfortunately, neighbor-based collaborative ranking has gained little attention despite its more flexibility and justifiability. This paper proposes a novel framework, called SibRank that seeks to improve the state of the art neighbor-based collaborative ranking methods. SibRank represents users' preferences as a signed bipartite network, and finds similar users, through a novel personalized ranking algorithm in signed networks.

  18. Quality of Public Hospitals Websites: A Cross-Sectional Analytical Study in Iran

    PubMed Central

    Salarvand, Shahin; Samadbeik, Mahnaz; Tarrahi, Mohammad Javad; Salarvand, Hamed

    2016-01-01

    Introduction: Nowadays, hospitals have turned increasingly towards the Internet and develop their own web presence. Hospital Websites could be operating as effective web resources of information and interactive communication mediums to enhance hospital services to the public. Aim: Therefore, the aim of this study was to assess the quality of websites in Tehran’s public hospitals. Material and methods: This cross-sectional analysis involved all public hospitals in Iran’s capital city, Tehran, with a working website or subsites between April and June, 2014 (N=59). The websites were evaluated using three validated instruments: a localized checklist, Google page rank, and the Alexa traffic ranking. The mentioned checklist consisted of 112 items divided into five sections: technical characteristics, hospital information and facilities, medical services, interactive on-line services and external activities. Data were analyzed using descriptive and analytical statistics. Results: The mean website evaluation score was 45.7 out of 224 for selected public hospitals. All the studied websites were in the weak category based on the earned quality scores. There was no statistically significant association between the website evaluation score with Google page rank (P=0.092), Alexa global traffic rank and Alexa traffic rank in Iran (P>0.05). The hospital websites had a lower quality score in the interactive online services and external activities criteria in comparing to other criteria. Due to the low quality level of the studied websites and the importance of hospital portals in providing information and services on the Internet, the authorities should do precise planning for the appreciable improvement in the quality of hospital websites. PMID:27147806

  19. Comparison and ranking of superelasticity of different austenite active nickel-titanium orthodontic archwires using mechanical tensile testing and correlating with its electrical resistivity

    PubMed Central

    Nagarajan, D.; Baskaranarayanan, Balashanmugam; Usha, K.; Jayanthi, M. S.; Vijjaykanth, M.

    2016-01-01

    Introduction: The application of light and continuous forces for optimum physiological response and the least damage to the tooth supporting structures should be the primary aim of an orthodontist. Nickel-titanium (NiTi) alloys with their desirable properties are one of the natural choices of the clinicians. Aim: This study was aimed to compare and rank them based on its tensile strength and electrical resistivity. Materials and Methods: The sample consisted of eight groups of 0.017 inch × 0.025 inch rectangular archwires from eight different manufacturers, and five samples from each group for tensile testing and nine samples for electrical resistivity tests were used. Data for stress at 10% strain and the initial slope were statistically analyzed with an analysis of variance and Scheffe tests with P < 0.05. The stress/strain plots of each product were ranked for superelastic behavior. The rankings of the wires tested were based primarily on the unloading curve's slope which is indicative of the magnitude of the deactivation force and secondarily on the length of the horizontal segment which is indicative of continuous forces during deactivation. For calculating the electric resistivity, the change in resistance after inducing strain in the wires was taken into account for the calculation of degree of martensite transformation and for ranking. Results: In tensile testing Ortho Organizers wires ranked first and GAC Lowland NiTi wires ranked last. For resistivity tests Ormco A wires were found superior and Morelli remained last. Conclusion: these rankings should be correlated clinically and need further studies. PMID:27829751

  20. DrugE-Rank: improving drug–target interaction prediction of new candidate drugs or targets by ensemble learning to rank

    PubMed Central

    Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-01-01

    Motivation: Identifying drug–target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug–target interactions of new candidate drugs or targets. Methods: Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. Results: The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. Availability: http://datamining-iip.fudan.edu.cn/service/DrugE-Rank Contact: zhusf@fudan.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307615

  1. Ranking viruses: measures of positional importance within networks define core viruses for rational polyvalent vaccine development.

    PubMed

    Anderson, Tavis K; Laegreid, William W; Cerutti, Francesco; Osorio, Fernando A; Nelson, Eric A; Christopher-Hennings, Jane; Goldberg, Tony L

    2012-06-15

    The extraordinary genetic and antigenic variability of RNA viruses is arguably the greatest challenge to the development of broadly effective vaccines. No single viral variant can induce sufficiently broad immunity, and incorporating all known naturally circulating variants into one multivalent vaccine is not feasible. Furthermore, no objective strategies currently exist to select actual viral variants that should be included or excluded in polyvalent vaccines. To address this problem, we demonstrate a method based on graph theory that quantifies the relative importance of viral variants. We demonstrate our method through application to the envelope glycoprotein gene of a particularly diverse RNA virus of pigs: porcine reproductive and respiratory syndrome virus (PRRSV). Using distance matrices derived from sequence nucleotide difference, amino acid difference and evolutionary distance, we constructed viral networks and used common network statistics to assign each sequence an objective ranking of relative 'importance'. To validate our approach, we use an independent published algorithm to score our top-ranked wild-type variants for coverage of putative T-cell epitopes across the 9383 sequences in our dataset. Top-ranked viruses achieve significantly higher coverage than low-ranked viruses, and top-ranked viruses achieve nearly equal coverage as a synthetic mosaic protein constructed in silico from the same set of 9383 sequences. Our approach relies on the network structure of PRRSV but applies to any diverse RNA virus because it identifies subsets of viral variants that are most important to overall viral diversity. We suggest that this method, through the objective quantification of variant importance, provides criteria for choosing viral variants for further characterization, diagnostics, surveillance and ultimately polyvalent vaccine development.

  2. [The relationship between Ridit analysis and rank sum test for one-way ordinal contingency table in medical research].

    PubMed

    Wang, Ling; Xia, Jie-lai; Yu, Li-li; Li, Chan-juan; Wang, Su-zhen

    2008-06-01

    To explore several numerical methods of ordinal variable in one-way ordinal contingency table and their interrelationship, and to compare corresponding statistical analysis methods such as Ridit analysis and rank sum test. Formula deduction was based on five simplified grading approaches including rank_r(i), ridit_r(i), ridit_r(ci), ridit_r(mi), and table scores. Practical data set was verified by SAS8.2 in clinical practice (to test the effect of Shiwei solution in treatment for chronic tracheitis). Because of the linear relationship of rank_r(i) = N ridit_r(i) + 1/2 = N ridit_r(ci) = (N + 1) ridit_r(mi), the exact chi2 values in Ridit analysis based on ridit_r(i), ridit_r(ci), and ridit_r(mi), were completely the same, and they were equivalent to the Kruskal-Wallis H test. Traditional Ridit analysis was based on ridit_r(i), and its corresponding chi2 value calculated with an approximate variance (1/12) was conservative. The exact chi2 test of Ridit analysis should be used when comparing multiple groups in the clinical researches because of its special merits such as distribution of mean ridit value on (0,1) and clear graph expression. The exact chi2 test of Ridit analysis can be output directly by proc freq of SAS8.2 with ridit and modridit option (SCORES =). The exact chi2 test of Ridit analysis is equivalent to the Kruskal-Wallis H test, and should be used when comparing multiple groups in the clinical researches.

  3. Assessment and improvement of statistical tools for comparative proteomics analysis of sparse data sets with few experimental replicates.

    PubMed

    Schwämmle, Veit; León, Ileana Rodríguez; Jensen, Ole Nørregaard

    2013-09-06

    Large-scale quantitative analyses of biological systems are often performed with few replicate experiments, leading to multiple nonidentical data sets due to missing values. For example, mass spectrometry driven proteomics experiments are frequently performed with few biological or technical replicates due to sample-scarcity or due to duty-cycle or sensitivity constraints, or limited capacity of the available instrumentation, leading to incomplete results where detection of significant feature changes becomes a challenge. This problem is further exacerbated for the detection of significant changes on the peptide level, for example, in phospho-proteomics experiments. In order to assess the extent of this problem and the implications for large-scale proteome analysis, we investigated and optimized the performance of three statistical approaches by using simulated and experimental data sets with varying numbers of missing values. We applied three tools, including standard t test, moderated t test, also known as limma, and rank products for the detection of significantly changing features in simulated and experimental proteomics data sets with missing values. The rank product method was improved to work with data sets containing missing values. Extensive analysis of simulated and experimental data sets revealed that the performance of the statistical analysis tools depended on simple properties of the data sets. High-confidence results were obtained by using the limma and rank products methods for analyses of triplicate data sets that exhibited more than 1000 features and more than 50% missing values. The maximum number of differentially represented features was identified by using limma and rank products methods in a complementary manner. We therefore recommend combined usage of these methods as a novel and optimal way to detect significantly changing features in these data sets. This approach is suitable for large quantitative data sets from stable isotope labeling and mass spectrometry experiments and should be applicable to large data sets of any type. An R script that implements the improved rank products algorithm and the combined analysis is available.

  4. AptRank: an adaptive PageRank model for protein function prediction on   bi-relational graphs.

    PubMed

    Jiang, Biaobin; Kloster, Kyle; Gleich, David F; Gribskov, Michael

    2017-06-15

    Diffusion-based network models are widely used for protein function prediction using protein network data and have been shown to outperform neighborhood-based and module-based methods. Recent studies have shown that integrating the hierarchical structure of the Gene Ontology (GO) data dramatically improves prediction accuracy. However, previous methods usually either used the GO hierarchy to refine the prediction results of multiple classifiers, or flattened the hierarchy into a function-function similarity kernel. No study has taken the GO hierarchy into account together with the protein network as a two-layer network model. We first construct a Bi-relational graph (Birg) model comprised of both protein-protein association and function-function hierarchical networks. We then propose two diffusion-based methods, BirgRank and AptRank, both of which use PageRank to diffuse information on this two-layer graph model. BirgRank is a direct application of traditional PageRank with fixed decay parameters. In contrast, AptRank utilizes an adaptive diffusion mechanism to improve the performance of BirgRank. We evaluate the ability of both methods to predict protein function on yeast, fly and human protein datasets, and compare with four previous methods: GeneMANIA, TMC, ProteinRank and clusDCA. We design four different validation strategies: missing function prediction, de novo function prediction, guided function prediction and newly discovered function prediction to comprehensively evaluate predictability of all six methods. We find that both BirgRank and AptRank outperform the previous methods, especially in missing function prediction when using only 10% of the data for training. The MATLAB code is available at https://github.rcac.purdue.edu/mgribsko/aptrank . gribskov@purdue.edu. Supplementary data are available at Bioinformatics online. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com

  5. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy.

    PubMed

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic-there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions.

  6. Research on B Cell Algorithm for Learning to Rank Method Based on Parallel Strategy

    PubMed Central

    Tian, Yuling; Zhang, Hongxian

    2016-01-01

    For the purposes of information retrieval, users must find highly relevant documents from within a system (and often a quite large one comprised of many individual documents) based on input query. Ranking the documents according to their relevance within the system to meet user needs is a challenging endeavor, and a hot research topic–there already exist several rank-learning methods based on machine learning techniques which can generate ranking functions automatically. This paper proposes a parallel B cell algorithm, RankBCA, for rank learning which utilizes a clonal selection mechanism based on biological immunity. The novel algorithm is compared with traditional rank-learning algorithms through experimentation and shown to outperform the others in respect to accuracy, learning time, and convergence rate; taken together, the experimental results show that the proposed algorithm indeed effectively and rapidly identifies optimal ranking functions. PMID:27487242

  7. Statistical interpretation of machine learning-based feature importance scores for biomarker discovery.

    PubMed

    Huynh-Thu, Vân Anh; Saeys, Yvan; Wehenkel, Louis; Geurts, Pierre

    2012-07-01

    Univariate statistical tests are widely used for biomarker discovery in bioinformatics. These procedures are simple, fast and their output is easily interpretable by biologists but they can only identify variables that provide a significant amount of information in isolation from the other variables. As biological processes are expected to involve complex interactions between variables, univariate methods thus potentially miss some informative biomarkers. Variable relevance scores provided by machine learning techniques, however, are potentially able to highlight multivariate interacting effects, but unlike the p-values returned by univariate tests, these relevance scores are usually not statistically interpretable. This lack of interpretability hampers the determination of a relevance threshold for extracting a feature subset from the rankings and also prevents the wide adoption of these methods by practicians. We evaluated several, existing and novel, procedures that extract relevant features from rankings derived from machine learning approaches. These procedures replace the relevance scores with measures that can be interpreted in a statistical way, such as p-values, false discovery rates, or family wise error rates, for which it is easier to determine a significance level. Experiments were performed on several artificial problems as well as on real microarray datasets. Although the methods differ in terms of computing times and the tradeoff, they achieve in terms of false positives and false negatives, some of them greatly help in the extraction of truly relevant biomarkers and should thus be of great practical interest for biologists and physicians. As a side conclusion, our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive. Python source codes of all tested methods, as well as the MATLAB scripts used for data simulation, can be found in the Supplementary Material.

  8. Health systems around the world - a comparison of existing health system rankings.

    PubMed

    Schütte, Stefanie; Acevedo, Paula N Marin; Flahault, Antoine

    2018-06-01

    Existing health systems all over the world are different due to the different combinations of components that can be considered for their establishment. The ranking of health systems has been a focal points for many years especially the issue of performance. In 2000 the World Health Organization (WHO) performed a ranking to compare the Performance of the health system of the member countries. Since then other health system rankings have been performed and it became an issue of public discussion. A point of contention regarding these rankings is the methodology employed by each of them, since no gold standard exists. Therefore, this review focuses on evaluating the methodologies of each existing health system performance ranking to assess their reproducibility and transparency. A search was conducted to identify existing health system rankings, and a questionnaire was developed for the comparison of the methodologies based on the following indicators: (1) General information, (2) Statistical methods, (3) Data (4) Indicators. Overall nine rankings were identified whereas six of them focused rather on the measurement of population health without any financial component and were therefore excluded. Finally, three health system rankings were selected for this review: "Health Systems: Improving Performance" by the WHO, "Mirror, Mirror on the wall: How the Performance of the US Health Care System Compares Internationally" by the Commonwealth Fund and "the Most efficient Health Care" by Bloomberg. After the completion of the comparison of the rankings by giving them scores according to the indicators, the ranking performed the WHO was considered the most complete regarding the ability of reproducibility and transparency of the methodology. This review and comparison could help in establishing consensus in the field of health system research. This may also help giving recommendations for future health rankings and evaluating the current gap in the literature.

  9. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data frommore » the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.« less

  10. Yager’s ranking method for solving the trapezoidal fuzzy number linear programming

    NASA Astrophysics Data System (ADS)

    Karyati; Wutsqa, D. U.; Insani, N.

    2018-03-01

    In the previous research, the authors have studied the fuzzy simplex method for trapezoidal fuzzy number linear programming based on the Maleki’s ranking function. We have found some theories related to the term conditions for the optimum solution of fuzzy simplex method, the fuzzy Big-M method, the fuzzy two-phase method, and the sensitivity analysis. In this research, we study about the fuzzy simplex method based on the other ranking function. It is called Yager's ranking function. In this case, we investigate the optimum term conditions. Based on the result of research, it is found that Yager’s ranking function is not like Maleki’s ranking function. Using the Yager’s function, the simplex method cannot work as well as when using the Maleki’s function. By using the Yager’s function, the value of the subtraction of two equal fuzzy numbers is not equal to zero. This condition makes the optimum table of the fuzzy simplex table is undetected. As a result, the simplified fuzzy simplex table becomes stopped and does not reach the optimum solution.

  11. DrugE-Rank: improving drug-target interaction prediction of new candidate drugs or targets by ensemble learning to rank.

    PubMed

    Yuan, Qingjun; Gao, Junning; Wu, Dongliang; Zhang, Shihua; Mamitsuka, Hiroshi; Zhu, Shanfeng

    2016-06-15

    Identifying drug-target interactions is an important task in drug discovery. To reduce heavy time and financial cost in experimental way, many computational approaches have been proposed. Although these approaches have used many different principles, their performance is far from satisfactory, especially in predicting drug-target interactions of new candidate drugs or targets. Approaches based on machine learning for this problem can be divided into two types: feature-based and similarity-based methods. Learning to rank is the most powerful technique in the feature-based methods. Similarity-based methods are well accepted, due to their idea of connecting the chemical and genomic spaces, represented by drug and target similarities, respectively. We propose a new method, DrugE-Rank, to improve the prediction performance by nicely combining the advantages of the two different types of methods. That is, DrugE-Rank uses LTR, for which multiple well-known similarity-based methods can be used as components of ensemble learning. The performance of DrugE-Rank is thoroughly examined by three main experiments using data from DrugBank: (i) cross-validation on FDA (US Food and Drug Administration) approved drugs before March 2014; (ii) independent test on FDA approved drugs after March 2014; and (iii) independent test on FDA experimental drugs. Experimental results show that DrugE-Rank outperforms competing methods significantly, especially achieving more than 30% improvement in Area under Prediction Recall curve for FDA approved new drugs and FDA experimental drugs. http://datamining-iip.fudan.edu.cn/service/DrugE-Rank zhusf@fudan.edu.cn Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  12. A note on generalized Genome Scan Meta-Analysis statistics

    PubMed Central

    Koziol, James A; Feng, Anne C

    2005-01-01

    Background Wise et al. introduced a rank-based statistical technique for meta-analysis of genome scans, the Genome Scan Meta-Analysis (GSMA) method. Levinson et al. recently described two generalizations of the GSMA statistic: (i) a weighted version of the GSMA statistic, so that different studies could be ascribed different weights for analysis; and (ii) an order statistic approach, reflecting the fact that a GSMA statistic can be computed for each chromosomal region or bin width across the various genome scan studies. Results We provide an Edgeworth approximation to the null distribution of the weighted GSMA statistic, and, we examine the limiting distribution of the GSMA statistics under the order statistic formulation, and quantify the relevance of the pairwise correlations of the GSMA statistics across different bins on this limiting distribution. We also remark on aggregate criteria and multiple testing for determining significance of GSMA results. Conclusion Theoretical considerations detailed herein can lead to clarification and simplification of testing criteria for generalizations of the GSMA statistic. PMID:15717930

  13. Using Trained Pixel Classifiers to Select Images of Interest

    NASA Technical Reports Server (NTRS)

    Mazzoni, D.; Wagstaff, K.; Castano, R.

    2004-01-01

    We present a machine-learning-based approach to ranking images based on learned priorities. Unlike previous methods for image evaluation, which typically assess the value of each image based on the presence of predetermined specific features, this method involves using two levels of machine-learning classifiers: one level is used to classify each pixel as belonging to one of a group of rather generic classes, and another level is used to rank the images based on these pixel classifications, given some example rankings from a scientist as a guide. Initial results indicate that the technique works well, producing new rankings that match the scientist's rankings significantly better than would be expected by chance. The method is demonstrated for a set of images collected by a Mars field-test rover.

  14. Identification of Functionally Related Enzymes by Learning-to-Rank Methods.

    PubMed

    Stock, Michiel; Fober, Thomas; Hüllermeier, Eyke; Glinca, Serghei; Klebe, Gerhard; Pahikkala, Tapio; Airola, Antti; De Baets, Bernard; Waegeman, Willem

    2014-01-01

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.

  15. Survival analysis using inverse probability of treatment weighted methods based on the generalized propensity score.

    PubMed

    Sugihara, Masahiro

    2010-01-01

    In survival analysis, treatment effects are commonly evaluated based on survival curves and hazard ratios as causal treatment effects. In observational studies, these estimates may be biased due to confounding factors. The inverse probability of treatment weighted (IPTW) method based on the propensity score is one of the approaches utilized to adjust for confounding factors between binary treatment groups. As a generalization of this methodology, we developed an exact formula for an IPTW log-rank test based on the generalized propensity score for survival data. This makes it possible to compare the group differences of IPTW Kaplan-Meier estimators of survival curves using an IPTW log-rank test for multi-valued treatments. As causal treatment effects, the hazard ratio can be estimated using the IPTW approach. If the treatments correspond to ordered levels of a treatment, the proposed method can be easily extended to the analysis of treatment effect patterns with contrast statistics. In this paper, the proposed method is illustrated with data from the Kyushu Lipid Intervention Study (KLIS), which investigated the primary preventive effects of pravastatin on coronary heart disease (CHD). The results of the proposed method suggested that pravastatin treatment reduces the risk of CHD and that compliance to pravastatin treatment is important for the prevention of CHD. (c) 2009 John Wiley & Sons, Ltd.

  16. ArrayVigil: a methodology for statistical comparison of gene signatures using segregated-one-tailed (SOT) Wilcoxon's signed-rank test.

    PubMed

    Khan, Haseeb Ahmad

    2005-01-28

    Due to versatile diagnostic and prognostic fidelity molecular signatures or fingerprints are anticipated as the most powerful tools for cancer management in the near future. Notwithstanding the experimental advancements in microarray technology, methods for analyzing either whole arrays or gene signatures have not been firmly established. Recently, an algorithm, ArraySolver has been reported by Khan for two-group comparison of microarray gene expression data using two-tailed Wilcoxon signed-rank test. Most of the molecular signatures are composed of two sets of genes (hybrid signatures) wherein up-regulation of one set and down-regulation of the other set collectively define the purpose of a gene signature. Since the direction of a selected gene's expression (positive or negative) with respect to a particular disease condition is known, application of one-tailed statistics could be a more relevant choice. A novel method, ArrayVigil, is described for comparing hybrid signatures using segregated-one-tailed (SOT) Wilcoxon signed-rank test and the results compared with integrated-two-tailed (ITT) procedures (SPSS and ArraySolver). ArrayVigil resulted in lower P values than those obtained from ITT statistics while comparing real data from four signatures.

  17. Evaluating user reputation in online rating systems via an iterative group-based ranking method

    NASA Astrophysics Data System (ADS)

    Gao, Jian; Zhou, Tao

    2017-05-01

    Reputation is a valuable asset in online social lives and it has drawn increased attention. Due to the existence of noisy ratings and spamming attacks, how to evaluate user reputation in online rating systems is especially significant. However, most of the previous ranking-based methods either follow a debatable assumption or have unsatisfied robustness. In this paper, we propose an iterative group-based ranking method by introducing an iterative reputation-allocation process into the original group-based ranking method. More specifically, the reputation of users is calculated based on the weighted sizes of the user rating groups after grouping all users by their rating similarities, and the high reputation users' ratings have larger weights in dominating the corresponding user rating groups. The reputation of users and the user rating group sizes are iteratively updated until they become stable. Results on two real data sets with artificial spammers suggest that the proposed method has better performance than the state-of-the-art methods and its robustness is considerably improved comparing with the original group-based ranking method. Our work highlights the positive role of considering users' grouping behaviors towards a better online user reputation evaluation.

  18. A Case-Based Reasoning Method with Rank Aggregation

    NASA Astrophysics Data System (ADS)

    Sun, Jinhua; Du, Jiao; Hu, Jian

    2018-03-01

    In order to improve the accuracy of case-based reasoning (CBR), this paper addresses a new CBR framework with the basic principle of rank aggregation. First, the ranking methods are put forward in each attribute subspace of case. The ordering relation between cases on each attribute is got between cases. Then, a sorting matrix is got. Second, the similar case retrieval process from ranking matrix is transformed into a rank aggregation optimal problem, which uses the Kemeny optimal. On the basis, a rank aggregation case-based reasoning algorithm, named RA-CBR, is designed. The experiment result on UCI data sets shows that case retrieval accuracy of RA-CBR algorithm is higher than euclidean distance CBR and mahalanobis distance CBR testing.So we can get the conclusion that RA-CBR method can increase the performance and efficiency of CBR.

  19. RRCRank: a fusion method using rank strategy for residue-residue contact prediction.

    PubMed

    Jing, Xiaoyang; Dong, Qiwen; Lu, Ruqian

    2017-09-02

    In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment.

  20. Statistical analysis of effective singular values in matrix rank determination

    NASA Technical Reports Server (NTRS)

    Konstantinides, Konstantinos; Yao, Kung

    1988-01-01

    A major problem in using SVD (singular-value decomposition) as a tool in determining the effective rank of a perturbed matrix is that of distinguishing between significantly small and significantly large singular values to the end, conference regions are derived for the perturbed singular values of matrices with noisy observation data. The analysis is based on the theories of perturbations of singular values and statistical significance test. Threshold bounds for perturbation due to finite-precision and i.i.d. random models are evaluated. In random models, the threshold bounds depend on the dimension of the matrix, the noisy variance, and predefined statistical level of significance. Results applied to the problem of determining the effective order of a linear autoregressive system from the approximate rank of a sample autocorrelation matrix are considered. Various numerical examples illustrating the usefulness of these bounds and comparisons to other previously known approaches are given.

  1. Rankings Methodology Hurts Public Institutions

    ERIC Educational Resources Information Center

    Van Der Werf, Martin

    2007-01-01

    In the 1980s, when the "U.S. News & World Report" rankings of colleges were based solely on reputation, the nation's public universities were well represented at the top. However, as soon as the magazine began including its "measures of excellence," statistics intended to define quality, public universities nearly disappeared from the top. As the…

  2. Quantum probability ranking principle for ligand-based virtual screening.

    PubMed

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal

    2017-04-01

    Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. The tools of virtual screening are widely exploited to enhance the cost effectiveness of lead drug discovery programs by ranking chemical compounds databases in decreasing probability of biological activity based upon probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called quantum probability ranking principle (QPRP). The QPRP ranking criteria would make an attempt to draw an analogy between the physical experiment and molecular structure ranking process for 2D fingerprints in ligand based virtual screening (LBVS). The development of QPRP criteria in LBVS has employed the concepts of quantum at three different levels, firstly at representation level, this model makes an effort to develop a new framework of molecular representation by connecting the molecular compounds with mathematical quantum space. Secondly, estimate the similarity between chemical libraries and references based on quantum-based similarity searching method. Finally, rank the molecules using QPRP approach. Simulated virtual screening experiments with MDL drug data report (MDDR) data sets showed that QPRP outperformed the classical ranking principle (PRP) for molecular chemical compounds.

  3. Quantum probability ranking principle for ligand-based virtual screening

    NASA Astrophysics Data System (ADS)

    Al-Dabbagh, Mohammed Mumtaz; Salim, Naomie; Himmat, Mubarak; Ahmed, Ali; Saeed, Faisal

    2017-04-01

    Chemical libraries contain thousands of compounds that need screening, which increases the need for computational methods that can rank or prioritize compounds. The tools of virtual screening are widely exploited to enhance the cost effectiveness of lead drug discovery programs by ranking chemical compounds databases in decreasing probability of biological activity based upon probability ranking principle (PRP). In this paper, we developed a novel ranking approach for molecular compounds inspired by quantum mechanics, called quantum probability ranking principle (QPRP). The QPRP ranking criteria would make an attempt to draw an analogy between the physical experiment and molecular structure ranking process for 2D fingerprints in ligand based virtual screening (LBVS). The development of QPRP criteria in LBVS has employed the concepts of quantum at three different levels, firstly at representation level, this model makes an effort to develop a new framework of molecular representation by connecting the molecular compounds with mathematical quantum space. Secondly, estimate the similarity between chemical libraries and references based on quantum-based similarity searching method. Finally, rank the molecules using QPRP approach. Simulated virtual screening experiments with MDL drug data report (MDDR) data sets showed that QPRP outperformed the classical ranking principle (PRP) for molecular chemical compounds.

  4. Ranking Support Vector Machine with Kernel Approximation

    PubMed Central

    Dou, Yong

    2017-01-01

    Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms. PMID:28293256

  5. Ranking Support Vector Machine with Kernel Approximation.

    PubMed

    Chen, Kai; Li, Rongchun; Dou, Yong; Liang, Zhengfa; Lv, Qi

    2017-01-01

    Learning to rank algorithm has become important in recent years due to its successful application in information retrieval, recommender system, and computational biology, and so forth. Ranking support vector machine (RankSVM) is one of the state-of-art ranking models and has been favorably used. Nonlinear RankSVM (RankSVM with nonlinear kernels) can give higher accuracy than linear RankSVM (RankSVM with a linear kernel) for complex nonlinear ranking problem. However, the learning methods for nonlinear RankSVM are still time-consuming because of the calculation of kernel matrix. In this paper, we propose a fast ranking algorithm based on kernel approximation to avoid computing the kernel matrix. We explore two types of kernel approximation methods, namely, the Nyström method and random Fourier features. Primal truncated Newton method is used to optimize the pairwise L2-loss (squared Hinge-loss) objective function of the ranking model after the nonlinear kernel approximation. Experimental results demonstrate that our proposed method gets a much faster training speed than kernel RankSVM and achieves comparable or better performance over state-of-the-art ranking algorithms.

  6. ELICIT: An alternative imprecise weight elicitation technique for use in multi-criteria decision analysis for healthcare.

    PubMed

    Diaby, Vakaramoko; Sanogo, Vassiki; Moussa, Kouame Richard

    2016-01-01

    In this paper, the readers are introduced to ELICIT, an imprecise weight elicitation technique for multicriteria decision analysis for healthcare. The application of ELICIT consists of two steps: the rank ordering of evaluation criteria based on decision-makers' (DMs) preferences using the principal component analysis; and the estimation of criteria weights and their descriptive statistics using the variable interdependent analysis and the Monte Carlo method. The application of ELICIT is illustrated with a hypothetical case study involving the elicitation of weights for five criteria used to select the best device for eye surgery. The criteria were ranked from 1-5, based on a strict preference relationship established by the DMs. For each criterion, the deterministic weight was estimated as well as the standard deviation and 95% credibility interval. ELICIT is appropriate in situations where only ordinal DMs' preferences are available to elicit decision criteria weights.

  7. Colorimetric determination of nitrate plus nitrite in water by enzymatic reduction, automated discrete analyzer methods

    USGS Publications Warehouse

    Patton, Charles J.; Kryskalla, Jennifer R.

    2011-01-01

    In addition to operational details and performance benchmarks for these new DA-AtNaR2 nitrate + nitrite assays, this report also provides results of interference studies for common inorganic and organic matrix constituents at 1, 10, and 100 times their median concentrations in surface-water and groundwater samples submitted annually to the NWQL for nitrate + nitrite analyses. Paired t-test and Wilcoxon signed-rank statistical analyses of results determined by CFA-CdR methods and DA-AtNaR2 methods indicate that nitrate concentration differences between population means or sign ranks were either statistically equivalent to zero at the 95 percent confidence level (p ≥ 0.05) or analytically equivalent to zero-that is, when p < 0.05, concentration differences between population means or medians were less than MDLs.

  8. A Recursive Partitioning Method for the Prediction of Preference Rankings Based Upon Kemeny Distances.

    PubMed

    D'Ambrosio, Antonio; Heiser, Willem J

    2016-09-01

    Preference rankings usually depend on the characteristics of both the individuals judging a set of objects and the objects being judged. This topic has been handled in the literature with log-linear representations of the generalized Bradley-Terry model and, recently, with distance-based tree models for rankings. A limitation of these approaches is that they only work with full rankings or with a pre-specified pattern governing the presence of ties, and/or they are based on quite strict distributional assumptions. To overcome these limitations, we propose a new prediction tree method for ranking data that is totally distribution-free. It combines Kemeny's axiomatic approach to define a unique distance between rankings with the CART approach to find a stable prediction tree. Furthermore, our method is not limited by any particular design of the pattern of ties. The method is evaluated in an extensive full-factorial Monte Carlo study with a new simulation design.

  9. Automated system for characterization and classification of malaria-infected stages using light microscopic images of thin blood smears.

    PubMed

    Das, D K; Maiti, A K; Chakraborty, C

    2015-03-01

    In this paper, we propose a comprehensive image characterization cum classification framework for malaria-infected stage detection using microscopic images of thin blood smears. The methodology mainly includes microscopic imaging of Leishman stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, feature selection followed by machine classification. Amongst three-image segmentation algorithms (namely, rule-based, Chan-Vese-based and marker-controlled watershed methods), marker-controlled watershed technique provides better boundary detection of erythrocytes specially in overlapping situations. Microscopic features at intensity, texture and morphology levels are extracted to discriminate infected and noninfected erythrocytes. In order to achieve subgroup of potential features, feature selection techniques, namely, F-statistic and information gain criteria are considered here for ranking. Finally, five different classifiers, namely, Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), RBF neural network have been trained and tested by 888 erythrocytes (infected and noninfected) for each features' subset. Performance evaluation of the proposed methodology shows that multilayer perceptron network provides higher accuracy for malaria-infected erythrocytes recognition and infected stage classification. Results show that top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and top 60 features ranked by information gain provides better results (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.

  10. An in silico method to identify computer-based protocols worthy of clinical study: An insulin infusion protocol use case

    PubMed Central

    Wong, Anthony F; Pielmeier, Ulrike; Haug, Peter J; Andreassen, Steen

    2016-01-01

    Objective Develop an efficient non-clinical method for identifying promising computer-based protocols for clinical study. An in silico comparison can provide information that informs the decision to proceed to a clinical trial. The authors compared two existing computer-based insulin infusion protocols: eProtocol-insulin from Utah, USA, and Glucosafe from Denmark. Materials and Methods The authors used eProtocol-insulin to manage intensive care unit (ICU) hyperglycemia with intravenous (IV) insulin from 2004 to 2010. Recommendations accepted by the bedside clinicians directly link the subsequent blood glucose values to eProtocol-insulin recommendations and provide a unique clinical database. The authors retrospectively compared in silico 18 984 eProtocol-insulin continuous IV insulin infusion rate recommendations from 408 ICU patients with those of Glucosafe, the candidate computer-based protocol. The subsequent blood glucose measurement value (low, on target, high) was used to identify if the insulin recommendation was too high, on target, or too low. Results Glucosafe consistently provided more favorable continuous IV insulin infusion rate recommendations than eProtocol-insulin for on target (64% of comparisons), low (80% of comparisons), or high (70% of comparisons) blood glucose. Aggregated eProtocol-insulin and Glucosafe continuous IV insulin infusion rates were clinically similar though statistically significantly different (Wilcoxon signed rank test P = .01). In contrast, when stratified by low, on target, or high subsequent blood glucose measurement, insulin infusion rates from eProtocol-insulin and Glucosafe were statistically significantly different (Wilcoxon signed rank test, P < .001), and clinically different. Discussion This in silico comparison appears to be an efficient nonclinical method for identifying promising computer-based protocols. Conclusion Preclinical in silico comparison analytical framework allows rapid and inexpensive identification of computer-based protocol care strategies that justify expensive and burdensome clinical trials. PMID:26228765

  11. Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation.

    PubMed

    Kuan, Pei Fen; Chiang, Derek Y

    2012-09-01

    DNA methylation has emerged as an important hallmark of epigenetics. Numerous platforms including tiling arrays and next generation sequencing, and experimental protocols are available for profiling DNA methylation. Similar to other tiling array data, DNA methylation data shares the characteristics of inherent correlation structure among nearby probes. However, unlike gene expression or protein DNA binding data, the varying CpG density which gives rise to CpG island, shore and shelf definition provides exogenous information in detecting differential methylation. This article aims to introduce a robust testing and probe ranking procedure based on a nonhomogeneous hidden Markov model that incorporates the above-mentioned features for detecting differential methylation. We revisit the seminal work of Sun and Cai (2009, Journal of the Royal Statistical Society: Series B (Statistical Methodology)71, 393-424) and propose modeling the nonnull using a nonparametric symmetric distribution in two-sided hypothesis testing. We show that this model improves probe ranking and is robust to model misspecification based on extensive simulation studies. We further illustrate that our proposed framework achieves good operating characteristics as compared to commonly used methods in real DNA methylation data that aims to detect differential methylation sites. © 2012, The International Biometric Society.

  12. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.

  13. Plus Disease in Retinopathy of Prematurity: Improving Diagnosis by Ranking Disease Severity and Using Quantitative Image Analysis.

    PubMed

    Kalpathy-Cramer, Jayashree; Campbell, J Peter; Erdogmus, Deniz; Tian, Peng; Kedarisetti, Dharanish; Moleta, Chace; Reynolds, James D; Hutcheson, Kelly; Shapiro, Michael J; Repka, Michael X; Ferrone, Philip; Drenser, Kimberly; Horowitz, Jason; Sonmez, Kemal; Swan, Ryan; Ostmo, Susan; Jonas, Karyn E; Chan, R V Paul; Chiang, Michael F

    2016-11-01

    To determine expert agreement on relative retinopathy of prematurity (ROP) disease severity and whether computer-based image analysis can model relative disease severity, and to propose consideration of a more continuous severity score for ROP. We developed 2 databases of clinical images of varying disease severity (100 images and 34 images) as part of the Imaging and Informatics in ROP (i-ROP) cohort study and recruited expert physician, nonexpert physician, and nonphysician graders to classify and perform pairwise comparisons on both databases. Six participating expert ROP clinician-scientists, each with a minimum of 10 years of clinical ROP experience and 5 ROP publications, and 5 image graders (3 physicians and 2 nonphysician graders) who analyzed images that were obtained during routine ROP screening in neonatal intensive care units. Images in both databases were ranked by average disease classification (classification ranking), by pairwise comparison using the Elo rating method (comparison ranking), and by correlation with the i-ROP computer-based image analysis system. Interexpert agreement (weighted κ statistic) compared with the correlation coefficient (CC) between experts on pairwise comparisons and correlation between expert rankings and computer-based image analysis modeling. There was variable interexpert agreement on diagnostic classification of disease (plus, preplus, or normal) among the 6 experts (mean weighted κ, 0.27; range, 0.06-0.63), but good correlation between experts on comparison ranking of disease severity (mean CC, 0.84; range, 0.74-0.93) on the set of 34 images. Comparison ranking provided a severity ranking that was in good agreement with ranking obtained by classification ranking (CC, 0.92). Comparison ranking on the larger dataset by both expert and nonexpert graders demonstrated good correlation (mean CC, 0.97; range, 0.95-0.98). The i-ROP system was able to model this continuous severity with good correlation (CC, 0.86). Experts diagnose plus disease on a continuum, with poor absolute agreement on classification but good relative agreement on disease severity. These results suggest that the use of pairwise rankings and a continuous severity score, such as that provided by the i-ROP system, may improve agreement on disease severity in the future. Copyright © 2016 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.

  14. Ranking targets in structure-based virtual screening of three-dimensional protein libraries: methods and problems.

    PubMed

    Kellenberger, Esther; Foata, Nicolas; Rognan, Didier

    2008-05-01

    Structure-based virtual screening is a promising tool to identify putative targets for a specific ligand. Instead of docking multiple ligands into a single protein cavity, a single ligand is docked in a collection of binding sites. In inverse screening, hits are in fact targets which have been prioritized within the pool of best ranked proteins. The target rate depends on specificity and promiscuity in protein-ligand interactions and, to a considerable extent, on the effectiveness of the scoring function, which still is the Achilles' heel of molecular docking. In the present retrospective study, virtual screening of the sc-PDB target library by GOLD docking was carried out for four compounds (biotin, 4-hydroxy-tamoxifen, 6-hydroxy-1,6-dihydropurine ribonucleoside, and methotrexate) of known sc-PDB targets and, several ranking protocols based on GOLD fitness score and topological molecular interaction fingerprint (IFP) comparison were evaluated. For the four investigated ligands, the fusion of GOLD fitness and two IFP scores allowed the recovery of most targets, including the rare proteins which are not readily suitable for statistical analysis, while significantly filtering out most false positive entries. The current survey suggests that selecting a small number of targets (<20) for experimental evaluation is achievable with a pure structure-based approach.

  15. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method.

    PubMed

    Li, Yaohang; Rata, Ionel; Chiu, See-wing; Jakobsson, Eric

    2010-07-20

    Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of approximately 20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.

  16. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method

    PubMed Central

    2010-01-01

    Background Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4- to 12-residue in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% less false positives in distinguishing the native conformation, indentifying a near-native model (RMSD < 0.5A from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods. Conclusions By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set. PMID:20642859

  17. Reproducibility-optimized test statistic for ranking genes in microarray studies.

    PubMed

    Elo, Laura L; Filén, Sanna; Lahesmaa, Riitta; Aittokallio, Tero

    2008-01-01

    A principal goal of microarray studies is to identify the genes showing differential expression under distinct conditions. In such studies, the selection of an optimal test statistic is a crucial challenge, which depends on the type and amount of data under analysis. While previous studies on simulated or spike-in datasets do not provide practical guidance on how to choose the best method for a given real dataset, we introduce an enhanced reproducibility-optimization procedure, which enables the selection of a suitable gene- anking statistic directly from the data. In comparison with existing ranking methods, the reproducibilityoptimized statistic shows good performance consistently under various simulated conditions and on Affymetrix spike-in dataset. Further, the feasibility of the novel statistic is confirmed in a practical research setting using data from an in-house cDNA microarray study of asthma-related gene expression changes. These results suggest that the procedure facilitates the selection of an appropriate test statistic for a given dataset without relying on a priori assumptions, which may bias the findings and their interpretation. Moreover, the general reproducibilityoptimization procedure is not limited to detecting differential expression only but could be extended to a wide range of other applications as well.

  18. Can Percentiles Replace Raw Scores in the Statistical Analysis of Test Data?

    ERIC Educational Resources Information Center

    Zimmerman, Donald W.; Zumbo, Bruno D.

    2005-01-01

    Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…

  19. Incorporating uncertainty into the ranking of SPARROW model nutrient yields from Mississippi/Atchafalaya River basin watersheds

    USGS Publications Warehouse

    Robertson, Dale M.; Schwarz, Gregory E.; Saad, David A.; Alexander, Richard B.

    2009-01-01

    Excessive loads of nutrients transported by tributary rivers have been linked to hypoxia in the Gulf of Mexico. Management efforts to reduce the hypoxic zone in the Gulf of Mexico and improve the water quality of rivers and streams could benefit from targeting nutrient reductions toward watersheds with the highest nutrient yields delivered to sensitive downstream waters. One challenge is that most conventional watershed modeling approaches (e.g., mechanistic models) used in these management decisions do not consider uncertainties in the predictions of nutrient yields and their downstream delivery. The increasing use of parameter estimation procedures to statistically estimate model coefficients, however, allows uncertainties in these predictions to be reliably estimated. Here, we use a robust bootstrapping procedure applied to the results of a previous application of the hybrid statistical/mechanistic watershed model SPARROW (Spatially Referenced Regression On Watershed attributes) to develop a statistically reliable method for identifying “high priority” areas for management, based on a probabilistic ranking of delivered nutrient yields from watersheds throughout a basin. The method is designed to be used by managers to prioritize watersheds where additional stream monitoring and evaluations of nutrient-reduction strategies could be undertaken. Our ranking procedure incorporates information on the confidence intervals of model predictions and the corresponding watershed rankings of the delivered nutrient yields. From this quantified uncertainty, we estimate the probability that individual watersheds are among a collection of watersheds that have the highest delivered nutrient yields. We illustrate the application of the procedure to 818 eight-digit Hydrologic Unit Code watersheds in the Mississippi/Atchafalaya River basin by identifying 150 watersheds having the highest delivered nutrient yields to the Gulf of Mexico. Highest delivered yields were from watersheds in the Central Mississippi, Ohio, and Lower Mississippi River basins. With 90% confidence, only a few watersheds can be reliably placed into the highest 150 category; however, many more watersheds can be removed from consideration as not belonging to the highest 150 category. Results from this ranking procedure provide robust information on watershed nutrient yields that can benefit management efforts to reduce nutrient loadings to downstream coastal waters, such as the Gulf of Mexico, or to local receiving streams and reservoirs.

  20. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  1. Assessing groundwater vulnerability to agrichemical contamination in the Midwest US

    USGS Publications Warehouse

    Burkart, M.R.; Kolpin, D.W.; James, D.E.

    1999-01-01

    Agrichemicals (herbicides and nitrate) are significant sources of diffuse pollution to groundwater. Indirect methods are needed to assess the potential for groundwater contamination by diffuse sources because groundwater monitoring is too costly to adequately define the geographic extent of contamination at a regional or national scale. This paper presents examples of the application of statistical, overlay and index, and process-based modeling methods for groundwater vulnerability assessments to a variety of data from the Midwest U.S. The principles for vulnerability assessment include both intrinsic (pedologic, climatologic, and hydrogeologic factors) and specific (contaminant and other anthropogenic factors) vulnerability of a location. Statistical methods use the frequency of contaminant occurrence, contaminant concentration, or contamination probability as a response variable. Statistical assessments are useful for defining the relations among explanatory and response variables whether they define intrinsic or specific vulnerability. Multivariate statistical analyses are useful for ranking variables critical to estimating water quality responses of interest. Overlay and index methods involve intersecting maps of intrinsic and specific vulnerability properties and indexing the variables by applying appropriate weights. Deterministic models use process-based equations to simulate contaminant transport and are distinguished from the other methods in their potential to predict contaminant transport in both space and time. An example of a one-dimensional leaching model linked to a geographic information system (GIS) to define a regional metamodel for contamination in the Midwest is included.

  2. A Ranking Approach on Large-Scale Graph With Multidimensional Heterogeneous Information.

    PubMed

    Wei, Wei; Gao, Bin; Liu, Tie-Yan; Wang, Taifeng; Li, Guohui; Li, Hang

    2016-04-01

    Graph-based ranking has been extensively studied and frequently applied in many applications, such as webpage ranking. It aims at mining potentially valuable information from the raw graph-structured data. Recently, with the proliferation of rich heterogeneous information (e.g., node/edge features and prior knowledge) available in many real-world graphs, how to effectively and efficiently leverage all information to improve the ranking performance becomes a new challenging problem. Previous methods only utilize part of such information and attempt to rank graph nodes according to link-based methods, of which the ranking performances are severely affected by several well-known issues, e.g., over-fitting or high computational complexity, especially when the scale of graph is very large. In this paper, we address the large-scale graph-based ranking problem and focus on how to effectively exploit rich heterogeneous information of the graph to improve the ranking performance. Specifically, we propose an innovative and effective semi-supervised PageRank (SSP) approach to parameterize the derived information within a unified semi-supervised learning framework (SSLF-GR), then simultaneously optimize the parameters and the ranking scores of graph nodes. Experiments on the real-world large-scale graphs demonstrate that our method significantly outperforms the algorithms that consider such graph information only partially.

  3. Natural Time, Nowcasting and the Physics of Earthquakes: Estimation of Seismic Risk to Global Megacities

    NASA Astrophysics Data System (ADS)

    Rundle, John B.; Luginbuhl, Molly; Giguere, Alexis; Turcotte, Donald L.

    2018-02-01

    Natural Time ("NT") refers to the concept of using small earthquake counts, for example of M > 3 events, to mark the intervals between large earthquakes, for example M > 6 events. The term was first used by Varotsos et al. (2005) and later by Holliday et al. (2006) in their studies of earthquakes. In this paper, we discuss ideas and applications arising from the use of NT to understand earthquake dynamics, in particular by use of the idea of nowcasting. Nowcasting differs from forecasting, in that the goal of nowcasting is to estimate the current state of the system, rather than the probability of a future event. Rather than focus on an individual earthquake faults, we focus on a defined local geographic region surrounding a particular location. This local region is considered to be embedded in a larger regional setting from which we accumulate the relevant statistics. We apply the nowcasting idea to the practical development of methods to estimate the current state of risk for dozens of the world's seismically exposed megacities, defined as cities having populations of over 1 million persons. We compute a ranking of these cities based on their current nowcast value, and discuss the advantages and limitations of this approach. We note explicitly that the nowcast method is not a model, in that there are no free parameters to be fit to data. Rather, the method is simply a presentation of statistical data, which the user can interpret. Among other results, we find, for example, that the current nowcast ranking of the Los Angeles region is comparable to its ranking just prior to the January 17, 1994 Northridge earthquake.

  4. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics

    PubMed Central

    Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.

    2015-01-01

    Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Eliazar, Iddo, E-mail: eliazar@post.tau.ac.il

    Rank distributions are collections of positive sizes ordered either increasingly or decreasingly. Many decreasing rank distributions, formed by the collective collaboration of human actions, follow an inverse power-law relation between ranks and sizes. This remarkable empirical fact is termed Zipf’s law, and one of its quintessential manifestations is the demography of human settlements — which exhibits a harmonic relation between ranks and sizes. In this paper we present a comprehensive statistical-physics analysis of rank distributions, establish that power-law and exponential rank distributions stand out as optimal in various entropy-based senses, and unveil the special role of the harmonic relation betweenmore » ranks and sizes. Our results extend the contemporary entropy-maximization view of Zipf’s law to a broader, panoramic, Gibbsian perspective of increasing and decreasing power-law and exponential rank distributions — of which Zipf’s law is one out of four pillars.« less

  6. [Value influence of different compatibilities of main active parts in yangyintongnao granule on pharmacokinetics parameters in rats with cerebral ischemia reperfusion injury by total amount statistic moment method].

    PubMed

    Guo, Ying; Yang, Jiehong; Znang, Hengyi; Fu, Xuchun; Zhnag, Yuyan; Wan, Haitong

    2010-02-01

    To study the influence of the different combinations of the main active parts in Yangyintongnao granule on the pharmacokinetics parameters of the two active components--ligustrazine and puerarin using the method of total amount statistic moment for pharmacokinetics. Combinations were formed according to the dosages of the four active parts (alkaloid, flavone, saponin, naphtha) by orthogonal experiment L9 (3(4)). Blood concentrations of ligustrazine and puerarin were determinated by HPLC at different time. Zero rank moment (AUC) and one rank moment (MRT, mean residence time) of ligustrazine and puerarin have been worked out to calculate the total amount statistic moment parameters was analyzed of Yangyintongnao granule by the method of the total amount statistic moment. The influence of different compatibilities on the pharmacokinetics parameters was analyzed by orthogonal test. Flavone has the strongest effect than saponin on the total AUC. Ligustrazine has the strongest effect on the total MRT. Saponin has little effect on the two parameters, but naphtha has more effect on both of them. It indicates that naphtha may promote metabolism of ligustrazine and puerarin in rat. Total amount statistic moment parameters can be used to guide for compatibilities of TCM.

  7. Distributional fold change test – a statistical approach for detecting differential expression in microarray experiments

    PubMed Central

    2012-01-01

    Background Because of the large volume of data and the intrinsic variation of data intensity observed in microarray experiments, different statistical methods have been used to systematically extract biological information and to quantify the associated uncertainty. The simplest method to identify differentially expressed genes is to evaluate the ratio of average intensities in two different conditions and consider all genes that differ by more than an arbitrary cut-off value to be differentially expressed. This filtering approach is not a statistical test and there is no associated value that can indicate the level of confidence in the designation of genes as differentially expressed or not differentially expressed. At the same time the fold change by itself provide valuable information and it is important to find unambiguous ways of using this information in expression data treatment. Results A new method of finding differentially expressed genes, called distributional fold change (DFC) test is introduced. The method is based on an analysis of the intensity distribution of all microarray probe sets mapped to a three dimensional feature space composed of average expression level, average difference of gene expression and total variance. The proposed method allows one to rank each feature based on the signal-to-noise ratio and to ascertain for each feature the confidence level and power for being differentially expressed. The performance of the new method was evaluated using the total and partial area under receiver operating curves and tested on 11 data sets from Gene Omnibus Database with independently verified differentially expressed genes and compared with the t-test and shrinkage t-test. Overall the DFC test performed the best – on average it had higher sensitivity and partial AUC and its elevation was most prominent in the low range of differentially expressed features, typical for formalin-fixed paraffin-embedded sample sets. Conclusions The distributional fold change test is an effective method for finding and ranking differentially expressed probesets on microarrays. The application of this test is advantageous to data sets using formalin-fixed paraffin-embedded samples or other systems where degradation effects diminish the applicability of correlation adjusted methods to the whole feature set. PMID:23122055

  8. Cancer Survival Estimates Due to Non-Uniform Loss to Follow-Up and Non-Proportional Hazards

    PubMed

    K M, Jagathnath Krishna; Mathew, Aleyamma; Sara George, Preethi

    2017-06-25

    Background: Cancer survival depends on loss to follow-up (LFU) and non-proportional hazards (non-PH). If LFU is high, survival will be over-estimated. If hazard is non-PH, rank tests will provide biased inference and Cox-model will provide biased hazard-ratio. We assessed the bias due to LFU and non-PH factor in cancer survival and provided alternate methods for unbiased inference and hazard-ratio. Materials and Methods: Kaplan-Meier survival were plotted using a realistic breast cancer (BC) data-set, with >40%, 5-year LFU and compared it using another BC data-set with <15%, 5-year LFU to assess the bias in survival due to high LFU. Age at diagnosis of the latter data set was used to illustrate the bias due to a non-PH factor. Log-rank test was employed to assess the bias in p-value and Cox-model was used to assess the bias in hazard-ratio for the non-PH factor. Schoenfeld statistic was used to test the non-PH of age. For the non-PH factor, we employed Renyi statistic for inference and time dependent Cox-model for hazard-ratio. Results: Five-year BC survival was 69% (SE: 1.1%) vs. 90% (SE: 0.7%) for data with low vs. high LFU respectively. Age (<45, 46-54 & >54 years) was a non-PH factor (p-value: 0.036). However, survival by age was significant (log-rank p-value: 0.026), but not significant using Renyi statistic (p=0.067). Hazard ratio (HR) for age using Cox-model was 1.012 (95%CI: 1.004 -1.019) and the same using time-dependent Cox-model was in the other direction (HR: 0.997; 95% CI: 0.997- 0.998). Conclusion: Over-estimated survival was observed for cancer with high LFU. Log-rank statistic and Cox-model provided biased results for non-PH factor. For data with non-PH factors, Renyi statistic and time dependent Cox-model can be used as alternate methods to obtain unbiased inference and estimates. Creative Commons Attribution License

  9. Sensitivity ranking for freshwater invertebrates towards hydrocarbon contaminants.

    PubMed

    Gerner, Nadine V; Cailleaud, Kevin; Bassères, Anne; Liess, Matthias; Beketov, Mikhail A

    2017-11-01

    Hydrocarbons have an utmost economical importance but may also cause substantial ecological impacts due to accidents or inadequate transportation and use. Currently, freshwater biomonitoring methods lack an indicator that can unequivocally reflect the impacts caused by hydrocarbons while being independent from effects of other stressors. The aim of the present study was to develop a sensitivity ranking for freshwater invertebrates towards hydrocarbon contaminants, which can be used in hydrocarbon-specific bioindicators. We employed the Relative Sensitivity method and developed the sensitivity ranking S hydrocarbons based on literature ecotoxicological data supplemented with rapid and mesocosm test results. A first validation of the sensitivity ranking based on an earlier field study has been conducted and revealed the S hydrocarbons ranking to be promising for application in sensitivity based indicators. Thus, the first results indicate that the ranking can serve as the core component of future hydrocarbon-specific and sensitivity trait based bioindicators.

  10. Five radiographic methods for assessing skeletal maturity in a Spanish population: is there a correlation?

    PubMed

    Camacho-Basallo, Paula; Yáñez-Vico, Rosa-María; Solano-Reina, Enrique; Iglesias-Linares, Alejandro

    2017-03-01

    The need for accurate techniques of estimating age has sharply increased in line with the rise in illegal migration and the political, economic and socio-demographic problems that this poses in developed countries today. The methods routinely employed for determining chronological age are mainly based on determining skeletal maturation using radiological techniques. The objective of this study was to correlate five different methods for assessing skeletal maturation. 606 radiographs of growing patients were analyzed, and each patient was classified according to two cervical vertebral-based methods, two hand-wrist-based methods and one tooth-based method. Spearman's rank-order correlation coefficient was applied to assess the relationship between chronological age and the five methods of assessing maturation, as well as correlations between the five methods (p < 0.05). Spearman's rank correlation coefficients for chronological age and cervical vertebral maturation stage using both methods were 0.656/0.693 (p < 0.001), respectively, for males. For females, the correlation was stronger for both methods. The correlation coefficients for chronological age against the two hand-wrist assessment methods were statistically significant only for Fishman's method, 0.722 (p < 0.001) and 0.839 (p < 0.001), respectively for males and females. The cervical vertebral, hand-wrist and dental maturation methods of assessment were all found to correlate strongly with each other, irrespective of gender, except for Grave and Brown's method. The results found the strongest correlation between the second molars and females, and the second premolar and males. This study sheds light on and correlates with the five radiographic methods most commonly used for assessing skeletal maturation in a Spanish population in southern Europe.

  11. Probabilistic Analysis for Comparing Fatigue Data Based on Johnson-Weibull Parameters

    NASA Technical Reports Server (NTRS)

    Hendricks, Robert C.; Zaretsky, Erwin V.; Vicek, Brian L.

    2007-01-01

    Probabilistic failure analysis is essential when analysis of stress-life (S-N) curves is inconclusive in determining the relative ranking of two or more materials. In 1964, L. Johnson published a methodology for establishing the confidence that two populations of data are different. Simplified algebraic equations for confidence numbers were derived based on the original work of L. Johnson. Using the ratios of mean life, the resultant values of confidence numbers deviated less than one percent from those of Johnson. It is possible to rank the fatigue lives of different materials with a reasonable degree of statistical certainty based on combined confidence numbers. These equations were applied to rotating beam fatigue tests that were conducted on three aluminum alloys at three stress levels each. These alloys were AL 2024, AL 6061, and AL 7075. The results were analyzed and compared using ASTM Standard E739-91 and the Johnson-Weibull analysis. The ASTM method did not statistically distinguish between AL 6010 and AL 7075. Based on the Johnson-Weibull analysis confidence numbers greater than 99 percent, AL 2024 was found to have the longest fatigue life, followed by AL 7075, and then AL 6061. The ASTM Standard and the Johnson-Weibull analysis result in the same stress-life exponent p for each of the three aluminum alloys at the median or L(sub 50) lives.

  12. Adaptive linear rank tests for eQTL studies

    PubMed Central

    Szymczak, Silke; Scheinhardt, Markus O.; Zeller, Tanja; Wild, Philipp S.; Blankenberg, Stefan; Ziegler, Andreas

    2013-01-01

    Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal–Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies. PMID:22933317

  13. Adaptive linear rank tests for eQTL studies.

    PubMed

    Szymczak, Silke; Scheinhardt, Markus O; Zeller, Tanja; Wild, Philipp S; Blankenberg, Stefan; Ziegler, Andreas

    2013-02-10

    Expression quantitative trait loci (eQTL) studies are performed to identify single-nucleotide polymorphisms that modify average expression values of genes, proteins, or metabolites, depending on the genotype. As expression values are often not normally distributed, statistical methods for eQTL studies should be valid and powerful in these situations. Adaptive tests are promising alternatives to standard approaches, such as the analysis of variance or the Kruskal-Wallis test. In a two-stage procedure, skewness and tail length of the distributions are estimated and used to select one of several linear rank tests. In this study, we compare two adaptive tests that were proposed in the literature using extensive Monte Carlo simulations of a wide range of different symmetric and skewed distributions. We derive a new adaptive test that combines the advantages of both literature-based approaches. The new test does not require the user to specify a distribution. It is slightly less powerful than the locally most powerful rank test for the correct distribution and at least as powerful as the maximin efficiency robust rank test. We illustrate the application of all tests using two examples from different eQTL studies. Copyright © 2012 John Wiley & Sons, Ltd.

  14. Exact and Monte carlo resampling procedures for the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests.

    PubMed

    Berry, K J; Mielke, P W

    2000-12-01

    Exact and Monte Carlo resampling FORTRAN programs are described for the Wilcoxon-Mann-Whitney rank sum test and the Kruskal-Wallis one-way analysis of variance for ranks test. The program algorithms compensate for tied values and do not depend on asymptotic approximations for probability values, unlike most algorithms contained in PC-based statistical software packages.

  15. An M-estimator for reduced-rank system identification.

    PubMed

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S; Vogelstein, Joshua T

    2017-01-15

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ 1 and ℓ 2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models.

  16. An M-estimator for reduced-rank system identification

    PubMed Central

    Chen, Shaojie; Liu, Kai; Yang, Yuguang; Xu, Yuting; Lee, Seonjoo; Lindquist, Martin; Caffo, Brian S.; Vogelstein, Joshua T.

    2018-01-01

    High-dimensional time-series data from a wide variety of domains, such as neuroscience, are being generated every day. Fitting statistical models to such data, to enable parameter estimation and time-series prediction, is an important computational primitive. Existing methods, however, are unable to cope with the high-dimensional nature of these data, due to both computational and statistical reasons. We mitigate both kinds of issues by proposing an M-estimator for Reduced-rank System IDentification ( MR. SID). A combination of low-rank approximations, ℓ1 and ℓ2 penalties, and some numerical linear algebra tricks, yields an estimator that is computationally efficient and numerically stable. Simulations and real data examples demonstrate the usefulness of this approach in a variety of problems. In particular, we demonstrate that MR. SID can accurately estimate spatial filters, connectivity graphs, and time-courses from native resolution functional magnetic resonance imaging data. MR. SID therefore enables big time-series data to be analyzed using standard methods, readying the field for further generalizations including non-linear and non-Gaussian state-space models. PMID:29391659

  17. Are your covariates under control? How normalization can re-introduce covariate effects.

    PubMed

    Pain, Oliver; Dudbridge, Frank; Ronald, Angelica

    2018-04-30

    Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations were normality of the dependent variable is required.

  18. Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data.

    PubMed

    Tekwe, Carmen D; Carroll, Raymond J; Dabney, Alan R

    2012-08-01

    Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Various standard statistical techniques including non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and the parametric survival model and accelerated failure time-model with log-normal, log-logistic and Weibull distributions were used to detect any differentially expressed proteins. The statistical operating characteristics of each method are explored using both real and simulated datasets. Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the discrepancy widening with increasing missingness in the proportions. The testing procedures discussed in this article can all be performed using readily available software such as R. The R codes are provided as supplemental materials. ctekwe@stat.tamu.edu.

  19. Comparison of unitary associations and probabilistic ranking and scaling as applied to mesozoic radiolarians

    NASA Astrophysics Data System (ADS)

    Baumgartner, Peter O.

    A database on Middle Jurassic-Early Cretaceous radiolarians consisting of first and final occurrences of 110 species in 226 samples from 43 localities was used to compute Unitary Associations and probabilistic ranking and scaling (RASC), in order to test deterministic versus probabilistic quantitative biostratigraphic methods. Because the Mesozoic radiolarian fossil record is mainly dissolution-controlled, the sequence of events differs greatly from section to section. The scatter of local first and final appearances along a time scale is large compared to the species range; it is asymmetrical, with a maximum near the ends of the range and it is non-random. Thus, these data do not satisfy the statistical assumptions made in ranking and scaling. Unitary Associations produce maximum ranges of the species relative to each other by stacking cooccurrence data from all sections and therefore compensate for the local dissolution effects. Ranking and scaling, based on the assumption of a normal random distribution of the events, produces average ranges which are for most species much shorter than the maximum UA-ranges. There are, however, a number of species with similar ranges in both solutions. These species are believed to be the most dissolution-resistant and, therefore, the most reliable ones for the definition of biochronozones. The comparison of maximum and average ranges may be a powerful tool to test reliability of species for biochronology. Dissolution-controlled fossil data yield high crossover frequencies and therefore small, statistically insignificant interfossil distances. Scaling has not produced a useful sequence for this type of data.

  20. A Ranking Approach to Genomic Selection.

    PubMed

    Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori

    2015-01-01

    Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual's breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.

  1. Expert opinion on landslide susceptibility elicted by probabilistic inversion from scenario rankings

    NASA Astrophysics Data System (ADS)

    Lee, Katy; Dashwood, Claire; Lark, Murray

    2016-04-01

    For many natural hazards the opinion of experts, with experience in assessing susceptibility under different circumstances, is a valuable source of information on which to base risk assessments. This is particularly important where incomplete process understanding, and limited data, limit the scope to predict susceptibility by mechanistic or statistical modelling. The expert has a tacit model of a system, based on their understanding of processes and their field experience. This model may vary in quality, depending on the experience of the expert. There is considerable interest in how one may elicit expert understanding by a process which is transparent and robust, to provide a basis for decision support. One approach is to provide experts with a set of scenarios, and then to ask them to rank small overlapping subsets of these with respect to susceptibility. Methods of probabilistic inversion have been used to compute susceptibility scores for each scenario, implicit in the expert ranking. It is also possible to model these scores as functions of measurable properties of the scenarios. This approach has been used to assess susceptibility of animal populations to invasive diseases, to assess risk to vulnerable marine environments and to assess the risk in hypothetical novel technologies for food production. We will present the results of a study in which a group of geologists with varying degrees of expertise in assessing landslide hazards were asked to rank sets of hypothetical simplified scenarios with respect to land slide susceptibility. We examine the consistency of their rankings and the importance of different properties of the scenarios in the tacit susceptibility model that their rankings implied. Our results suggest that this is a promising approach to the problem of how experts can communicate their tacit model of uncertain systems to those who want to make use of their expertise.

  2. TCC: an R package for comparing tag count data with robust normalization strategies

    PubMed Central

    2013-01-01

    Background Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. Results TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. Conclusion DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13. PMID:23837715

  3. Weighted Discriminative Dictionary Learning based on Low-rank Representation

    NASA Astrophysics Data System (ADS)

    Chang, Heyou; Zheng, Hao

    2017-01-01

    Low-rank representation has been widely used in the field of pattern classification, especially when both training and testing images are corrupted with large noise. Dictionary plays an important role in low-rank representation. With respect to the semantic dictionary, the optimal representation matrix should be block-diagonal. However, traditional low-rank representation based dictionary learning methods cannot effectively exploit the discriminative information between data and dictionary. To address this problem, this paper proposed weighted discriminative dictionary learning based on low-rank representation, where a weighted representation regularization term is constructed. The regularization associates label information of both training samples and dictionary atoms, and encourages to generate a discriminative representation with class-wise block-diagonal structure, which can further improve the classification performance where both training and testing images are corrupted with large noise. Experimental results demonstrate advantages of the proposed method over the state-of-the-art methods.

  4. Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazards-ratio, the log-rank test or the Nelson--Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package PRIMsrc (Patient Rule Induction Method in Survival, Regression and Classification settings) is available on CRAN (Comprehensive R Archive Network) and GitHub. PMID:27034730

  5. Multiple graph regularized protein domain ranking.

    PubMed

    Wang, Jim Jing-Yan; Bensmail, Halima; Gao, Xin

    2012-11-19

    Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.

  6. Multiple graph regularized protein domain ranking

    PubMed Central

    2012-01-01

    Background Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications. PMID:23157331

  7. What weather variables are important in predicting heat-related mortality? A new application of statistical learning methods

    PubMed Central

    Zhang, Kai; Li, Yun; Schwartz, Joel D.; O'Neill, Marie S.

    2014-01-01

    Hot weather increases risk of mortality. Previous studies used different sets of weather variables to characterize heat stress, resulting in variation in heat-mortality- associations depending on the metric used. We employed a statistical learning method – random forests – to examine which of various weather variables had the greatest impact on heat-related mortality. We compiled a summertime daily weather and mortality counts dataset from four U.S. cities (Chicago, IL; Detroit, MI; Philadelphia, PA; and Phoenix, AZ) from 1998 to 2006. A variety of weather variables were ranked in predicting deviation from typical daily all-cause and cause-specific death counts. Ranks of weather variables varied with city and health outcome. Apparent temperature appeared to be the most important predictor of heat-related mortality for all-cause mortality. Absolute humidity was, on average, most frequently selected one of the top variables for all-cause mortality and seven cause-specific mortality categories. Our analysis affirms that apparent temperature is a reasonable variable for activating heat alerts and warnings, which are commonly based on predictions of total mortality in next few days. Additionally, absolute humidity should be included in future heat-health studies. Finally, random forests can be used to guide choice of weather variables in heat epidemiology studies. PMID:24834832

  8. A test of concordance between patient and psychiatrist valuations of multiple treatment goals for schizophrenia

    PubMed Central

    Bridges, John F. P.; Slawik, Lara; Schmeding, Annette; Reimer, Jens; Naber, Dieter; Kuhnigk, Olaf

    2011-01-01

    Abstract Background  While much discussion has been placed on the problem of poor compliance in the treatment of schizophrenia, there has been little discussion on the concordance between patients and psychiatrists, an important contributing factor to patient‐centred care. Objective  To estimate the concordance between patients’ and psychiatrists’ (ordinal and cardinal) valuations of multiple goals for schizophrenia treatment and to illustrate the utility of the self‐explicated method in valuing a large number of treatment goals. Design  Twenty treatment goals were identified during focus groups and literature review and were presented to patients and psychiatrists during structured interviews. Respondents were asked to rank the multiple treatment goals and rate them on a 5‐point Likert scale. Three scores were calculated based on the ranking (1–20), rating (Likert scale) (1–5) and a self‐explicated method estimated as the product of rating and ranking score (1–100). Concordance was tested using Spearman’s rho for overall ordinal rankings and via anova and F‐test for the cardinal values assigned to a specific treatment goal. Participants  A total of 105 outpatients diagnosed with schizophrenia and 160 psychiatrists in Germany. Results  Patient and psychiatrist values were concordant when the ordinal properties of their valuations were assessed by rating (ρ = 0.63; P = 0.002), ranking (ρ = 0.51; P = 0.02) and self‐explicated methods (ρ = 0.54; P = 0.01). Significant discordances were found when comparing the cardinal value placed on any given treatment goal using all three approaches, but the self‐explicated method produced a more discerning statistic. Relative to patients, psychiatrists significantly (P < 0.05) overvalued reduced lack of emotion, improved sexual pleasure and improved communication while undervaluing reuptake of activities of daily living, improved satisfaction and recovered capacity for work. Conclusions  While there is an overall concordance between patients’ and psychiatrists’ valuation, significantly different valuations on specific goals can be identified. Here, psychiatrists tend to focus on ‘textbook’ outcomes, while patients are more concerned with functioning and living a normal life. This study also demonstrates the importance of comparing the concordance in treatment goals and the importance of preference‐based methods, such as the self‐explicated method, in the study of concordance. PMID:21668795

  9. Complete hazard ranking to analyze right-censored data: An ALS survival study.

    PubMed

    Huang, Zhengnan; Zhang, Hongjiu; Boss, Jonathan; Goutman, Stephen A; Mukherjee, Bhramar; Dinov, Ivo D; Guan, Yuanfang

    2017-12-01

    Survival analysis represents an important outcome measure in clinical research and clinical trials; further, survival ranking may offer additional advantages in clinical trials. In this study, we developed GuanRank, a non-parametric ranking-based technique to transform patients' survival data into a linear space of hazard ranks. The transformation enables the utilization of machine learning base-learners including Gaussian process regression, Lasso, and random forest on survival data. The method was submitted to the DREAM Amyotrophic Lateral Sclerosis (ALS) Stratification Challenge. Ranked first place, the model gave more accurate ranking predictions on the PRO-ACT ALS dataset in comparison to Cox proportional hazard model. By utilizing right-censored data in its training process, the method demonstrated its state-of-the-art predictive power in ALS survival ranking. Its feature selection identified multiple important factors, some of which conflicts with previous studies.

  10. BridgeRank: A novel fast centrality measure based on local structure of the network

    NASA Astrophysics Data System (ADS)

    Salavati, Chiman; Abdollahpouri, Alireza; Manbari, Zhaleh

    2018-04-01

    Ranking nodes in complex networks have become an important task in many application domains. In a complex network, influential nodes are those that have the most spreading ability. Thus, identifying influential nodes based on their spreading ability is a fundamental task in different applications such as viral marketing. One of the most important centrality measures to ranking nodes is closeness centrality which is efficient but suffers from high computational complexity O(n3) . This paper tries to improve closeness centrality by utilizing the local structure of nodes and presents a new ranking algorithm, called BridgeRank centrality. The proposed method computes local centrality value for each node. For this purpose, at first, communities are detected and the relationship between communities is completely ignored. Then, by applying a centrality in each community, only one best critical node from each community is extracted. Finally, the nodes are ranked based on computing the sum of the shortest path length of nodes to obtained critical nodes. We have also modified the proposed method by weighting the original BridgeRank and selecting several nodes from each community based on the density of that community. Our method can find the best nodes with high spread ability and low time complexity, which make it applicable to large-scale networks. To evaluate the performance of the proposed method, we use the SIR diffusion model. Finally, experiments on real and artificial networks show that our method is able to identify influential nodes so efficiently, and achieves better performance compared to other recent methods.

  11. Chemical and toxicological evaluation of underground coal gasification (UCG) effluents. The coal rank effect.

    PubMed

    Kapusta, Krzysztof; Stańczyk, Krzysztof

    2015-02-01

    The effect of coal rank on the composition and toxicity of water effluents resulting from two underground coal gasification experiments with distinct coal samples (lignite and hard coal) was investigated. A broad range of organic and inorganic parameters was determined in the sampled condensates. The physicochemical tests were supplemented by toxicity bioassays based on the luminescent bacteria Vibrio fischeri as the test organism. The principal component analysis and Pearson correlation analysis were adopted to assist in the interpretation of the raw experimental data, and the multiple regression statistical method was subsequently employed to enable predictions of the toxicity based on the values of the selected parameters. Significant differences in the qualitative and quantitative description of the contamination profiles were identified for both types of coal under study. Independent of the coal rank, the most characteristic organic components of the studied condensates were phenols, naphthalene and benzene. In the inorganic array, ammonia, sulphates and selected heavy metals and metalloids were identified as the dominant constituents. Except for benzene with its alkyl homologues (BTEX), selected polycyclic aromatic hydrocarbons (PAHs), zinc and selenium, the values of the remaining parameters were considerably greater for the hard coal condensates. The studies revealed that all of the tested UCG condensates were extremely toxic to V. fischeri; however, the average toxicity level for the hard coal condensates was approximately 56% higher than that obtained for the lignite. The statistical analysis provided results supporting that the toxicity of the condensates was most positively correlated with the concentrations of free ammonia, phenols and certain heavy metals. Copyright © 2014 Elsevier Inc. All rights reserved.

  12. Automatic single questionnaire intensity (SQI, EMS98 scale) estimation using ranking models built on the existing BCSF database

    NASA Astrophysics Data System (ADS)

    Schlupp, A.; Sira, C.; Schmitt, K.; Schaming, M.

    2013-12-01

    In charge of intensity estimations in France, BCSF has collected and manually analyzed more than 47000 online individual macroseismic questionnaires since 2000 up to intensity VI. These macroseismic data allow us to estimate one SQI value (Single Questionnaire Intensity) for each form following the EMS98 scale. The reliability of the automatic intensity estimation is important as they are today used for automatic shakemaps communications and crisis management. Today, the automatic intensity estimation at BCSF is based on the direct use of thumbnails selected on a menu by the witnesses. Each thumbnail corresponds to an EMS-98 intensity value, allowing us to quickly issue an intensity map of the communal intensity by averaging the SQIs at each city. Afterwards an expert, to determine a definitive SQI, manually analyzes each form. This work is time consuming and not anymore suitable considering the increasing number of testimonies at BCSF. Nevertheless, it can take into account incoherent answers. We tested several automatic methods (USGS algorithm, Correlation coefficient, Thumbnails) (Sira et al. 2013, IASPEI) and compared them with 'expert' SQIs. These methods gave us medium score (between 50 to 60% of well SQI determined and 35 to 40% with plus one or minus one intensity degree). The best fit was observed with the thumbnails. Here, we present new approaches based on 3 statistical ranking methods as 1) Multinomial logistic regression model, 2) Discriminant analysis DISQUAL and 3) Support vector machines (SVMs). The two first methods are standard methods, while the third one is more recent. Theses methods could be applied because the BCSF has already in his database more then 47000 forms and because their questions and answers are well adapted for a statistical analysis. The ranking models could then be used as automatic method constrained on expert analysis. The performance of the automatic methods and the reliability of the estimated SQI can be evaluated thanks to the fact that each definitive BCSF SQIs is determined by an expert analysis. We compare the SQIs obtained by these methods from our database and discuss the coherency and variations between automatic and manual processes. These methods lead to high scores with up to 85% of the forms well classified and most of the remaining forms classified with only a shift of one intensity degree. This allows us to use the ranking methods as the best automatic methods to fast SQIs estimation and to produce fast shakemaps. The next step, to improve the use of these methods, will be to identify explanations for the forms not classified at the correct value and a way to select the few remaining forms that should be analyzed by the expert. Note that beyond intensity VI, on-line questionnaires are insufficient and a field survey is indispensable to estimate intensity. For such survey, in France, BCSF leads a macroseismic intervention group (GIM).

  13. Probabilistic Analysis for Comparing Fatigue Data Based on Johnson-Weibull Parameters

    NASA Technical Reports Server (NTRS)

    Vlcek, Brian L.; Hendricks, Robert C.; Zaretsky, Erwin V.

    2013-01-01

    Leonard Johnson published a methodology for establishing the confidence that two populations of data are different. Johnson's methodology is dependent on limited combinations of test parameters (Weibull slope, mean life ratio, and degrees of freedom) and a set of complex mathematical equations. In this report, a simplified algebraic equation for confidence numbers is derived based on the original work of Johnson. The confidence numbers calculated with this equation are compared to those obtained graphically by Johnson. Using the ratios of mean life, the resultant values of confidence numbers at the 99 percent level deviate less than 1 percent from those of Johnson. At a 90 percent confidence level, the calculated values differ between +2 and 4 percent. The simplified equation is used to rank the experimental lives of three aluminum alloys (AL 2024, AL 6061, and AL 7075), each tested at three stress levels in rotating beam fatigue, analyzed using the Johnson- Weibull method, and compared to the ASTM Standard (E739 91) method of comparison. The ASTM Standard did not statistically distinguish between AL 6061 and AL 7075. However, it is possible to rank the fatigue lives of different materials with a reasonable degree of statistical certainty based on combined confidence numbers using the Johnson- Weibull analysis. AL 2024 was found to have the longest fatigue life, followed by AL 7075, and then AL 6061. The ASTM Standard and the Johnson-Weibull analysis result in the same stress-life exponent p for each of the three aluminum alloys at the median, or L(sub 50), lives

  14. Constrained low-rank matrix estimation: phase transitions, approximate message passing and applications

    NASA Astrophysics Data System (ADS)

    Lesieur, Thibault; Krzakala, Florent; Zdeborová, Lenka

    2017-07-01

    This article is an extended version of previous work of Lesieur et al (2015 IEEE Int. Symp. on Information Theory Proc. pp 1635-9 and 2015 53rd Annual Allerton Conf. on Communication, Control and Computing (IEEE) pp 680-7) on low-rank matrix estimation in the presence of constraints on the factors into which the matrix is factorized. Low-rank matrix factorization is one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. We present a framework to study the constrained low-rank matrix estimation for a general prior on the factors, and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models—presenting a unifying way to study a number of problems considered previously in separate statistical physics works. We present a number of applications for the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm, that is known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model or vector (xy, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to result we study in detail phase diagrams and phase transitions for the Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to performance of algorithms such as the Low-RAMP or commonly used spectral methods.

  15. University and student segmentation: multilevel latent-class analysis of students' attitudes towards research methods and statistics.

    PubMed

    Mutz, Rüdiger; Daniel, Hans-Dieter

    2013-06-01

    It is often claimed that psychology students' attitudes towards research methods and statistics affect course enrollment, persistence, achievement, and course climate. However, the inter-institutional variability has been widely neglected in the research on students' attitudes towards research methods and statistics, but it is important for didactic purposes (heterogeneity of the student population). The paper presents a scale based on findings of the social psychology of attitudes (polar and emotion-based concept) in conjunction with a method for capturing beginning university students' attitudes towards research methods and statistics and identifying the proportion of students having positive attitudes at the institutional level. The study based on a re-analysis of a nationwide survey in Germany in August 2000 of all psychology students that enrolled in fall 1999/2000 (N= 1,490) and N= 44 universities. Using multilevel latent-class analysis (MLLCA), the aim was to group students in different student attitude types and at the same time to obtain university segments based on the incidences of the different student attitude types. Four student latent clusters were found that can be ranked on a bipolar attitude dimension. Membership in a cluster was predicted by age, grade point average (GPA) on school-leaving exam, and personality traits. In addition, two university segments were found: universities with an average proportion of students with positive attitudes and universities with a high proportion of students with positive attitudes (excellent segment). As psychology students make up a very heterogeneous group, the use of multiple learning activities as opposed to the classical lecture course is required. © 2011 The British Psychological Society.

  16. Population models and simulation methods: The case of the Spearman rank correlation.

    PubMed

    Astivia, Oscar L Olvera; Zumbo, Bruno D

    2017-11-01

    The purpose of this paper is to highlight the importance of a population model in guiding the design and interpretation of simulation studies used to investigate the Spearman rank correlation. The Spearman rank correlation has been known for over a hundred years to applied researchers and methodologists alike and is one of the most widely used non-parametric statistics. Still, certain misconceptions can be found, either explicitly or implicitly, in the published literature because a population definition for this statistic is rarely discussed within the social and behavioural sciences. By relying on copula distribution theory, a population model is presented for the Spearman rank correlation, and its properties are explored both theoretically and in a simulation study. Through the use of the Iman-Conover algorithm (which allows the user to specify the rank correlation as a population parameter), simulation studies from previously published articles are explored, and it is found that many of the conclusions purported in them regarding the nature of the Spearman correlation would change if the data-generation mechanism better matched the simulation design. More specifically, issues such as small sample bias and lack of power of the t-test and r-to-z Fisher transformation disappear when the rank correlation is calculated from data sampled where the rank correlation is the population parameter. A proof for the consistency of the sample estimate of the rank correlation is shown as well as the flexibility of the copula model to encompass results previously published in the mathematical literature. © 2017 The British Psychological Society.

  17. HIV quality report cards: impact of case-mix adjustment and statistical methods.

    PubMed

    Ohl, Michael E; Richardson, Kelly K; Goto, Michihiko; Vaughan-Sarrazin, Mary; Schweizer, Marin L; Perencevich, Eli N

    2014-10-15

    There will be increasing pressure to publicly report and rank the performance of healthcare systems on human immunodeficiency virus (HIV) quality measures. To inform discussion of public reporting, we evaluated the influence of case-mix adjustment when ranking individual care systems on the viral control quality measure. We used data from the Veterans Health Administration (VHA) HIV Clinical Case Registry and administrative databases to estimate case-mix adjusted viral control for 91 local systems caring for 12 368 patients. We compared results using 2 adjustment methods, the observed-to-expected estimator and the risk-standardized ratio. Overall, 10 913 patients (88.2%) achieved viral control (viral load ≤400 copies/mL). Prior to case-mix adjustment, system-level viral control ranged from 51% to 100%. Seventeen (19%) systems were labeled as low outliers (performance significantly below the overall mean) and 11 (12%) as high outliers. Adjustment for case mix (patient demographics, comorbidity, CD4 nadir, time on therapy, and income from VHA administrative databases) reduced the number of low outliers by approximately one-third, but results differed by method. The adjustment model had moderate discrimination (c statistic = 0.66), suggesting potential for unadjusted risk when using administrative data to measure case mix. Case-mix adjustment affects rankings of care systems on the viral control quality measure. Given the sensitivity of rankings to selection of case-mix adjustment methods-and potential for unadjusted risk when using variables limited to current administrative databases-the HIV care community should explore optimal methods for case-mix adjustment before moving forward with public reporting. Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  18. An extended data mining method for identifying differentially expressed assay-specific signatures in functional genomic studies.

    PubMed

    Rollins, Derrick K; Teh, Ailing

    2010-12-17

    Microarray data sets provide relative expression levels for thousands of genes for a small number, in comparison, of different experimental conditions called assays. Data mining techniques are used to extract specific information of genes as they relate to the assays. The multivariate statistical technique of principal component analysis (PCA) has proven useful in providing effective data mining methods. This article extends the PCA approach of Rollins et al. to the development of ranking genes of microarray data sets that express most differently between two biologically different grouping of assays. This method is evaluated on real and simulated data and compared to a current approach on the basis of false discovery rate (FDR) and statistical power (SP) which is the ability to correctly identify important genes. This work developed and evaluated two new test statistics based on PCA and compared them to a popular method that is not PCA based. Both test statistics were found to be effective as evaluated in three case studies: (i) exposing E. coli cells to two different ethanol levels; (ii) application of myostatin to two groups of mice; and (iii) a simulated data study derived from the properties of (ii). The proposed method (PM) effectively identified critical genes in these studies based on comparison with the current method (CM). The simulation study supports higher identification accuracy for PM over CM for both proposed test statistics when the gene variance is constant and for one of the test statistics when the gene variance is non-constant. PM compares quite favorably to CM in terms of lower FDR and much higher SP. Thus, PM can be quite effective in producing accurate signatures from large microarray data sets for differential expression between assays groups identified in a preliminary step of the PCA procedure and is, therefore, recommended for use in these applications.

  19. Rank Diversity of Languages: Generic Behavior in Computational Linguistics

    PubMed Central

    Cocho, Germinal; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos; Sánchez, Sergio

    2015-01-01

    Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: “heads” consist of words which almost do not change their rank in time, “bodies” are words of general use, while “tails” are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied. PMID:25849150

  20. Rank diversity of languages: generic behavior in computational linguistics.

    PubMed

    Cocho, Germinal; Flores, Jorge; Gershenson, Carlos; Pineda, Carlos; Sánchez, Sergio

    2015-01-01

    Statistical studies of languages have focused on the rank-frequency distribution of words. Instead, we introduce here a measure of how word ranks change in time and call this distribution rank diversity. We calculate this diversity for books published in six European languages since 1800, and find that it follows a universal lognormal distribution. Based on the mean and standard deviation associated with the lognormal distribution, we define three different word regimes of languages: "heads" consist of words which almost do not change their rank in time, "bodies" are words of general use, while "tails" are comprised by context-specific words and vary their rank considerably in time. The heads and bodies reflect the size of language cores identified by linguists for basic communication. We propose a Gaussian random walk model which reproduces the rank variation of words in time and thus the diversity. Rank diversity of words can be understood as the result of random variations in rank, where the size of the variation depends on the rank itself. We find that the core size is similar for all languages studied.

  1. Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.

    PubMed

    Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J

    2015-07-01

    Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. Copyright © 2015 by the Genetics Society of America.

  2. A rank-based approach for correcting systematic biases in spatial disaggregation of coarse-scale climate simulations

    NASA Astrophysics Data System (ADS)

    Nahar, Jannatun; Johnson, Fiona; Sharma, Ashish

    2017-07-01

    Use of General Circulation Model (GCM) precipitation and evapotranspiration sequences for hydrologic modelling can result in unrealistic simulations due to the coarse scales at which GCMs operate and the systematic biases they contain. The Bias Correction Spatial Disaggregation (BCSD) method is a popular statistical downscaling and bias correction method developed to address this issue. The advantage of BCSD is its ability to reduce biases in the distribution of precipitation totals at the GCM scale and then introduce more realistic variability at finer scales than simpler spatial interpolation schemes. Although BCSD corrects biases at the GCM scale before disaggregation; at finer spatial scales biases are re-introduced by the assumptions made in the spatial disaggregation process. Our study focuses on this limitation of BCSD and proposes a rank-based approach that aims to reduce the spatial disaggregation bias especially for both low and high precipitation extremes. BCSD requires the specification of a multiplicative bias correction anomaly field that represents the ratio of the fine scale precipitation to the disaggregated precipitation. It is shown that there is significant temporal variation in the anomalies, which is masked when a mean anomaly field is used. This can be improved by modelling the anomalies in rank-space. Results from the application of the rank-BCSD procedure improve the match between the distributions of observed and downscaled precipitation at the fine scale compared to the original BCSD approach. Further improvements in the distribution are identified when a scaling correction to preserve mass in the disaggregation process is implemented. An assessment of the approach using a single GCM over Australia shows clear advantages especially in the simulation of particularly low and high downscaled precipitation amounts.

  3. Assessing Low-Intensity Relationships in Complex Networks

    PubMed Central

    Spitz, Andreas; Gimmler, Anna; Stoeck, Thorsten; Zweig, Katharina Anna; Horvát, Emőke-Ágnes

    2016-01-01

    Many large network data sets are noisy and contain links representing low-intensity relationships that are difficult to differentiate from random interactions. This is especially relevant for high-throughput data from systems biology, large-scale ecological data, but also for Web 2.0 data on human interactions. In these networks with missing and spurious links, it is possible to refine the data based on the principle of structural similarity, which assesses the shared neighborhood of two nodes. By using similarity measures to globally rank all possible links and choosing the top-ranked pairs, true links can be validated, missing links inferred, and spurious observations removed. While many similarity measures have been proposed to this end, there is no general consensus on which one to use. In this article, we first contribute a set of benchmarks for complex networks from three different settings (e-commerce, systems biology, and social networks) and thus enable a quantitative performance analysis of classic node similarity measures. Based on this, we then propose a new methodology for link assessment called z* that assesses the statistical significance of the number of their common neighbors by comparison with the expected value in a suitably chosen random graph model and which is a consistently top-performing algorithm for all benchmarks. In addition to a global ranking of links, we also use this method to identify the most similar neighbors of each single node in a local ranking, thereby showing the versatility of the method in two distinct scenarios and augmenting its applicability. Finally, we perform an exploratory analysis on an oceanographic plankton data set and find that the distribution of microbes follows similar biogeographic rules as those of macroorganisms, a result that rejects the global dispersal hypothesis for microbes. PMID:27096435

  4. Assessing Low-Intensity Relationships in Complex Networks.

    PubMed

    Spitz, Andreas; Gimmler, Anna; Stoeck, Thorsten; Zweig, Katharina Anna; Horvát, Emőke-Ágnes

    2016-01-01

    Many large network data sets are noisy and contain links representing low-intensity relationships that are difficult to differentiate from random interactions. This is especially relevant for high-throughput data from systems biology, large-scale ecological data, but also for Web 2.0 data on human interactions. In these networks with missing and spurious links, it is possible to refine the data based on the principle of structural similarity, which assesses the shared neighborhood of two nodes. By using similarity measures to globally rank all possible links and choosing the top-ranked pairs, true links can be validated, missing links inferred, and spurious observations removed. While many similarity measures have been proposed to this end, there is no general consensus on which one to use. In this article, we first contribute a set of benchmarks for complex networks from three different settings (e-commerce, systems biology, and social networks) and thus enable a quantitative performance analysis of classic node similarity measures. Based on this, we then propose a new methodology for link assessment called z* that assesses the statistical significance of the number of their common neighbors by comparison with the expected value in a suitably chosen random graph model and which is a consistently top-performing algorithm for all benchmarks. In addition to a global ranking of links, we also use this method to identify the most similar neighbors of each single node in a local ranking, thereby showing the versatility of the method in two distinct scenarios and augmenting its applicability. Finally, we perform an exploratory analysis on an oceanographic plankton data set and find that the distribution of microbes follows similar biogeographic rules as those of macroorganisms, a result that rejects the global dispersal hypothesis for microbes.

  5. A fragment-based approach to the SAMPL3 Challenge

    NASA Astrophysics Data System (ADS)

    Kulp, John L.; Blumenthal, Seth N.; Wang, Qiang; Bryan, Richard L.; Guarnieri, Frank

    2012-05-01

    The success of molecular fragment-based design depends critically on the ability to make predictions of binding poses and of affinity ranking for compounds assembled by linking fragments. The SAMPL3 Challenge provides a unique opportunity to evaluate the performance of a state-of-the-art fragment-based design methodology with respect to these requirements. In this article, we present results derived from linking fragments to predict affinity and pose in the SAMPL3 Challenge. The goal is to demonstrate how incorporating different aspects of modeling protein-ligand interactions impact the accuracy of the predictions, including protein dielectric models, charged versus neutral ligands, ΔΔGs solvation energies, and induced conformational stress. The core method is based on annealing of chemical potential in a Grand Canonical Monte Carlo (GC/MC) simulation. By imposing an initially very high chemical potential and then automatically running a sequence of simulations at successively decreasing chemical potentials, the GC/MC simulation efficiently discovers statistical distributions of bound fragment locations and orientations not found reliably without the annealing. This method accounts for configurational entropy, the role of bound water molecules, and results in a prediction of all the locations on the protein that have any affinity for the fragment. Disregarding any of these factors in affinity-rank prediction leads to significantly worse correlation with experimentally-determined free energies of binding. We relate three important conclusions from this challenge as applied to GC/MC: (1) modeling neutral ligands—regardless of the charged state in the active site—produced better affinity ranking than using charged ligands, although, in both cases, the poses were almost exactly overlaid; (2) simulating explicit water molecules in the GC/MC gave better affinity and pose predictions; and (3) applying a ΔΔGs solvation correction further improved the ranking of the neutral ligands. Using the GC/MC method under a variety of parameters in the blinded SAMPL3 Challenge provided important insights to the relevant parameters and boundaries in predicting binding affinities using simulated annealing of chemical potential calculations.

  6. Identification of significant features by the Global Mean Rank test.

    PubMed

    Klammer, Martin; Dybowski, J Nikolaj; Hoffmann, Daniel; Schaab, Christoph

    2014-01-01

    With the introduction of omics-technologies such as transcriptomics and proteomics, numerous methods for the reliable identification of significantly regulated features (genes, proteins, etc.) have been developed. Experimental practice requires these tests to successfully deal with conditions such as small numbers of replicates, missing values, non-normally distributed expression levels, and non-identical distributions of features. With the MeanRank test we aimed at developing a test that performs robustly under these conditions, while favorably scaling with the number of replicates. The test proposed here is a global one-sample location test, which is based on the mean ranks across replicates, and internally estimates and controls the false discovery rate. Furthermore, missing data is accounted for without the need of imputation. In extensive simulations comparing MeanRank to other frequently used methods, we found that it performs well with small and large numbers of replicates, feature dependent variance between replicates, and variable regulation across features on simulation data and a recent two-color microarray spike-in dataset. The tests were then used to identify significant changes in the phosphoproteomes of cancer cells induced by the kinase inhibitors erlotinib and 3-MB-PP1 in two independently published mass spectrometry-based studies. MeanRank outperformed the other global rank-based methods applied in this study. Compared to the popular Significance Analysis of Microarrays and Linear Models for Microarray methods, MeanRank performed similar or better. Furthermore, MeanRank exhibits more consistent behavior regarding the degree of regulation and is robust against the choice of preprocessing methods. MeanRank does not require any imputation of missing values, is easy to understand, and yields results that are easy to interpret. The software implementing the algorithm is freely available for academic and commercial use.

  7. Prototyping a Distributed Information Retrieval System That Uses Statistical Ranking.

    ERIC Educational Resources Information Center

    Harman, Donna; And Others

    1991-01-01

    Built using a distributed architecture, this prototype distributed information retrieval system uses statistical ranking techniques to provide better service to the end user. Distributed architecture was shown to be a feasible alternative to centralized or CD-ROM information retrieval, and user testing of the ranking methodology showed both…

  8. LD-SPatt: large deviations statistics for patterns on Markov chains.

    PubMed

    Nuel, G

    2004-01-01

    Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be done through several approaches. Central limit theorem (CLT) producing Gaussian approximations are one of the most popular ones. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events where CLT is especially bad. In this paper, we propose a new approach based on the large deviations theory to assess pattern statistics. We first recall theoretical results for empiric mean (level 1) as well as empiric distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results focusing on numerical issues. LD-SPatt is the name of GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. We then finally discuss some further possible improvements and applications of this new method.

  9. Active learning for ontological event extraction incorporating named entity recognition and unknown word handling.

    PubMed

    Han, Xu; Kim, Jung-jae; Kwoh, Chee Keong

    2016-01-01

    Biomedical text mining may target various kinds of valuable information embedded in the literature, but a critical obstacle to the extension of the mining targets is the cost of manual construction of labeled data, which are required for state-of-the-art supervised learning systems. Active learning is to choose the most informative documents for the supervised learning in order to reduce the amount of required manual annotations. Previous works of active learning, however, focused on the tasks of entity recognition and protein-protein interactions, but not on event extraction tasks for multiple event types. They also did not consider the evidence of event participants, which might be a clue for the presence of events in unlabeled documents. Moreover, the confidence scores of events produced by event extraction systems are not reliable for ranking documents in terms of informativity for supervised learning. We here propose a novel committee-based active learning method that supports multi-event extraction tasks and employs a new statistical method for informativity estimation instead of using the confidence scores from event extraction systems. Our method is based on a committee of two systems as follows: We first employ an event extraction system to filter potential false negatives among unlabeled documents, from which the system does not extract any event. We then develop a statistical method to rank the potential false negatives of unlabeled documents 1) by using a language model that measures the probabilities of the expression of multiple events in documents and 2) by using a named entity recognition system that locates the named entities that can be event arguments (e.g. proteins). The proposed method further deals with unknown words in test data by using word similarity measures. We also apply our active learning method for the task of named entity recognition. We evaluate the proposed method against the BioNLP Shared Tasks datasets, and show that our method can achieve better performance than such previous methods as entropy and Gibbs error based methods and a conventional committee-based method. We also show that the incorporation of named entity recognition into the active learning for event extraction and the unknown word handling further improve the active learning method. In addition, the adaptation of the active learning method into named entity recognition tasks also improves the document selection for manual annotation of named entities.

  10. Investigating a Judgemental Rank-Ordering Method for Maintaining Standards in UK Examinations

    ERIC Educational Resources Information Center

    Black, Beth; Bramley, Tom

    2008-01-01

    A new judgemental method of equating raw scores on two tests, based on rank-ordering scripts from both tests, has been developed by Bramley. The rank-ordering method has potential application as a judgemental standard-maintaining mechanism, because given a mark on one test (e.g. the A grade boundary mark), the equivalent mark (i.e. at the same…

  11. Image Re-Ranking Based on Topic Diversity.

    PubMed

    Qian, Xueming; Lu, Dan; Wang, Yaxiong; Zhu, Li; Tang, Yuan Yan; Wang, Meng

    2017-08-01

    Social media sharing Websites allow users to annotate images with free tags, which significantly contribute to the development of the web image retrieval. Tag-based image search is an important method to find images shared by users in social networks. However, how to make the top ranked result relevant and with diversity is challenging. In this paper, we propose a topic diverse ranking approach for tag-based image retrieval with the consideration of promoting the topic coverage performance. First, we construct a tag graph based on the similarity between each tag. Then, the community detection method is conducted to mine the topic community of each tag. After that, inter-community and intra-community ranking are introduced to obtain the final retrieved results. In the inter-community ranking process, an adaptive random walk model is employed to rank the community based on the multi-information of each topic community. Besides, we build an inverted index structure for images to accelerate the searching process. Experimental results on Flickr data set and NUS-Wide data sets show the effectiveness of the proposed approach.

  12. Importance of scientific resources among local public health practitioners.

    PubMed

    Fields, Robert P; Stamatakis, Katherine A; Duggan, Kathleen; Brownson, Ross C

    2015-04-01

    We examined the perceived importance of scientific resources for decision-making among local health department (LHD) practitioners in the United States. We used data from LHD practitioners (n = 849). Respondents ranked important decision-making resources, methods for learning about public health research, and academic journal use. We calculated descriptive statistics and used logistic regression to measure associations of individual and LHD characteristics with importance of scientific resources. Systematic reviews of scientific literature (24.7%) were most frequently ranked as important among scientific resources, followed by scientific reports (15.9%), general literature review articles (6.5%), and 1 or a few scientific studies (4.8%). Graduate-level education (adjusted odds ratios [AORs] = 1.7-3.5), larger LHD size (AORs = 2.0-3.5), and leadership support (AOR = 1.6; 95% confidence interval = 1.1, 2.3) were associated with a higher ranking of importance of scientific resources. Graduate training, larger LHD size, and leadership that supports a culture of evidence-based decision-making may increase the likelihood of practitioners viewing scientific resources as important. Targeting communication channels that practitioners view as important can also guide research dissemination strategies.

  13. A stochastic Markov chain approach for tennis: Monte Carlo simulation and modeling

    NASA Astrophysics Data System (ADS)

    Aslam, Kamran

    This dissertation describes the computational formulation of probability density functions (pdfs) that facilitate head-to-head match simulations in tennis along with ranking systems developed from their use. A background on the statistical method used to develop the pdfs , the Monte Carlo method, and the resulting rankings are included along with a discussion on ranking methods currently being used both in professional sports and in other applications. Using an analytical theory developed by Newton and Keller in [34] that defines a tennis player's probability of winning a game, set, match and single elimination tournament, a computational simulation has been developed in Matlab that allows further modeling not previously possible with the analytical theory alone. Such experimentation consists of the exploration of non-iid effects, considers the concept the varying importance of points in a match and allows an unlimited number of matches to be simulated between unlikely opponents. The results of these studies have provided pdfs that accurately model an individual tennis player's ability along with a realistic, fair and mathematically sound platform for ranking them.

  14. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    PubMed

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (iv) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multi-nominal logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.

  15. Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies

    DTIC Science & Technology

    2010-03-01

    Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical infer- ence which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical

  16. Applying a "Big Data" Literature System to Recommend Antihypertensive Drugs for Hypertension Patients with Diabetes Mellitus.

    PubMed

    Shu, Jing-Xian; Li, Ying; He, Ting; Chen, Ling; Li, Xue; Zou, Lin-Lin; Yin, Lu; Li, Xiao-Hui; Wang, An-Li; Liu, Xing; Yuan, Hong

    2018-01-07

    BACKGROUND The explosive increase in medical literature has changed therapeutic strategies, but it is challenging for physicians to keep up-to-date on the medical literature. Scientific literature data mining on a large-scale of can be used to refresh physician knowledge and better improve the quality of disease treatment. MATERIAL AND METHODS This paper reports on a reformulated version of a data mining method called MedRank, which is a network-based algorithm that ranks therapy for a target disease based on the MEDLINE literature database. MedRank algorithm input for this study was a clear definition of the disease model; the algorithm output was the accurate recommendation of antihypertensive drugs. Hypertension with diabetes mellitus was chosen as the input disease model. The ranking output of antihypertensive drugs are based on the Joint National Committee (JNC) guidelines, one through eight, and the publication dates, ≤1977, ≤1980, ≤1984, ≤1988, ≤1993, ≤1997, ≤2003, and ≤2013. The McNemar's test was used to evaluate the efficacy of MedRank based on specific JNC guidelines. RESULTS The ranking order of antihypertensive drugs changed with the date of the published literature, and the MedRank algorithm drug recommendations had excellent consistency with the JNC guidelines in 2013 (P=1.00 from McNemar's test, Kappa=0.78, P=1.00). Moreover, the Kappa index increased over time. Sensitivity was better than specificity for MedRank; in addition, sensitivity was maintained at a high level, and specificity increased from 1997 to 2013. CONCLUSIONS The use of MedRank in ranking medical literature on hypertension with diabetes mellitus in our study suggests possible application in clinical practice; it is a potential method for supporting antihypertensive drug-prescription decisions.

  17. Hazard Screening Methods for Nanomaterials: A Comparative Study

    PubMed Central

    Murphy, Finbarr; Mullins, Martin; Furxhi, Irini; Costa, Anna L.; Simeone, Felice C.

    2018-01-01

    Hazard identification is the key step in risk assessment and management of manufactured nanomaterials (NM). However, the rapid commercialisation of nano-enabled products continues to out-pace the development of a prudent risk management mechanism that is widely accepted by the scientific community and enforced by regulators. However, a growing body of academic literature is developing promising quantitative methods. Two approaches have gained significant currency. Bayesian networks (BN) are a probabilistic, machine learning approach while the weight of evidence (WoE) statistical framework is based on expert elicitation. This comparative study investigates the efficacy of quantitative WoE and Bayesian methodologies in ranking the potential hazard of metal and metal-oxide NMs—TiO2, Ag, and ZnO. This research finds that hazard ranking is consistent for both risk assessment approaches. The BN and WoE models both utilize physico-chemical, toxicological, and study type data to infer the hazard potential. The BN exhibits more stability when the models are perturbed with new data. The BN has the significant advantage of self-learning with new data; however, this assumes all input data is equally valid. This research finds that a combination of WoE that would rank input data along with the BN is the optimal hazard assessment framework. PMID:29495342

  18. Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

    DOE PAGES

    Li, Ruipeng; Saad, Yousef

    2017-08-01

    This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less

  19. Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Ruipeng; Saad, Yousef

    This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less

  20. Assessment of features for automatic CTG analysis based on expert annotation.

    PubMed

    Chudácek, Vacláv; Spilka, Jirí; Lhotská, Lenka; Janku, Petr; Koucký, Michal; Huptych, Michal; Bursa, Miroslav

    2011-01-01

    Cardiotocography (CTG) is the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO) since 1960's used routinely by obstetricians to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the ever-used features utilized for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time and frequency domain features, are investigated and the features are assessed based on their statistical significance in the task of distinguishing the FHR into three FIGO classes. Annotation derived from the panel of experts instead of the commonly utilized pH values was used for evaluation of the features on a large data set (552 records). We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to the meta-analysis of three different ranking methods. Number of acceleration and deceleration, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.

  1. The effect of uncertainties in distance-based ranking methods for multi-criteria decision making

    NASA Astrophysics Data System (ADS)

    Jaini, Nor I.; Utyuzhnikov, Sergei V.

    2017-08-01

    Data in the multi-criteria decision making are often imprecise and changeable. Therefore, it is important to carry out sensitivity analysis test for the multi-criteria decision making problem. The paper aims to present a sensitivity analysis for some ranking techniques based on the distance measures in multi-criteria decision making. Two types of uncertainties are considered for the sensitivity analysis test. The first uncertainty is related to the input data, while the second uncertainty is towards the Decision Maker preferences (weights). The ranking techniques considered in this study are TOPSIS, the relative distance and trade-off ranking methods. TOPSIS and the relative distance method measure a distance from an alternative to the ideal and antiideal solutions. In turn, the trade-off ranking calculates a distance of an alternative to the extreme solutions and other alternatives. Several test cases are considered to study the performance of each ranking technique in both types of uncertainties.

  2. Extreme learning machine for ranking: generalization analysis and applications.

    PubMed

    Chen, Hong; Peng, Jiangtao; Zhou, Yicong; Li, Luoqing; Pan, Zhibin

    2014-05-01

    The extreme learning machine (ELM) has attracted increasing attention recently with its successful applications in classification and regression. In this paper, we investigate the generalization performance of ELM-based ranking. A new regularized ranking algorithm is proposed based on the combinations of activation functions in ELM. The generalization analysis is established for the ELM-based ranking (ELMRank) in terms of the covering numbers of hypothesis space. Empirical results on the benchmark datasets show the competitive performance of the ELMRank over the state-of-the-art ranking methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  3. A ranking method for the concurrent learning of compounds with various activity profiles.

    PubMed

    Dörr, Alexander; Rosenbaum, Lars; Zell, Andreas

    2015-01-01

    In this study, we present a SVM-based ranking algorithm for the concurrent learning of compounds with different activity profiles and their varying prioritization. To this end, a specific labeling of each compound was elaborated in order to infer virtual screening models against multiple targets. We compared the method with several state-of-the-art SVM classification techniques that are capable of inferring multi-target screening models on three chemical data sets (cytochrome P450s, dehydrogenases, and a trypsin-like protease data set) containing three different biological targets each. The experiments show that ranking-based algorithms show an increased performance for single- and multi-target virtual screening. Moreover, compounds that do not completely fulfill the desired activity profile are still ranked higher than decoys or compounds with an entirely undesired profile, compared to other multi-target SVM methods. SVM-based ranking methods constitute a valuable approach for virtual screening in multi-target drug design. The utilization of such methods is most helpful when dealing with compounds with various activity profiles and the finding of many ligands with an already perfectly matching activity profile is not to be expected.

  4. SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

    PubMed Central

    Chu, Annie; Cui, Jenny; Dinov, Ivo D.

    2011-01-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994

  5. It's all relative: ranking the diversity of aquatic bacterial communities.

    PubMed

    Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H

    2008-09-01

    The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.

  6. Accurate template-based modeling in CASP12 using the IntFOLD4-TS, ModFOLD6, and ReFOLD methods.

    PubMed

    McGuffin, Liam J; Shuid, Ahmad N; Kempster, Robert; Maghrabi, Ali H A; Nealon, John O; Salehe, Bajuna R; Atkins, Jennifer D; Roche, Daniel B

    2018-03-01

    Our aim in CASP12 was to improve our Template-Based Modeling (TBM) methods through better model selection, accuracy self-estimate (ASE) scores and refinement. To meet this aim, we developed two new automated methods, which we used to score, rank, and improve upon the provided server models. Firstly, the ModFOLD6_rank method, for improved global Quality Assessment (QA), model ranking and the detection of local errors. Secondly, the ReFOLD method for fixing errors through iterative QA guided refinement. For our automated predictions we developed the IntFOLD4-TS protocol, which integrates the ModFOLD6_rank method for scoring the multiple-template models that were generated using a number of alternative sequence-structure alignments. Overall, our selection of top models and ASE scores using ModFOLD6_rank was an improvement on our previous approaches. In addition, it was worthwhile attempting to repair the detected errors in the top selected models using ReFOLD, which gave us an overall gain in performance. According to the assessors' formula, the IntFOLD4 server ranked 3rd/5th (average Z-score > 0.0/-2.0) on the server only targets, and our manual predictions (McGuffin group) ranked 1st/2nd (average Z-score > -2.0/0.0) compared to all other groups. © 2017 Wiley Periodicals, Inc.

  7. Ill Posed Problems: Numerical and Statistical Methods for Mildly, Moderately and Severely Ill Posed Problems with Noisy Data.

    DTIC Science & Technology

    1980-02-01

    to estimate f -..ell, -noderately ,-ell, or- poorly. 1 ’The sansitivity *of a rec-ilarized estimate of f to the noise is made explicit. After giving the...AD-A 7 .SA92 925 WISCONSIN UN! V-MADISON DEFT OF STATISTICS F /S 11,’ 1 ILL POSED PRORLEMS: NUMERICAL ANn STATISTICAL METHODS FOR MILOL-ETC(U FEB 80 a...estimate f given z. We first define the 1 intrinsic rank of the problem where jK(tit) f (t)dt is known exactly. This 0 definition is used to provide insight

  8. A Kernel-Based Low-Rank (KLR) Model for Low-Dimensional Manifold Recovery in Highly Accelerated Dynamic MRI.

    PubMed

    Nakarmi, Ukash; Wang, Yanhua; Lyu, Jingyuan; Liang, Dong; Ying, Leslie

    2017-11-01

    While many low rank and sparsity-based approaches have been developed for accelerated dynamic magnetic resonance imaging (dMRI), they all use low rankness or sparsity in input space, overlooking the intrinsic nonlinear correlation in most dMRI data. In this paper, we propose a kernel-based framework to allow nonlinear manifold models in reconstruction from sub-Nyquist data. Within this framework, many existing algorithms can be extended to kernel framework with nonlinear models. In particular, we have developed a novel algorithm with a kernel-based low-rank model generalizing the conventional low rank formulation. The algorithm consists of manifold learning using kernel, low rank enforcement in feature space, and preimaging with data consistency. Extensive simulation and experiment results show that the proposed method surpasses the conventional low-rank-modeled approaches for dMRI.

  9. CT Image Sequence Restoration Based on Sparse and Low-Rank Decomposition

    PubMed Central

    Gou, Shuiping; Wang, Yueyue; Wang, Zhilong; Peng, Yong; Zhang, Xiaopeng; Jiao, Licheng; Wu, Jianshe

    2013-01-01

    Blurry organ boundaries and soft tissue structures present a major challenge in biomedical image restoration. In this paper, we propose a low-rank decomposition-based method for computed tomography (CT) image sequence restoration, where the CT image sequence is decomposed into a sparse component and a low-rank component. A new point spread function of Weiner filter is employed to efficiently remove blur in the sparse component; a wiener filtering with the Gaussian PSF is used to recover the average image of the low-rank component. And then we get the recovered CT image sequence by combining the recovery low-rank image with all recovery sparse image sequence. Our method achieves restoration results with higher contrast, sharper organ boundaries and richer soft tissue structure information, compared with existing CT image restoration methods. The robustness of our method was assessed with numerical experiments using three different low-rank models: Robust Principle Component Analysis (RPCA), Linearized Alternating Direction Method with Adaptive Penalty (LADMAP) and Go Decomposition (GoDec). Experimental results demonstrated that the RPCA model was the most suitable for the small noise CT images whereas the GoDec model was the best for the large noisy CT images. PMID:24023764

  10. Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

    PubMed Central

    Silver, Matt; Montana, Giovanni

    2012-01-01

    Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small. PMID:22499682

  11. Deaths: Leading Causes for 2012.

    PubMed

    Heron, Melonie

    2015-08-31

    This report presents final 2012 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements "Deaths: Final Data for 2012," the National Center for Health Statistics' annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2012. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2012, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Cerebrovascular diseases; Accidents (unintentional injuries); Alzheimer's disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). These causes accounted for 74% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2012 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods.

  12. Deaths: leading causes for 2007.

    PubMed

    Heron, Melonie

    2011-08-26

    This report presents final 2007 data on the 10 leading causes of death in the United States by age, race, sex, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements the Division of Vital Statistics' annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2007. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2007, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Cerebrovascular diseases; Chronic lower respiratory diseases; Accidents (unintentional injuries); Alzheimer's disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Septicemia. They accounted for approximately 76 percent of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2007 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods.

  13. Deaths: leading causes for 2009.

    PubMed

    Heron, Melonie

    2012-10-26

    This report presents final 2009 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements the Division of Vital Statistics' annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2009. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2009, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Cerebrovascular diseases; Accidents (unintentional injuries); Alzheimer's disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). These causes accounted for approximately 75% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2009 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods.

  14. Deaths: leading causes for 2008.

    PubMed

    Heron, Melonie

    2012-06-06

    This report presents final 2008 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements the Division of Vital Statistics' annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2008. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. in 2008, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Cerebrovascular diseases; Accidents (unintentional injuries); Alzheimer's disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). They accounted for approximately 76 percent of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2008 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods.

  15. Multi-criteria evaluation of CMIP5 GCMs for climate change impact analysis

    NASA Astrophysics Data System (ADS)

    Ahmadalipour, Ali; Rana, Arun; Moradkhani, Hamid; Sharma, Ashish

    2017-04-01

    Climate change is expected to have severe impacts on global hydrological cycle along with food-water-energy nexus. Currently, there are many climate models used in predicting important climatic variables. Though there have been advances in the field, there are still many problems to be resolved related to reliability, uncertainty, and computing needs, among many others. In the present work, we have analyzed performance of 20 different global climate models (GCMs) from Climate Model Intercomparison Project Phase 5 (CMIP5) dataset over the Columbia River Basin (CRB) in the Pacific Northwest USA. We demonstrate a statistical multicriteria approach, using univariate and multivariate techniques, for selecting suitable GCMs to be used for climate change impact analysis in the region. Univariate methods includes mean, standard deviation, coefficient of variation, relative change (variability), Mann-Kendall test, and Kolmogorov-Smirnov test (KS-test); whereas multivariate methods used were principal component analysis (PCA), singular value decomposition (SVD), canonical correlation analysis (CCA), and cluster analysis. The analysis is performed on raw GCM data, i.e., before bias correction, for precipitation and temperature climatic variables for all the 20 models to capture the reliability and nature of the particular model at regional scale. The analysis is based on spatially averaged datasets of GCMs and observation for the period of 1970 to 2000. Ranking is provided to each of the GCMs based on the performance evaluated against gridded observational data on various temporal scales (daily, monthly, and seasonal). Results have provided insight into each of the methods and various statistical properties addressed by them employed in ranking GCMs. Further; evaluation was also performed for raw GCM simulations against different sets of gridded observational dataset in the area.

  16. Testing the significance of a correlation with nonnormal data: comparison of Pearson, Spearman, transformation, and resampling approaches.

    PubMed

    Bishara, Anthony J; Hittner, James B

    2012-09-01

    It is well known that when data are nonnormally distributed, a test of the significance of Pearson's r may inflate Type I error rates and reduce power. Statistics textbooks and the simulation literature provide several alternatives to Pearson's correlation. However, the relative performance of these alternatives has been unclear. Two simulation studies were conducted to compare 12 methods, including Pearson, Spearman's rank-order, transformation, and resampling approaches. With most sample sizes (n ≥ 20), Type I and Type II error rates were minimized by transforming the data to a normal shape prior to assessing the Pearson correlation. Among transformation approaches, a general purpose rank-based inverse normal transformation (i.e., transformation to rankit scores) was most beneficial. However, when samples were both small (n ≤ 10) and extremely nonnormal, the permutation test often outperformed other alternatives, including various bootstrap tests.

  17. Clinical Psychology Ph.D. Program Rankings: Evaluating Eminence on Faculty Publications and Citations

    ERIC Educational Resources Information Center

    Matson, Johnny L.; Malone, Carrie J.; Gonzalez, Melissa L.; McClure, David R.; Laud, Rinita B.; Minshawi, Noha F.

    2005-01-01

    Program rankings and their visibility have taken on greater and greater significance. Rarely is the accuracy of these rankings, which are typically based on a small subset of university faculty impressions, questioned. This paper presents a more comprehensive survey method based on quantifiable measures of faculty publications and citations. The…

  18. Determining the optimal number of independent components for reproducible transcriptomic data analysis.

    PubMed

    Kairov, Ulykbek; Cantini, Laura; Greco, Alessandro; Molkenov, Askhat; Czerwinska, Urszula; Barillot, Emmanuel; Zinovyev, Andrei

    2017-09-11

    Independent Component Analysis (ICA) is a method that models gene expression data as an action of a set of statistically independent hidden factors. The output of ICA depends on a fundamental parameter: the number of components (factors) to compute. The optimal choice of this parameter, related to determining the effective data dimension, remains an open question in the application of blind source separation techniques to transcriptomic data. Here we address the question of optimizing the number of statistically independent components in the analysis of transcriptomic data for reproducibility of the components in multiple runs of ICA (within the same or within varying effective dimensions) and in multiple independent datasets. To this end, we introduce ranking of independent components based on their stability in multiple ICA computation runs and define a distinguished number of components (Most Stable Transcriptome Dimension, MSTD) corresponding to the point of the qualitative change of the stability profile. Based on a large body of data, we demonstrate that a sufficient number of dimensions is required for biological interpretability of the ICA decomposition and that the most stable components with ranks below MSTD have more chances to be reproduced in independent studies compared to the less stable ones. At the same time, we show that a transcriptomics dataset can be reduced to a relatively high number of dimensions without losing the interpretability of ICA, even though higher dimensions give rise to components driven by small gene sets. We suggest a protocol of ICA application to transcriptomics data with a possibility of prioritizing components with respect to their reproducibility that strengthens the biological interpretation. Computing too few components (much less than MSTD) is not optimal for interpretability of the results. The components ranked within MSTD range have more chances to be reproduced in independent studies.

  19. Critical review of methods for risk ranking of food-related hazards, based on risks for human health.

    PubMed

    Van der Fels-Klerx, H J; Van Asselt, E D; Raley, M; Poulsen, M; Korsgaard, H; Bredsdorff, L; Nauta, M; D'agostino, M; Coles, D; Marvin, H J P; Frewer, L J

    2018-01-22

    This study aimed to critically review methods for ranking risks related to food safety and dietary hazards on the basis of their anticipated human health impacts. A literature review was performed to identify and characterize methods for risk ranking from the fields of food, environmental science and socio-economic sciences. The review used a predefined search protocol, and covered the bibliographic databases Scopus, CAB Abstracts, Web of Sciences, and PubMed over the period 1993-2013. All references deemed relevant, on the basis of predefined evaluation criteria, were included in the review, and the risk ranking method characterized. The methods were then clustered-based on their characteristics-into eleven method categories. These categories included: risk assessment, comparative risk assessment, risk ratio method, scoring method, cost of illness, health adjusted life years (HALY), multi-criteria decision analysis, risk matrix, flow charts/decision trees, stated preference techniques and expert synthesis. Method categories were described by their characteristics, weaknesses and strengths, data resources, and fields of applications. It was concluded there is no single best method for risk ranking. The method to be used should be selected on the basis of risk manager/assessor requirements, data availability, and the characteristics of the method. Recommendations for future use and application are provided.

  20. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection

    PubMed Central

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-01-01

    Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices. PMID:25177107

  1. Bias and Stability of Single Variable Classifiers for Feature Ranking and Selection.

    PubMed

    Fakhraei, Shobeir; Soltanian-Zadeh, Hamid; Fotouhi, Farshad

    2014-11-01

    Feature rankings are often used for supervised dimension reduction especially when discriminating power of each feature is of interest, dimensionality of dataset is extremely high, or computational power is limited to perform more complicated methods. In practice, it is recommended to start dimension reduction via simple methods such as feature rankings before applying more complex approaches. Single Variable Classifier (SVC) ranking is a feature ranking based on the predictive performance of a classifier built using only a single feature. While benefiting from capabilities of classifiers, this ranking method is not as computationally intensive as wrappers. In this paper, we report the results of an extensive study on the bias and stability of such feature ranking method. We study whether the classifiers influence the SVC rankings or the discriminative power of features themselves has a dominant impact on the final rankings. We show the common intuition of using the same classifier for feature ranking and final classification does not always result in the best prediction performance. We then study if heterogeneous classifiers ensemble approaches provide more unbiased rankings and if they improve final classification performance. Furthermore, we calculate an empirical prediction performance loss for using the same classifier in SVC feature ranking and final classification from the optimal choices.

  2. A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text

    PubMed Central

    Miwa, Makoto; Ohta, Tomoko; Rak, Rafal; Rowley, Andrew; Kell, Douglas B.; Pyysalo, Sampo; Ananiadou, Sophia

    2013-01-01

    Motivation: To create, verify and maintain pathway models, curators must discover and assess knowledge distributed over the vast body of biological literature. Methods supporting these tasks must understand both the pathway model representations and the natural language in the literature. These methods should identify and order documents by relevance to any given pathway reaction. No existing system has addressed all aspects of this challenge. Method: We present novel methods for associating pathway model reactions with relevant publications. Our approach extracts the reactions directly from the models and then turns them into queries for three text mining-based MEDLINE literature search systems. These queries are executed, and the resulting documents are combined and ranked according to their relevance to the reactions of interest. We manually annotate document-reaction pairs with the relevance of the document to the reaction and use this annotation to study several ranking methods, using various heuristic and machine-learning approaches. Results: Our evaluation shows that the annotated document-reaction pairs can be used to create a rule-based document ranking system, and that machine learning can be used to rank documents by their relevance to pathway reactions. We find that a Support Vector Machine-based system outperforms several baselines and matches the performance of the rule-based system. The success of the query extraction and ranking methods are used to update our existing pathway search system, PathText. Availability: An online demonstration of PathText 2 and the annotated corpus are available for research purposes at http://www.nactem.ac.uk/pathtext2/. Contact: makoto.miwa@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23813008

  3. CSAR 2014: A Benchmark Exercise Using Unpublished Data from Pharma.

    PubMed

    Carlson, Heather A; Smith, Richard D; Damm-Ganamet, Kelly L; Stuckey, Jeanne A; Ahmed, Aqeel; Convery, Maire A; Somers, Donald O; Kranz, Michael; Elkins, Patricia A; Cui, Guanglei; Peishoff, Catherine E; Lambert, Millard H; Dunbar, James B

    2016-06-27

    The 2014 CSAR Benchmark Exercise was the last community-wide exercise that was conducted by the group at the University of Michigan, Ann Arbor. For this event, GlaxoSmithKline (GSK) donated unpublished crystal structures and affinity data from in-house projects. Three targets were used: tRNA (m1G37) methyltransferase (TrmD), Spleen Tyrosine Kinase (SYK), and Factor Xa (FXa). A particularly strong feature of the GSK data is its large size, which lends greater statistical significance to comparisons between different methods. In Phase 1 of the CSAR 2014 Exercise, participants were given several protein-ligand complexes and asked to identify the one near-native pose from among 200 decoys provided by CSAR. Though decoys were requested by the community, we found that they complicated our analysis. We could not discern whether poor predictions were failures of the chosen method or an incompatibility between the participant's method and the setup protocol we used. This problem is inherent to decoys, and we strongly advise against their use. In Phase 2, participants had to dock and rank/score a set of small molecules given only the SMILES strings of the ligands and a protein structure with a different ligand bound. Overall, docking was a success for most participants, much better in Phase 2 than in Phase 1. However, scoring was a greater challenge. No particular approach to docking and scoring had an edge, and successful methods included empirical, knowledge-based, machine-learning, shape-fitting, and even those with solvation and entropy terms. Several groups were successful in ranking TrmD and/or SYK, but ranking FXa ligands was intractable for all participants. Methods that were able to dock well across all submitted systems include MDock,1 Glide-XP,2 PLANTS,3 Wilma,4 Gold,5 SMINA,6 Glide-XP2/PELE,7 FlexX,8 and MedusaDock.9 In fact, the submission based on Glide-XP2/PELE7 cross-docked all ligands to many crystal structures, and it was particularly impressive to see success across an ensemble of protein structures for multiple targets. For scoring/ranking, submissions that showed statistically significant achievement include MDock1 using ITScore1,10 with a flexible-ligand term,11 SMINA6 using Autodock-Vina,12,13 FlexX8 using HYDE,14 and Glide-XP2 using XP DockScore2 with and without ROCS15 shape similarity.16 Of course, these results are for only three protein targets, and many more systems need to be investigated to truly identify which approaches are more successful than others. Furthermore, our exercise is not a competition.

  4. Automated diagnosis of congestive heart failure using dual tree complex wavelet transform and statistical features extracted from 2s of ECG signals.

    PubMed

    Sudarshan, Vidya K; Acharya, U Rajendra; Oh, Shu Lih; Adam, Muhammad; Tan, Jen Hong; Chua, Chua Kuang; Chua, Kok Poo; Tan, Ru San

    2017-04-01

    Identification of alarming features in the electrocardiogram (ECG) signal is extremely significant for the prediction of congestive heart failure (CHF). ECG signal analysis carried out using computer-aided techniques can speed up the diagnosis process and aid in the proper management of CHF patients. Therefore, in this work, dual tree complex wavelets transform (DTCWT)-based methodology is proposed for an automated identification of ECG signals exhibiting CHF from normal. In the experiment, we have performed a DTCWT on ECG segments of 2s duration up to six levels to obtain the coefficients. From these DTCWT coefficients, statistical features are extracted and ranked using Bhattacharyya, entropy, minimum redundancy maximum relevance (mRMR), receiver-operating characteristics (ROC), Wilcoxon, t-test and reliefF methods. Ranked features are subjected to k-nearest neighbor (KNN) and decision tree (DT) classifiers for automated differentiation of CHF and normal ECG signals. We have achieved 99.86% accuracy, 99.78% sensitivity and 99.94% specificity in the identification of CHF affected ECG signals using 45 features. The proposed method is able to detect CHF patients accurately using only 2s of ECG signal length and hence providing sufficient time for the clinicians to further investigate on the severity of CHF and treatments. Copyright © 2017 Elsevier Ltd. All rights reserved.

  5. [A rank-order method for the integrated assessment of trends in all-cause and cardiovascular mortality rates in the subjects of the Russian Federation in 2006-2012].

    PubMed

    Artamonova, G V; Maksimov, S A; Tabakaev, M V; Barbarash, L S

    2016-01-01

    To rank the subjects of the Russian Federation by the trend direction in all-cause and cardiovascular mortality (including mortality from coronary heart disease and cerebrovascular diseases) as a whole and at able-bodied age. The investigation used mortality rates from to the 2006 and 2012 data available in the Federal State Statistics Service on 81 subjects of the Russian Federation. According to mortality rates, each region was assigned a rank in 2006 and 2012. Trends in rank changes in the Russian Federation's regions were analyzed. A cluster analysis was used to group the subjects of the Russian Federation by trends in rank changes. The cluster analysis of rank changes from 2006 to 2012 could combine the Russian Federation's regions into 10 groups showing the similar trends in all-cause and circulatory disease mortality rates. Overall, the results of the ranking and further clusterization of the regions of the Russian Federation correspond to the trends in all-cause and cardiovascular mortality rates according to the data of other Russian investigations, by qualitatively complementing them. The trend rank-order method permits a comprehensive comparative analysis of changes in all-cause and cardiovascular mortality in the subjects of the Russian Federation both as a whole and at able-bodied age, which provides qualitatively new information complementing the universally accepted approaches to studying the population's mortality.

  6. Maryland's high cancer mortality rate: a review of contributing demographic factors.

    PubMed

    Freedman, D M

    1999-01-01

    For many years, Maryland has ranked among the top states in cancer mortality. This study analyzed mortality data from the National Center for Health Statistics (CDC-Wonder) to help explain Maryland's cancer rate and rank. Age-adjusted rates are based on deaths per 100,000 population from 1991 through 1995. Rates and ranks overall, and stratified by age, are calculated for total cancer mortality, as well as for four major sites: lung, breast, prostate, and colorectal. Because states differ in their racial/gender mix, race/gender rates among states are also compared. Although Maryland ranks seventh in overall cancer mortality, its rates and rank by race and gender subpopulation are less high. For those under 75, white men ranked 26th, black men ranked 20th, and black and white women ranked 12th and 10th, respectively. Maryland's overall rank, as with any state, is a function of the rates of its racial and gender subpopulations and the relative size of these groups in the state. Many of the disparities between Maryland's overall high cancer rank and its lower rank by subpopulation also characterize the major cancer sites. Although a stratified presentation of cancer rates and ranks may be more favorable to Maryland, it should not be used to downplay the attention cancer mortality in Maryland deserves.

  7. A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.

    PubMed

    Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho

    2014-10-01

    Many studies based on microRNA (miRNA) expression profiles showed a new aspect of cancer classification. Because one characteristic of miRNA expression data is the high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. The feature selection methods have one shortcoming thus far: they just consider the problem of where feature to class is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, human miRNA is considered to be ranked low in traditional feature selection methods and are removed most of the time. In view of the limitation of the miRNA number, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all problems (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifier to construct cancer classification. Then, we chose Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs to determine cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with just using high-ranking miRNAs in traditional feature selection methods. Our results demonstrate that the m:n feature subset made a positive impression of low-ranking miRNAs in cancer classification.

  8. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE PAGES

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    2018-01-01

    Tensor-rank decomposition methods have been applied to variable contact time 29 Si{ 1 H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.

  9. Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

    Tensor-rank decomposition methods have been applied to variable contact time 29 Si{ 1 H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.

  10. Risk-Based Prioritization Method for the Classification of Groundwater Pollution from Hazardous Waste Landfills.

    PubMed

    Yang, Yu; Jiang, Yong-Hai; Lian, Xin-Ying; Xi, Bei-Dou; Ma, Zhi-Fei; Xu, Xiang-Jian; An, Da

    2016-12-01

    Hazardous waste landfill sites are a significant source of groundwater pollution. To ensure that these landfills with a significantly high risk of groundwater contamination are properly managed, a risk-based ranking method related to groundwater contamination is needed. In this research, a risk-based prioritization method for the classification of groundwater pollution from hazardous waste landfills was established. The method encompasses five phases, including risk pre-screening, indicator selection, characterization, classification and, lastly, validation. In the risk ranking index system employed here, 14 indicators involving hazardous waste landfills and migration in the vadose zone as well as aquifer were selected. The boundary of each indicator was determined by K-means cluster analysis and the weight of each indicator was calculated by principal component analysis. These methods were applied to 37 hazardous waste landfills in China. The result showed that the risk for groundwater contamination from hazardous waste landfills could be ranked into three classes from low to high risk. In all, 62.2 % of the hazardous waste landfill sites were classified in the low and medium risk classes. The process simulation method and standardized anomalies were used to validate the result of risk ranking; the results were consistent with the simulated results related to the characteristics of contamination. The risk ranking method was feasible, valid and can provide reference data related to risk management for groundwater contamination at hazardous waste landfill sites.

  11. Learning to Rank the Severity of Unrepaired Cleft Lip Nasal Deformity on 3D Mesh Data.

    PubMed

    Wu, Jia; Tse, Raymond; Shapiro, Linda G

    2014-08-01

    Cleft lip is a birth defect that results in deformity of the upper lip and nose. Its severity is widely variable and the results of treatment are influenced by the initial deformity. Objective assessment of severity would help to guide prognosis and treatment. However, most assessments are subjective. The purpose of this study is to develop and test quantitative computer-based methods of measuring cleft lip severity. In this paper, a grid-patch based measurement of symmetry is introduced, with which a computer program learns to rank the severity of cleft lip on 3D meshes of human infant faces. Three computer-based methods to define the midfacial reference plane were compared to two manual methods. Four different symmetry features were calculated based upon these reference planes, and evaluated. The result shows that the rankings predicted by the proposed features were highly correlated with the ranking orders provided by experts that were used as the ground truth.

  12. Multi-energy CT based on a prior rank, intensity and sparsity model (PRISM).

    PubMed

    Gao, Hao; Yu, Hengyong; Osher, Stanley; Wang, Ge

    2011-11-01

    We propose a compressive sensing approach for multi-energy computed tomography (CT), namely the prior rank, intensity and sparsity model (PRISM). To further compress the multi-energy image for allowing the reconstruction with fewer CT data and less radiation dose, the PRISM models a multi-energy image as the superposition of a low-rank matrix and a sparse matrix (with row dimension in space and column dimension in energy), where the low-rank matrix corresponds to the stationary background over energy that has a low matrix rank, and the sparse matrix represents the rest of distinct spectral features that are often sparse. Distinct from previous methods, the PRISM utilizes the generalized rank, e.g., the matrix rank of tight-frame transform of a multi-energy image, which offers a way to characterize the multi-level and multi-filtered image coherence across the energy spectrum. Besides, the energy-dependent intensity information can be incorporated into the PRISM in terms of the spectral curves for base materials, with which the restoration of the multi-energy image becomes the reconstruction of the energy-independent material composition matrix. In other words, the PRISM utilizes prior knowledge on the generalized rank and sparsity of a multi-energy image, and intensity/spectral characteristics of base materials. Furthermore, we develop an accurate and fast split Bregman method for the PRISM and demonstrate the superior performance of the PRISM relative to several competing methods in simulations.

  13. Active learning methods for interactive image retrieval.

    PubMed

    Gosselin, Philippe Henri; Cord, Matthieu

    2008-07-01

    Active learning methods have been considered with increased interest in the statistical learning community. Initially developed within a classification framework, a lot of extensions are now being proposed to handle multimedia applications. This paper provides algorithms within a statistical framework to extend active learning for online content-based image retrieval (CBIR). The classification framework is presented with experiments to compare several powerful classification techniques in this information retrieval context. Focusing on interactive methods, active learning strategy is then described. The limitations of this approach for CBIR are emphasized before presenting our new active selection process RETIN. First, as any active method is sensitive to the boundary estimation between classes, the RETIN strategy carries out a boundary correction to make the retrieval process more robust. Second, the criterion of generalization error to optimize the active learning selection is modified to better represent the CBIR objective of database ranking. Third, a batch processing of images is proposed. Our strategy leads to a fast and efficient active learning scheme to retrieve sets of online images (query concept). Experiments on large databases show that the RETIN method performs well in comparison to several other active strategies.

  14. Comparison of different eigensolvers for calculating vibrational spectra using low-rank, sum-of-product basis functions

    NASA Astrophysics Data System (ADS)

    Leclerc, Arnaud; Thomas, Phillip S.; Carrington, Tucker

    2017-08-01

    Vibrational spectra and wavefunctions of polyatomic molecules can be calculated at low memory cost using low-rank sum-of-product (SOP) decompositions to represent basis functions generated using an iterative eigensolver. Using a SOP tensor format does not determine the iterative eigensolver. The choice of the interative eigensolver is limited by the need to restrict the rank of the SOP basis functions at every stage of the calculation. We have adapted, implemented and compared different reduced-rank algorithms based on standard iterative methods (block-Davidson algorithm, Chebyshev iteration) to calculate vibrational energy levels and wavefunctions of the 12-dimensional acetonitrile molecule. The effect of using low-rank SOP basis functions on the different methods is analysed and the numerical results are compared with those obtained with the reduced rank block power method. Relative merits of the different algorithms are presented, showing that the advantage of using a more sophisticated method, although mitigated by the use of reduced-rank SOP functions, is noticeable in terms of CPU time.

  15. Solving the interval type-2 fuzzy polynomial equation using the ranking method

    NASA Astrophysics Data System (ADS)

    Rahman, Nurhakimah Ab.; Abdullah, Lazim

    2014-07-01

    Polynomial equations with trapezoidal and triangular fuzzy numbers have attracted some interest among researchers in mathematics, engineering and social sciences. There are some methods that have been developed in order to solve these equations. In this study we are interested in introducing the interval type-2 fuzzy polynomial equation and solving it using the ranking method of fuzzy numbers. The ranking method concept was firstly proposed to find real roots of fuzzy polynomial equation. Therefore, the ranking method is applied to find real roots of the interval type-2 fuzzy polynomial equation. We transform the interval type-2 fuzzy polynomial equation to a system of crisp interval type-2 fuzzy polynomial equation. This transformation is performed using the ranking method of fuzzy numbers based on three parameters, namely value, ambiguity and fuzziness. Finally, we illustrate our approach by numerical example.

  16. Computer-aided auditing of prescription drug claims.

    PubMed

    Iyengar, Vijay S; Hermiz, Keith B; Natarajan, Ramesh

    2014-09-01

    We describe a methodology for identifying and ranking candidate audit targets from a database of prescription drug claims. The relevant audit targets may include various entities such as prescribers, patients and pharmacies, who exhibit certain statistical behavior indicative of potential fraud and abuse over the prescription claims during a specified period of interest. Our overall approach is consistent with related work in statistical methods for detection of fraud and abuse, but has a relative emphasis on three specific aspects: first, based on the assessment of domain experts, certain focus areas are selected and data elements pertinent to the audit analysis in each focus area are identified; second, specialized statistical models are developed to characterize the normalized baseline behavior in each focus area; and third, statistical hypothesis testing is used to identify entities that diverge significantly from their expected behavior according to the relevant baseline model. The application of this overall methodology to a prescription claims database from a large health plan is considered in detail.

  17. Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

    NASA Astrophysics Data System (ADS)

    Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.

    2018-03-01

    We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.

  18. Clinical prognostic significance and pro-metastatic activity of RANK/RANKL via the AKT pathway in endometrial cancer.

    PubMed

    Wang, Jing; Liu, Yao; Wang, Lihua; Sun, Xiao; Wang, Yudong

    2016-02-02

    RANK/RANKL plays a key role in metastasis of certain malignant tumors, which makes it a promising target for developing novel therapeutic strategies for cancer. However, the prognostic value and pro-metastatic activity of RANK in endometrial cancer (EC) remain to be determined. Thus, the present study investigated the effect of RANK on the prognosis of EC patients, as well as the pro-metastatic activity of EC cells. The results indicated that those with high expression of RANK showed decreased overall survival and progression-free survival. Statistical analysis revealed the positive correlations between RANK/RANKL expression and metastasis-related factors. Additionally, RANK/RANKL significantly promoted cell migration/invasion via activating AKT/β-catenin/Snail pathway in vitro. However, RANK/RANKL-induced AKT activation could be suppressed after osteoprotegerin (OPG) treatment. Furthermore, the combination of medroxyprogesterone acetate (MPA) and RANKL could in turn attenuate the effect of RANKL alone. Similarly, MPA could partially inhibit the RANK-induced metastasis in an orthotopic mouse model via suppressing AKT/β-catenin/Snail pathway. Therefore, therapeutic inhibition of MPA in RANK/RANKL-induced metastasis was mediated by AKT/β-catenin/Snail pathway both in vitro and in vivo, suggesting a potential target of RANK for gene-based therapy for EC.

  19. Selection of suitable e-learning approach using TOPSIS technique with best ranked criteria weights

    NASA Astrophysics Data System (ADS)

    Mohammed, Husam Jasim; Kasim, Maznah Mat; Shaharanee, Izwan Nizal Mohd

    2017-11-01

    This paper compares the performances of four rank-based weighting assessment techniques, Rank Sum (RS), Rank Reciprocal (RR), Rank Exponent (RE), and Rank Order Centroid (ROC) on five identified e-learning criteria to select the best weights method. A total of 35 experts in a public university in Malaysia were asked to rank the criteria and to evaluate five e-learning approaches which include blended learning, flipped classroom, ICT supported face to face learning, synchronous learning, and asynchronous learning. The best ranked criteria weights are defined as weights that have the least total absolute differences with the geometric mean of all weights, were then used to select the most suitable e-learning approach by using TOPSIS method. The results show that RR weights are the best, while flipped classroom approach implementation is the most suitable approach. This paper has developed a decision framework to aid decision makers (DMs) in choosing the most suitable weighting method for solving MCDM problems.

  20. Generalization Performance of Regularized Ranking With Multiscale Kernels.

    PubMed

    Zhou, Yicong; Chen, Hong; Lan, Rushi; Pan, Zhibin

    2016-05-01

    The regularized kernel method for the ranking problem has attracted increasing attentions in machine learning. The previous regularized ranking algorithms are usually based on reproducing kernel Hilbert spaces with a single kernel. In this paper, we go beyond this framework by investigating the generalization performance of the regularized ranking with multiscale kernels. A novel ranking algorithm with multiscale kernels is proposed and its representer theorem is proved. We establish the upper bound of the generalization error in terms of the complexity of hypothesis spaces. It shows that the multiscale ranking algorithm can achieve satisfactory learning rates under mild conditions. Experiments demonstrate the effectiveness of the proposed method for drug discovery and recommendation tasks.

  1. Iterative deblending of simultaneous-source data using a coherency-pass shaping operator

    NASA Astrophysics Data System (ADS)

    Zu, Shaohuan; Zhou, Hui; Mao, Weijian; Zhang, Dong; Li, Chao; Pan, Xiao; Chen, Yangkang

    2017-10-01

    Simultaneous-source acquisition helps greatly boost an economic saving, while it brings an unprecedented challenge of removing the crosstalk interference in the recorded seismic data. In this paper, we propose a novel iterative method to separate the simultaneous source data based on a coherency-pass shaping operator. The coherency-pass filter is used to constrain the model, that is, the unblended data to be estimated, in the shaping regularization framework. In the simultaneous source survey, the incoherent interference from adjacent shots greatly increases the rank of the frequency domain Hankel matrix that is formed from the blended record. Thus, the method based on rank reduction is capable of separating the blended record to some extent. However, the shortcoming is that it may cause residual noise when there is strong blending interference. We propose to cascade the rank reduction and thresholding operators to deal with this issue. In the initial iterations, we adopt a small rank to severely separate the blended interference and a large thresholding value as strong constraints to remove the residual noise in the time domain. In the later iterations, since more and more events have been recovered, we weaken the constraint by increasing the rank and shrinking the threshold to recover weak events and to guarantee the convergence. In this way, the combined rank reduction and thresholding strategy acts as a coherency-pass filter, which only passes the coherent high-amplitude component after rank reduction instead of passing both signal and noise in traditional rank reduction based approaches. Two synthetic examples are tested to demonstrate the performance of the proposed method. In addition, the application on two field data sets (common receiver gathers and stacked profiles) further validate the effectiveness of the proposed method.

  2. Simultaneous grouping and ranking with combination of SOM and TOPSIS for selection of preferable analytical procedure for furan determination in food.

    PubMed

    Jędrkiewicz, Renata; Tsakovski, Stefan; Lavenu, Aurore; Namieśnik, Jacek; Tobiszewski, Marek

    2018-02-01

    Novel methodology for grouping and ranking with application of self-organizing maps and multicriteria decision analysis is presented. The dataset consists of 22 objects that are analytical procedures applied to furan determination in food samples. They are described by 10 variables, referred to their analytical performance, environmental and economic aspects. Multivariate statistics analysis allows to limit the amount of input data for ranking analysis. Assessment results show that the most beneficial procedures are based on microextraction techniques with GC-MS final determination. It is presented how the information obtained from both tools complement each other. The applicability of combination of grouping and ranking is also discussed. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Deaths: leading causes for 2010.

    PubMed

    Heron, Melonie

    2013-12-20

    This report presents final 2010 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements the Division of Vital Statistics' annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2010. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2010, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Cerebrovascular diseases; Accidents (unintentional injuries); Alzheimer's disease; Diabetes mellitus; Nephritis, nephrotic syndrome and nephrosis; Influenza and pneumonia; and Intentional self-harm (suicide). These 10 causes accounted for 75% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2010 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Necrotizing enterocolitis of newborn. Important variations in the leading causes of infant death are noted for the neonatal and post-neonatal periods. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

  4. Standard Errors of Equating for the Percentile Rank-Based Equipercentile Equating with Log-Linear Presmoothing

    ERIC Educational Resources Information Center

    Wang, Tianyou

    2009-01-01

    Holland and colleagues derived a formula for analytical standard error of equating using the delta-method for the kernel equating method. Extending their derivation, this article derives an analytical standard error of equating procedure for the conventional percentile rank-based equipercentile equating with log-linear smoothing. This procedure is…

  5. Monte Carlo simulations guided by imaging to predict the in vitro ranking of radiosensitizing nanoparticles

    PubMed Central

    Retif, Paul; Reinhard, Aurélie; Paquot, Héna; Jouan-Hureaux, Valérie; Chateau, Alicia; Sancey, Lucie; Barberi-Heyob, Muriel; Pinel, Sophie; Bastogne, Thierry

    2016-01-01

    This article addresses the in silico–in vitro prediction issue of organometallic nanoparticles (NPs)-based radiosensitization enhancement. The goal was to carry out computational experiments to quickly identify efficient nanostructures and then to preferentially select the most promising ones for the subsequent in vivo studies. To this aim, this interdisciplinary article introduces a new theoretical Monte Carlo computational ranking method and tests it using 3 different organometallic NPs in terms of size and composition. While the ranking predicted in a classical theoretical scenario did not fit the reference results at all, in contrast, we showed for the first time how our accelerated in silico virtual screening method, based on basic in vitro experimental data (which takes into account the NPs cell biodistribution), was able to predict a relevant ranking in accordance with in vitro clonogenic efficiency. This corroborates the pertinence of such a prior ranking method that could speed up the preclinical development of NPs in radiation therapy. PMID:27920524

  6. Monte Carlo simulations guided by imaging to predict the in vitro ranking of radiosensitizing nanoparticles.

    PubMed

    Retif, Paul; Reinhard, Aurélie; Paquot, Héna; Jouan-Hureaux, Valérie; Chateau, Alicia; Sancey, Lucie; Barberi-Heyob, Muriel; Pinel, Sophie; Bastogne, Thierry

    This article addresses the in silico-in vitro prediction issue of organometallic nanoparticles (NPs)-based radiosensitization enhancement. The goal was to carry out computational experiments to quickly identify efficient nanostructures and then to preferentially select the most promising ones for the subsequent in vivo studies. To this aim, this interdisciplinary article introduces a new theoretical Monte Carlo computational ranking method and tests it using 3 different organometallic NPs in terms of size and composition. While the ranking predicted in a classical theoretical scenario did not fit the reference results at all, in contrast, we showed for the first time how our accelerated in silico virtual screening method, based on basic in vitro experimental data (which takes into account the NPs cell biodistribution), was able to predict a relevant ranking in accordance with in vitro clonogenic efficiency. This corroborates the pertinence of such a prior ranking method that could speed up the preclinical development of NPs in radiation therapy.

  7. An evidence-based method for targeting an abusive head trauma prevention media campaign and its evaluation.

    PubMed

    Stewart, Tanya Charyk; Gilliland, Jason; Parry, Neil G; Fraser, Douglas D

    2015-11-01

    A triple-dose abusive head trauma (AHT) prevention program (Period of PURPLE Crying) was implemented. The third dose consisted of an education media campaign. The study objectives were to describe the qualitative and spatial methods developed to target AHT prevention and to evaluate this campaign. A questionnaire on the level of importance of factors, rated on a 7-point Likert scale, was distributed to a panel of experts to determine the best advertising locations. Ranked factors were used to create weights for statistical modeling and mapping within a Geographic Information Systems to determine optimal ad locations. The media campaign was evaluated via a telephone survey of randomly selected households. The survey found locations of new families, high population density, and high percentage of lone parents to be the most important factors for selecting billboard sites. Spatial analysis revealed six areas that ranked highest in our factors. Five billboards, four media posters, and six transit shelters were selected for our advertisements. A population-based telephone survey revealed that 23% of respondents knew the campaign. Nearly half (42%) heard the radio public service announcements, and 9% saw billboards. Extending primary prevention efforts to the public helps to create a cultural change in the way inconsolable crying, the trigger for AHT, is viewed. With the use of ranked factors and Geographic Information Systems, geographic locations with high visibility and specific risk factors for AHT were identified for targeting the campaign, facilitating the likelihood that our message was reaching the population in greatest need.

  8. Seeking health information online: does Wikipedia matter?

    PubMed

    Laurent, Michaël R; Vickers, Tim J

    2009-01-01

    OBJECTIVE To determine the significance of the English Wikipedia as a source of online health information. DESIGN The authors measured Wikipedia's ranking on general Internet search engines by entering keywords from MedlinePlus, NHS Direct Online, and the National Organization of Rare Diseases as queries into search engine optimization software. We assessed whether article quality influenced this ranking. The authors tested whether traffic to Wikipedia coincided with epidemiological trends and news of emerging health concerns, and how it compares to MedlinePlus. MEASUREMENTS Cumulative incidence and average position of Wikipedia compared to other Web sites among the first 20 results on general Internet search engines (Google, Google UK, Yahoo, and MSN, and page view statistics for selected Wikipedia articles and MedlinePlus pages. RESULTS Wikipedia ranked among the first ten results in 71-85% of search engines and keywords tested. Wikipedia surpassed MedlinePlus and NHS Direct Online (except for queries from the latter on Google UK), and ranked higher with quality articles. Wikipedia ranked highest for rare diseases, although its incidence in several categories decreased. Page views increased parallel to the occurrence of 20 seasonal disorders and news of three emerging health concerns. Wikipedia articles were viewed more often than MedlinePlus Topic (p = 0.001) but for MedlinePlus Encyclopedia pages, the trend was not significant (p = 0.07-0.10). CONCLUSIONS Based on its search engine ranking and page view statistics, the English Wikipedia is a prominent source of online health information compared to the other online health information providers studied.

  9. Basic principles of Hasse diagram technique in chemistry.

    PubMed

    Brüggemann, Rainer; Voigt, Kristina

    2008-11-01

    Principles of partial order applied to ranking are explained. The Hasse diagram technique (HDT) is the application of partial order theory based on a data matrix. In this paper, HDT is introduced in a stepwise procedure, and some elementary theorems are exemplified. The focus is to show how the multivariate character of a data matrix is realized by HDT and in which cases one should apply other mathematical or statistical methods. Many simple examples illustrate the basic theoretical ideas. Finally, it is shown that HDT is a useful alternative for the evaluation of antifouling agents, which was originally performed by amoeba diagrams.

  10. Geographic profiling as a novel spatial tool for targeting infectious disease control.

    PubMed

    Le Comber, Steven C; Rossmo, D Kim; Hassan, Ali N; Fuller, Douglas O; Beier, John C

    2011-05-18

    Geographic profiling is a statistical tool originally developed in criminology to prioritise large lists of suspects in cases of serial crime. Here, we use two data sets--one historical and one modern--to show how it can be used to locate the sources of infectious disease. First, we re-analyse data from a classic epidemiological study, the 1854 London cholera outbreak. Using 321 disease sites as input, we evaluate the locations of 13 neighbourhood water pumps. The Broad Street pump--the outbreak's source--ranks first, situated in the top 0.2% of the geoprofile. We extend our study with an analysis of reported malaria cases in Cairo, Egypt, using 139 disease case locations to rank 59 mosquitogenic local water sources, seven of which tested positive for the vector Anopheles sergentii. Geographic profiling ranks six of these seven sites in positions 1-6, all in the top 2% of the geoprofile. In both analyses the method outperformed other measures of spatial central tendency. We suggest that geographic profiling could form a useful component of integrated control strategies relating to a wide variety of infectious diseases, since evidence-based targeting of interventions is more efficient, environmentally friendly and cost-effective than untargeted intervention.

  11. Automatic evaluation of intrapartum fetal heart rate recordings: a comprehensive analysis of useful features.

    PubMed

    Chudáček, V; Spilka, J; Janků, P; Koucký, M; Lhotská, L; Huptych, M

    2011-08-01

    Cardiotocography is the monitoring of fetal heart rate (FHR) and uterine contractions (TOCO), used routinely since the 1960s by obstetricians to detect fetal hypoxia. The evaluation of the FHR in clinical settings is based on an evaluation of macroscopic morphological features and so far has managed to avoid adopting any achievements from the HRV research field. In this work, most of the features utilized for FHR characterization, including FIGO, HRV, nonlinear, wavelet, and time and frequency domain features, are investigated and assessed based on their statistical significance in the task of distinguishing the FHR into three FIGO classes. We assess the features on a large data set (552 records) and unlike in other published papers we use three-class expert evaluation of the records instead of the pH values. We conclude the paper by presenting the best uncorrelated features and their individual rank of importance according to the meta-analysis of three different ranking methods. The number of accelerations and decelerations, interval index, as well as Lempel-Ziv complexity and Higuchi's fractal dimension are among the top five features.

  12. Optimization of Statistical Methods Impact on Quantitative Proteomics Data.

    PubMed

    Pursiheimo, Anna; Vehmas, Anni P; Afzal, Saira; Suomi, Tomi; Chand, Thaman; Strauss, Leena; Poutanen, Matti; Rokka, Anne; Corthals, Garry L; Elo, Laura L

    2015-10-02

    As tools for quantitative label-free mass spectrometry (MS) rapidly develop, a consensus about the best practices is not apparent. In the work described here we compared popular statistical methods for detecting differential protein expression from quantitative MS data using both controlled experiments with known quantitative differences for specific proteins used as standards as well as "real" experiments where differences in protein abundance are not known a priori. Our results suggest that data-driven reproducibility-optimization can consistently produce reliable differential expression rankings for label-free proteome tools and are straightforward in their application.

  13. Keyword extraction by nonextensivity measure.

    PubMed

    Mehri, Ali; Darooneh, Amir H

    2011-05-01

    The presence of a long-range correlation in the spatial distribution of a relevant word type, in spite of random occurrences of an irrelevant word type, is an important feature of human-written texts. We classify the correlation between the occurrences of words by nonextensive statistical mechanics for the word-ranking process. In particular, we look at the nonextensivity parameter as an alternative metric to measure the spatial correlation in the text, from which the words may be ranked in terms of this measure. Finally, we compare different methods for keyword extraction. © 2011 American Physical Society

  14. Can streamlined multi-criteria decision analysis be used to implement shared decision making for colorectal cancer screening?

    PubMed Central

    Dolan, James G.; Boohaker, Emily; Allison, Jeroan; Imperiale, Thomas F.

    2013-01-01

    Background Current US colorectal cancer screening guidelines that call for shared decision making regarding the choice among several recommended screening options are difficult to implement. Multi-criteria decision analysis (MCDA) is an established methodology well suited for supporting shared decision making. Our study goal was to determine if a streamlined form of MCDA using rank order based judgments can accurately assess patients’ colorectal cancer screening priorities. Methods We converted priorities for four decision criteria and three sub-criteria regarding colorectal cancer screening obtained from 484 average risk patients using the Analytic Hierarchy Process (AHP) in a prior study into rank order-based priorities using rank order centroids. We compared the two sets of priorities using Spearman rank correlation and non-parametric Bland-Altman limits of agreement analysis. We assessed the differential impact of using the rank order-based versus the AHP-based priorities on the results of a full MCDA comparing three currently recommended colorectal cancer screening strategies. Generalizability of the results was assessed using Monte Carlo simulation. Results Correlations between the two sets of priorities for the seven criteria ranged from 0.55 to 0.92. The proportions of absolute differences between rank order-based and AHP-based priorities that were more than ± 0.15 ranged from 1% to 16%. Differences in the full MCDA results were minimal and the relative rankings of the three screening options were identical more than 88% of the time. The Monte Carlo simulation results were similar. Conclusion Rank order-based MCDA could be a simple, practical way to guide individual decisions and assess population decision priorities regarding colorectal cancer screening strategies. Additional research is warranted to further explore the use of these methods for promoting shared decision making. PMID:24300851

  15. Sync-rank: Robust Ranking, Constrained Ranking and Rank Aggregation via Eigenvector and SDP Synchronization

    DTIC Science & Technology

    2015-04-28

    the players . In addition, we compare the algorithms on three real data sets: the outcome of soccer games in the English Premier League, a Microsoft...Premier League soccer games, a Halo 2 game tournament and NCAA College Basketball games), which show that our proposed method compares favorably to...information on the ground truth rank of a subset of players , and propose an algorithm based on SDP which is able to recover the ranking of the remaining

  16. Exploiting Complexity Information for Brain Activation Detection

    PubMed Central

    Zhang, Yan; Liang, Jiali; Lin, Qiang; Hu, Zhenghui

    2016-01-01

    We present a complexity-based approach for the analysis of fMRI time series, in which sample entropy (SampEn) is introduced as a quantification of the voxel complexity. Under this hypothesis the voxel complexity could be modulated in pertinent cognitive tasks, and it changes through experimental paradigms. We calculate the complexity of sequential fMRI data for each voxel in two distinct experimental paradigms and use a nonparametric statistical strategy, the Wilcoxon signed rank test, to evaluate the difference in complexity between them. The results are compared with the well known general linear model based Statistical Parametric Mapping package (SPM12), where a decided difference has been observed. This is because SampEn method detects brain complexity changes in two experiments of different conditions and the data-driven method SampEn evaluates just the complexity of specific sequential fMRI data. Also, the larger and smaller SampEn values correspond to different meanings, and the neutral-blank design produces higher predictability than threat-neutral. Complexity information can be considered as a complementary method to the existing fMRI analysis strategies, and it may help improving the understanding of human brain functions from a different perspective. PMID:27045838

  17. An Automated Approach for Ranking Journals to Help in Clinician Decision Support

    PubMed Central

    Jonnalagadda, Siddhartha R.; Moosavinasab, Soheil; Nath, Chinmoy; Li, Dingcheng; Chute, Christopher G.; Liu, Hongfang

    2014-01-01

    Point of care access to knowledge from full text journal articles supports decision-making and decreases medical errors. However, it is an overwhelming task to search through full text journal articles and find quality information needed by clinicians. We developed a method to rate journals for a given clinical topic, Congestive Heart Failure (CHF). Our method enables filtering of journals and ranking of journal articles based on source journal in relation to CHF. We also obtained a journal priority score, which automatically rates any journal based on its importance to CHF. Comparing our ranking with data gathered by surveying 169 cardiologists, who publish on CHF, our best Multiple Linear Regression model showed a correlation of 0.880, based on five-fold cross validation. Our ranking system can be extended to other clinical topics. PMID:25954382

  18. Rank-Based Inference without Symmetric Errors.

    DTIC Science & Technology

    1982-06-01

    a rank test statistic for testing H : 8=0. The distributional properties0 of S+ were studied in great detail by Hajek and Sidak (1967). The test...fn (x)dx, where F(x) is the integral of f (X). On the other hand, Schuster (1974) and Ahmad (1976) studied ff n(x)dFn(x), where Fn (x) is the empirical...the results cited in the previous sections. In the case of Wilcoxon scores, Aubuchon (1982) proved consistency of y and studied its behavior. Further

  19. Multi-dimensional Rankings, Program Termination, and Complexity Bounds of Flowchart Programs

    NASA Astrophysics Data System (ADS)

    Alias, Christophe; Darte, Alain; Feautrier, Paul; Gonnord, Laure

    Proving the termination of a flowchart program can be done by exhibiting a ranking function, i.e., a function from the program states to a well-founded set, which strictly decreases at each program step. A standard method to automatically generate such a function is to compute invariants for each program point and to search for a ranking in a restricted class of functions that can be handled with linear programming techniques. Previous algorithms based on affine rankings either are applicable only to simple loops (i.e., single-node flowcharts) and rely on enumeration, or are not complete in the sense that they are not guaranteed to find a ranking in the class of functions they consider, if one exists. Our first contribution is to propose an efficient algorithm to compute ranking functions: It can handle flowcharts of arbitrary structure, the class of candidate rankings it explores is larger, and our method, although greedy, is provably complete. Our second contribution is to show how to use the ranking functions we generate to get upper bounds for the computational complexity (number of transitions) of the source program. This estimate is a polynomial, which means that we can handle programs with more than linear complexity. We applied the method on a collection of test cases from the literature. We also show the links and differences with previous techniques based on the insertion of counters.

  20. A New Combinatorial Optimization Approach for Integrated Feature Selection Using Different Datasets: A Prostate Cancer Transcriptomic Study

    PubMed Central

    Puthiyedth, Nisha; Riveros, Carlos; Berretta, Regina; Moscato, Pablo

    2015-01-01

    Background The joint study of multiple datasets has become a common technique for increasing statistical power in detecting biomarkers obtained from smaller studies. The approach generally followed is based on the fact that as the total number of samples increases, we expect to have greater power to detect associations of interest. This methodology has been applied to genome-wide association and transcriptomic studies due to the availability of datasets in the public domain. While this approach is well established in biostatistics, the introduction of new combinatorial optimization models to address this issue has not been explored in depth. In this study, we introduce a new model for the integration of multiple datasets and we show its application in transcriptomics. Methods We propose a new combinatorial optimization problem that addresses the core issue of biomarker detection in integrated datasets. Optimal solutions for this model deliver a feature selection from a panel of prospective biomarkers. The model we propose is a generalised version of the (α,β)-k-Feature Set problem. We illustrate the performance of this new methodology via a challenging meta-analysis task involving six prostate cancer microarray datasets. The results are then compared to the popular RankProd meta-analysis tool and to what can be obtained by analysing the individual datasets by statistical and combinatorial methods alone. Results Application of the integrated method resulted in a more informative signature than the rank-based meta-analysis or individual dataset results, and overcomes problems arising from real world datasets. The set of genes identified is highly significant in the context of prostate cancer. The method used does not rely on homogenisation or transformation of values to a common scale, and at the same time is able to capture markers associated with subgroups of the disease. PMID:26106884

  1. Evaluation of TV commercials using neurophysiological responses.

    PubMed

    Yang, Taeyang; Lee, Do-Young; Kwak, Youngshin; Choi, Jinsook; Kim, Chajoong; Kim, Sung-Phil

    2015-04-24

    In recent years, neuroscientific knowledge has been applied to marketing as a novel and efficient means to comprehend the cognitive and behavioral aspects of consumers. A number of studies have attempted to evaluate media contents, especially TV commercials using various neuroimaging techniques such as electroencephalography (EEG). Yet neurophysiological examination of detailed cognitive and affective responses in viewers is still required to provide practical information to marketers. Here, this study develops a method to analyze temporal patterns of EEG data and extract affective and cognitive indices such as happiness, surprise, and attention for TV commercial evaluation. Twenty participants participated in the study. We developed the neurophysiological indices for TV commercial evaluation using classification model. Specifically, these model-based indices were customized using individual EEG features. We used a video game for developing the index of attention and four video clips for developing indices of happiness and surprise. Statistical processes including one-way analyses of variance (ANOVA) and the cross validation scheme were used to select EEG features for each index. The EEG features were composed of the combinations of spectral power at selected channels from the cross validation for each individual. The Fisher's linear discriminant classifier (FLDA) was used to estimate each neurophysiological index during viewing four different TV commercials. Post hoc behavioral responses of preference, short-term memory, and recall were measured. Behavioral results showed significant differences for all preference, short-term memory rates, and recall rates between commercials, leading to a 'high-ranked' commercial group and a 'low-ranked' group (P < 0.05). Neural estimation of happiness results revealed a significant difference between the high-ranked and the low-ranked commercials in happiness index (P < 0.01). The order of rankings based on happiness and attention matched well with the order of behavioral response rankings. In the elapsed-time analysis of the highest-ranked commercial, we could point to visual and auditory semantic structures of the commercial that induced increases in the happiness index. Our results demonstrated that the neurophysiological indices developed in this study may provide a useful tool for evaluating TV commercials.

  2. Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals

    DOE PAGES

    Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.

    2018-03-20

    A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. Here, the method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function asmore » a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Finally, numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules demonstrate the efficiency of this method using very few function evaluations as compared to other integration strategies.« less

  3. Compressed sparse tensor based quadrature for vibrational quantum mechanics integrals

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rai, Prashant; Sargsyan, Khachik; Najm, Habib N.

    A new method for fast evaluation of high dimensional integrals arising in quantum mechanics is proposed. Here, the method is based on sparse approximation of a high dimensional function followed by a low-rank compression. In the first step, we interpret the high dimensional integrand as a tensor in a suitable tensor product space and determine its entries by a compressed sensing based algorithm using only a few function evaluations. Secondly, we implement a rank reduction strategy to compress this tensor in a suitable low-rank tensor format using standard tensor compression tools. This allows representing a high dimensional integrand function asmore » a small sum of products of low dimensional functions. Finally, a low dimensional Gauss–Hermite quadrature rule is used to integrate this low-rank representation, thus alleviating the curse of dimensionality. Finally, numerical tests on synthetic functions, as well as on energy correction integrals for water and formaldehyde molecules demonstrate the efficiency of this method using very few function evaluations as compared to other integration strategies.« less

  4. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data

    PubMed Central

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar

    2012-01-01

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762

  5. A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.

    PubMed

    Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar

    2012-10-18

    Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can be then unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical test rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homocedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.

  6. Existence and stability, and discrete BB and rank conditions, for general mixed-hybrid finite elements in elasticity

    NASA Technical Reports Server (NTRS)

    Xue, W.-M.; Atluri, S. N.

    1985-01-01

    In this paper, all possible forms of mixed-hybrid finite element methods that are based on multi-field variational principles are examined as to the conditions for existence, stability, and uniqueness of their solutions. The reasons as to why certain 'simplified hybrid-mixed methods' in general, and the so-called 'simplified hybrid-displacement method' in particular (based on the so-called simplified variational principles), become unstable, are discussed. A comprehensive discussion of the 'discrete' BB-conditions, and the rank conditions, of the matrices arising in mixed-hybrid methods, is given. Some recent studies aimed at the assurance of such rank conditions, and the related problem of the avoidance of spurious kinematic modes, are presented.

  7. Identifying a set of influential spreaders in complex networks

    NASA Astrophysics Data System (ADS)

    Zhang, Jian-Xiong; Chen, Duan-Bing; Dong, Qiang; Zhao, Zhi-Dan

    2016-06-01

    Identifying a set of influential spreaders in complex networks plays a crucial role in effective information spreading. A simple strategy is to choose top-r ranked nodes as spreaders according to influence ranking method such as PageRank, ClusterRank and k-shell decomposition. Besides, some heuristic methods such as hill-climbing, SPIN, degree discount and independent set based are also proposed. However, these approaches suffer from a possibility that some spreaders are so close together that they overlap sphere of influence or time consuming. In this report, we present a simply yet effectively iterative method named VoteRank to identify a set of decentralized spreaders with the best spreading ability. In this approach, all nodes vote in a spreader in each turn, and the voting ability of neighbors of elected spreader will be decreased in subsequent turn. Experimental results on four real networks show that under Susceptible-Infected-Recovered (SIR) and Susceptible-Infected (SI) models, VoteRank outperforms the traditional benchmark methods on both spreading rate and final affected scale. What’s more, VoteRank has superior computational efficiency.

  8. Dispensing Processes Impact Apparent Biological Activity as Determined by Computational and Statistical Analyses

    PubMed Central

    Ekins, Sean; Olechno, Joe; Williams, Antony J.

    2013-01-01

    Dispensing and dilution processes may profoundly influence estimates of biological activity of compounds. Published data show Ephrin type-B receptor 4 IC50 values obtained via tip-based serial dilution and dispensing versus acoustic dispensing with direct dilution differ by orders of magnitude with no correlation or ranking of datasets. We generated computational 3D pharmacophores based on data derived by both acoustic and tip-based transfer. The computed pharmacophores differ significantly depending upon dispensing and dilution methods. The acoustic dispensing-derived pharmacophore correctly identified active compounds in a subsequent test set where the tip-based method failed. Data from acoustic dispensing generates a pharmacophore containing two hydrophobic features, one hydrogen bond donor and one hydrogen bond acceptor. This is consistent with X-ray crystallography studies of ligand-protein interactions and automatically generated pharmacophores derived from this structural data. In contrast, the tip-based data suggest a pharmacophore with two hydrogen bond acceptors, one hydrogen bond donor and no hydrophobic features. This pharmacophore is inconsistent with the X-ray crystallographic studies and automatically generated pharmacophores. In short, traditional dispensing processes are another important source of error in high-throughput screening that impacts computational and statistical analyses. These findings have far-reaching implications in biological research. PMID:23658723

  9. Diagnosing and ranking retinopathy disease level using diabetic fundus image recuperation approach.

    PubMed

    Somasundaram, K; Rajendran, P Alli

    2015-01-01

    Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time.

  10. Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image Recuperation Approach

    PubMed Central

    Somasundaram, K.; Alli Rajendran, P.

    2015-01-01

    Retinal fundus images are widely used in diagnosing different types of eye diseases. The existing methods such as Feature Based Macular Edema Detection (FMED) and Optimally Adjusted Morphological Operator (OAMO) effectively detected the presence of exudation in fundus images and identified the true positive ratio of exudates detection, respectively. These mechanically detected exudates did not include more detailed feature selection technique to the system for detection of diabetic retinopathy. To categorize the exudates, Diabetic Fundus Image Recuperation (DFIR) method based on sliding window approach is developed in this work to select the features of optic cup in digital retinal fundus images. The DFIR feature selection uses collection of sliding windows with varying range to obtain the features based on the histogram value using Group Sparsity Nonoverlapping Function. Using support vector model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy disease level. The ranking of disease level on each candidate set provides a much promising result for developing practically automated and assisted diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, ranking efficiency, and feature selection time. PMID:25945362

  11. Group social rank is associated with performance on a spatial learning task.

    PubMed

    Langley, Ellis J G; van Horik, Jayden O; Whiteside, Mark A; Madden, Joah R

    2018-02-01

    Dominant individuals differ from subordinates in their performances on cognitive tasks across a suite of taxa. Previous studies often only consider dyadic relationships, rather than the more ecologically relevant social hierarchies or networks, hence failing to account for how dyadic relationships may be adjusted within larger social groups. We used a novel statistical method: randomized Elo-ratings, to infer the social hierarchy of 18 male pheasants, Phasianus colchicus , while in a captive, mixed-sex group with a linear hierarchy. We assayed individual learning performance of these males on a binary spatial discrimination task to investigate whether inter-individual variation in performance is associated with group social rank. Task performance improved with increasing trial number and was positively related to social rank, with higher ranking males showing greater levels of success. Motivation to participate in the task was not related to social rank or task performance, thus indicating that these rank-related differences are not a consequence of differences in motivation to complete the task. Our results provide important information about how variation in cognitive performance relates to an individual's social rank within a group. Whether the social environment causes differences in learning performance or instead, inherent differences in learning ability predetermine rank remains to be tested.

  12. Measuring hospital performance in congenital heart surgery: Administrative vs. clinical registry data

    PubMed Central

    Pasquali, Sara K.; He, Xia; Jacobs, Jeffrey P.; Jacobs, Marshall L.; Gaies, Michael G.; Shah, Samir S.; Hall, Matthew; Gaynor, J. William; Peterson, Eric D.; Mayer, John E.; Hirsch-Romano, Jennifer C.

    2015-01-01

    Background In congenital heart surgery, hospital performance has historically been assessed using widely available administrative datasets. Recent studies have demonstrated inaccuracies in case ascertainment (coding and inclusion of eligible cases) in administrative vs. clinical registry data, however it is unclear whether this impacts assessment of performance on a hospital-level. Methods Merged data from the Society of Thoracic Surgeons (STS) Database (clinical registry), and Pediatric Health Information Systems Database (administrative dataset) on 46,056 children undergoing heart surgery (2006–2010) were utilized to evaluate in-hospital mortality for 33 hospitals based on their administrative vs. registry data. Standard methods to identify/classify cases were used: Risk Adjustment in Congenital Heart Surgery (RACHS-1) in the administrative data, and STS–European Association for Cardiothoracic Surgery (STAT) methodology in the registry. Results Median hospital surgical volume based on the registry data was 269 cases/yr; mortality was 2.9%. Hospital volumes and mortality rates based on the administrative data were on average 10.7% and 4.7% lower, respectively, although this varied widely across hospitals. Hospital rankings for mortality based on the administrative vs. registry data differed by ≥ 5 rank-positions for 24% of hospitals, with a change in mortality tertile classification (high, middle, or low mortality) for 18%, and change in statistical outlier classification for 12%. Higher volume/complexity hospitals were most impacted. Agency for Healthcare Quality and Research methods in the administrative data yielded similar results. Conclusions Inaccuracies in case ascertainment in administrative vs. clinical registry data can lead to important differences in assessment of hospital mortality rates for congenital heart surgery. PMID:25624057

  13. Composite Flood Risk for Virgin Island

    EPA Pesticide Factsheets

    The Composite Flood Risk layer combines flood hazard datasets from Federal Emergency Management Agency (FEMA) flood zones, NOAA's Shallow Coastal Flooding, and the National Hurricane Center SLOSH model for Storm Surge inundation for category 1, 2, and 3 hurricanes.Geographic areas are represented by a grid of 10 by 10 meter cells and each cell has a ranking based on variation in exposure to flooding hazards: Moderate, High and Extreme exposure. Geographic areas in each input layers are ranked based on their probability of flood risk exposure. The logic was such that areas exposed to flooding on a more frequent basis were given a higher ranking. Thus the ranking incorporates the probability of the area being flooded. For example, even though a Category 3 storm surge has higher flooding elevations, the likelihood of the occurrence is lower than a Category 1 storm surge and therefore the Category 3 flood area is given a lower exposure ranking. Extreme exposure areas are those areas that are exposed to relatively frequent flooding.The ranked input layers are then converted to a raster for the creation of the composite risk layer by using cell statistics in spatial analysis. The highest exposure ranking for a given cell in any of the three input layers is assigned to the corresponding cell in the composite layer.For example, if an area (a cell) is rank as medium in the FEMA layer, moderate in the SLOSH layer, but extreme in the SCF layer, the cell will be considere

  14. Analyzing Kernel Matrices for the Identification of Differentially Expressed Genes

    PubMed Central

    Xia, Xiao-Lei; Xing, Huanlai; Liu, Xueqin

    2013-01-01

    One of the most important applications of microarray data is the class prediction of biological samples. For this purpose, statistical tests have often been applied to identify the differentially expressed genes (DEGs), followed by the employment of the state-of-the-art learning machines including the Support Vector Machines (SVM) in particular. The SVM is a typical sample-based classifier whose performance comes down to how discriminant samples are. However, DEGs identified by statistical tests are not guaranteed to result in a training dataset composed of discriminant samples. To tackle this problem, a novel gene ranking method namely the Kernel Matrix Gene Selection (KMGS) is proposed. The rationale of the method, which roots in the fundamental ideas of the SVM algorithm, is described. The notion of ''the separability of a sample'' which is estimated by performing -like statistics on each column of the kernel matrix, is first introduced. The separability of a classification problem is then measured, from which the significance of a specific gene is deduced. Also described is a method of Kernel Matrix Sequential Forward Selection (KMSFS) which shares the KMGS method's essential ideas but proceeds in a greedy manner. On three public microarray datasets, our proposed algorithms achieved noticeably competitive performance in terms of the B.632+ error rate. PMID:24349110

  15. A denoising algorithm for CT image using low-rank sparse coding

    NASA Astrophysics Data System (ADS)

    Lei, Yang; Xu, Dong; Zhou, Zhengyang; Wang, Tonghe; Dong, Xue; Liu, Tian; Dhabaan, Anees; Curran, Walter J.; Yang, Xiaofeng

    2018-03-01

    We propose a denoising method of CT image based on low-rank sparse coding. The proposed method constructs an adaptive dictionary of image patches and estimates the sparse coding regularization parameters using the Bayesian interpretation. A low-rank approximation approach is used to simultaneously construct the dictionary and achieve sparse representation through clustering similar image patches. A variable-splitting scheme and a quadratic optimization are used to reconstruct CT image based on achieved sparse coefficients. We tested this denoising technology using phantom, brain and abdominal CT images. The experimental results showed that the proposed method delivers state-of-art denoising performance, both in terms of objective criteria and visual quality.

  16. Target Fishing for Chemical Compounds using Target-Ligand Activity data and Ranking based Methods

    PubMed Central

    Wale, Nikil; Karypis, George

    2009-01-01

    In recent years the development of computational techniques that identify all the likely targets for a given chemical compound, also termed as the problem of Target Fishing, has been an active area of research. Identification of likely targets of a chemical compound helps to understand problems such as toxicity, lack of efficacy in humans, and poor physical properties associated with that compound in the early stages of drug discovery. In this paper we present a set of techniques whose goal is to rank or prioritize targets in the context of a given chemical compound such that most targets that this compound may show activity against appear higher in the ranked list. These methods are based on our extensions to the SVM and Ranking Perceptron algorithms for this problem. Our extensive experimental study shows that the methods developed in this work outperform previous approaches by 2% to 60% under different evaluation criterions. PMID:19764745

  17. The Visual Analogue Scale for Rating, Ranking and Paired-Comparison (VAS-RRP): A new technique for psychological measurement.

    PubMed

    Sung, Yao-Ting; Wu, Jeng-Shin

    2018-04-17

    Traditionally, the visual analogue scale (VAS) has been proposed to overcome the limitations of ordinal measures from Likert-type scales. However, the function of VASs to overcome the limitations of response styles to Likert-type scales has not yet been addressed. Previous research using ranking and paired comparisons to compensate for the response styles of Likert-type scales has suffered from limitations, such as that the total score of ipsative measures is a constant that cannot be analyzed by means of many common statistical techniques. In this study we propose a new scale, called the Visual Analogue Scale for Rating, Ranking, and Paired-Comparison (VAS-RRP), which can be used to collect rating, ranking, and paired-comparison data simultaneously, while avoiding the limitations of each of these data collection methods. The characteristics, use, and analytic method of VAS-RRPs, as well as how they overcome the disadvantages of Likert-type scales, ranking, and VASs, are discussed. On the basis of analyses of simulated and empirical data, this study showed that VAS-RRPs improved reliability, response style bias, and parameter recovery. Finally, we have also designed a VAS-RRP Generator for researchers' construction and administration of their own VAS-RRPs.

  18. Chromatographic and computational assessment of lipophilicity using sum of ranking differences and generalized pair-correlation.

    PubMed

    Andrić, Filip; Héberger, Károly

    2015-02-06

    Lipophilicity (logP) represents one of the most studied and most frequently used fundamental physicochemical properties. At present there are several possibilities for its quantitative expression and many of them stems from chromatographic experiments. Numerous attempts have been made to compare different computational methods, chromatographic methods vs. computational approaches, as well as chromatographic methods and direct shake-flask procedure without definite results or these findings are not accepted generally. In the present work numerous chromatographically derived lipophilicity measures in combination with diverse computational methods were ranked and clustered using the novel variable discrimination and ranking approaches based on the sum of ranking differences and the generalized pair correlation method. Available literature logP data measured on HILIC, and classical reversed-phase combining different classes of compounds have been compared with most frequently used multivariate data analysis techniques (principal component and hierarchical cluster analysis) as well as with the conclusions in the original sources. Chromatographic lipophilicity measures obtained under typical reversed-phase conditions outperform the majority of computationally estimated logPs. Oppositely, in the case of HILIC none of the many proposed chromatographic indices overcomes any of the computationally assessed logPs. Only two of them (logkmin and kmin) may be selected as recommended chromatographic lipophilicity measures. Both ranking approaches, sum of ranking differences and generalized pair correlation method, although based on different backgrounds, provides highly similar variable ordering and grouping leading to the same conclusions. Copyright © 2015. Published by Elsevier B.V.

  19. Use of Brain MRI Atlases to Determine Boundaries of Age-Related Pathology: The Importance of Statistical Method

    PubMed Central

    Dickie, David Alexander; Job, Dominic E.; Gonzalez, David Rodriguez; Shenkin, Susan D.; Wardlaw, Joanna M.

    2015-01-01

    Introduction Neurodegenerative disease diagnoses may be supported by the comparison of an individual patient’s brain magnetic resonance image (MRI) with a voxel-based atlas of normal brain MRI. Most current brain MRI atlases are of young to middle-aged adults and parametric, e.g., mean ±standard deviation (SD); these atlases require data to be Gaussian. Brain MRI data, e.g., grey matter (GM) proportion images, from normal older subjects are apparently not Gaussian. We created a nonparametric and a parametric atlas of the normal limits of GM proportions in older subjects and compared their classifications of GM proportions in Alzheimer’s disease (AD) patients. Methods Using publicly available brain MRI from 138 normal subjects and 138 subjects diagnosed with AD (all 55–90 years), we created: a mean ±SD atlas to estimate parametrically the percentile ranks and limits of normal ageing GM; and, separately, a nonparametric, rank order-based GM atlas from the same normal ageing subjects. GM images from AD patients were then classified with respect to each atlas to determine the effect statistical distributions had on classifications of proportions of GM in AD patients. Results The parametric atlas often defined the lower normal limit of the proportion of GM to be negative (which does not make sense physiologically as the lowest possible proportion is zero). Because of this, for approximately half of the AD subjects, 25–45% of voxels were classified as normal when compared to the parametric atlas; but were classified as abnormal when compared to the nonparametric atlas. These voxels were mainly concentrated in the frontal and occipital lobes. Discussion To our knowledge, we have presented the first nonparametric brain MRI atlas. In conditions where there is increasing variability in brain structure, such as in old age, nonparametric brain MRI atlases may represent the limits of normal brain structure more accurately than parametric approaches. Therefore, we conclude that the statistical method used for construction of brain MRI atlases should be selected taking into account the population and aim under study. Parametric methods are generally robust for defining central tendencies, e.g., means, of brain structure. Nonparametric methods are advisable when studying the limits of brain structure in ageing and neurodegenerative disease. PMID:26023913

  20. Stratification of co-evolving genomic groups using ranked phylogenetic profiles

    PubMed Central

    Freilich, Shiri; Goldovsky, Leon; Gottlieb, Assaf; Blanc, Eric; Tsoka, Sophia; Ouzounis, Christos A

    2009-01-01

    Background Previous methods of detecting the taxonomic origins of arbitrary sequence collections, with a significant impact to genome analysis and in particular metagenomics, have primarily focused on compositional features of genomes. The evolutionary patterns of phylogenetic distribution of genes or proteins, represented by phylogenetic profiles, provide an alternative approach for the detection of taxonomic origins, but typically suffer from low accuracy. Herein, we present rank-BLAST, a novel approach for the assignment of protein sequences into genomic groups of the same taxonomic origin, based on the ranking order of phylogenetic profiles of target genes or proteins across the reference database. Results The rank-BLAST approach is validated by computing the phylogenetic profiles of all sequences for five distinct microbial species of varying degrees of phylogenetic proximity, against a reference database of 243 fully sequenced genomes. The approach - a combination of sequence searches, statistical estimation and clustering - analyses the degree of sequence divergence between sets of protein sequences and allows the classification of protein sequences according to the species of origin with high accuracy, allowing taxonomic classification of 64% of the proteins studied. In most cases, a main cluster is detected, representing the corresponding species. Secondary, functionally distinct and species-specific clusters exhibit different patterns of phylogenetic distribution, thus flagging gene groups of interest. Detailed analyses of such cases are provided as examples. Conclusion Our results indicate that the rank-BLAST approach can capture the taxonomic origins of sequence collections in an accurate and efficient manner. The approach can be useful both for the analysis of genome evolution and the detection of species groups in metagenomics samples. PMID:19860884

  1. Confidence intervals for single-case effect size measures based on randomization test inversion.

    PubMed

    Michiels, Bart; Heyvaert, Mieke; Meulders, Ann; Onghena, Patrick

    2017-02-01

    In the current paper, we present a method to construct nonparametric confidence intervals (CIs) for single-case effect size measures in the context of various single-case designs. We use the relationship between a two-sided statistical hypothesis test at significance level α and a 100 (1 - α) % two-sided CI to construct CIs for any effect size measure θ that contain all point null hypothesis θ values that cannot be rejected by the hypothesis test at significance level α. This method of hypothesis test inversion (HTI) can be employed using a randomization test as the statistical hypothesis test in order to construct a nonparametric CI for θ. We will refer to this procedure as randomization test inversion (RTI). We illustrate RTI in a situation in which θ is the unstandardized and the standardized difference in means between two treatments in a completely randomized single-case design. Additionally, we demonstrate how RTI can be extended to other types of single-case designs. Finally, we discuss a few challenges for RTI as well as possibilities when using the method with other effect size measures, such as rank-based nonoverlap indices. Supplementary to this paper, we provide easy-to-use R code, which allows the user to construct nonparametric CIs according to the proposed method.

  2. Statistical analysis to assess automated level of suspicion scoring methods in breast ultrasound

    NASA Astrophysics Data System (ADS)

    Galperin, Michael

    2003-05-01

    A well-defined rule-based system has been developed for scoring 0-5 the Level of Suspicion (LOS) based on qualitative lexicon describing the ultrasound appearance of breast lesion. The purposes of the research are to asses and select one of the automated LOS scoring quantitative methods developed during preliminary studies in benign biopsies reduction. The study has used Computer Aided Imaging System (CAIS) to improve the uniformity and accuracy of applying the LOS scheme by automatically detecting, analyzing and comparing breast masses. The overall goal is to reduce biopsies on the masses with lower levels of suspicion, rather that increasing the accuracy of diagnosis of cancers (will require biopsy anyway). On complex cysts and fibroadenoma cases experienced radiologists were up to 50% less certain in true negatives than CAIS. Full correlation analysis was applied to determine which of the proposed LOS quantification methods serves CAIS accuracy the best. This paper presents current results of applying statistical analysis for automated LOS scoring quantification for breast masses with known biopsy results. It was found that First Order Ranking method yielded most the accurate results. The CAIS system (Image Companion, Data Companion software) is developed by Almen Laboratories and was used to achieve the results.

  3. Sparse subspace clustering for data with missing entries and high-rank matrix completion.

    PubMed

    Fan, Jicong; Chow, Tommy W S

    2017-09-01

    Many methods have recently been proposed for subspace clustering, but they are often unable to handle incomplete data because of missing entries. Using matrix completion methods to recover missing entries is a common way to solve the problem. Conventional matrix completion methods require that the matrix should be of low-rank intrinsically, but most matrices are of high-rank or even full-rank in practice, especially when the number of subspaces is large. In this paper, a new method called Sparse Representation with Missing Entries and Matrix Completion is proposed to solve the problems of incomplete-data subspace clustering and high-rank matrix completion. The proposed algorithm alternately computes the matrix of sparse representation coefficients and recovers the missing entries of a data matrix. The proposed algorithm recovers missing entries through minimizing the representation coefficients, representation errors, and matrix rank. Thorough experimental study and comparative analysis based on synthetic data and natural images were conducted. The presented results demonstrate that the proposed algorithm is more effective in subspace clustering and matrix completion compared with other existing methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  4. Practical application of cure mixture model for long-term censored survivor data from a withdrawal clinical trial of patients with major depressive disorder.

    PubMed

    Arano, Ichiro; Sugimoto, Tomoyuki; Hamasaki, Toshimitsu; Ohno, Yuko

    2010-04-23

    Survival analysis methods such as the Kaplan-Meier method, log-rank test, and Cox proportional hazards regression (Cox regression) are commonly used to analyze data from randomized withdrawal studies in patients with major depressive disorder. However, unfortunately, such common methods may be inappropriate when a long-term censored relapse-free time appears in data as the methods assume that if complete follow-up were possible for all individuals, each would eventually experience the event of interest. In this paper, to analyse data including such a long-term censored relapse-free time, we discuss a semi-parametric cure regression (Cox cure regression), which combines a logistic formulation for the probability of occurrence of an event with a Cox proportional hazards specification for the time of occurrence of the event. In specifying the treatment's effect on disease-free survival, we consider the fraction of long-term survivors and the risks associated with a relapse of the disease. In addition, we develop a tree-based method for the time to event data to identify groups of patients with differing prognoses (cure survival CART). Although analysis methods typically adapt the log-rank statistic for recursive partitioning procedures, the method applied here used a likelihood ratio (LR) test statistic from a fitting of cure survival regression assuming exponential and Weibull distributions for the latency time of relapse. The method is illustrated using data from a sertraline randomized withdrawal study in patients with major depressive disorder. We concluded that Cox cure regression reveals facts on who may be cured, and how the treatment and other factors effect on the cured incidence and on the relapse time of uncured patients, and that cure survival CART output provides easily understandable and interpretable information, useful both in identifying groups of patients with differing prognoses and in utilizing Cox cure regression models leading to meaningful interpretations.

  5. Evaluation of the osteoclastogenic process associated with RANK / RANK-L / OPG in odontogenic myxomas

    PubMed Central

    González-Galván, María del Carmen; Mosqueda-Taylor, Adalberto; Bologna-Molina, Ronell; Setien-Olarra, Amaia; Marichalar-Mendia, Xabier; Aguirre-Urizar, José-Manuel

    2018-01-01

    Background Odontogenic myxoma (OM) is a benign intraosseous neoplasm that exhibits local aggressiveness and high recurrence rates. Osteoclastogenesis is an important phenomenon in the tumor growth of maxillary neoplasms. RANK (Receptor Activator of Nuclear Factor κappa B) is the signaling receptor of RANK-L (Receptor activator of nuclear factor kappa-Β ligand) that activates the osteoclasts. OPG (osteoprotegerin) is a decoy receptor for RANK-L that inhibits pro-osteoclastogenesis. The RANK / RANKL / OPG system participates in the regulation of osteolytic activity under normal conditions, and its alteration has been associated with greater bone destruction, and also with tumor growth. Objectives To analyze the immunohistochemical expression of OPG, RANK and RANK-L proteins in odontogenic myxomas (OMs) and their relationship with the tumor size. Material and Methods Eighteen OMs, 4 small (<3 cm) and 14 large (> 3cm) and 18 dental follicles (DF) that were included as control were studied by means of standard immunohistochemical procedure with RANK, RANKL and OPG antibodies. For the evaluation, 5 fields (40x) of representative areas of OM and DF were selected where the expression of each antibody was determined. Descriptive and comparative statistical analyses were performed with the obtained data. Results There are significant differences in the expression of RANK in OM samples as compared to DF (p = 0.022) and among the OMSs and OMLs (p = 0.032). Also a strong association is recognized in the expression of RANK-L and OPG in OM samples. Conclusions Activation of the RANK / RANK-L / OPG triad seems to be involved in the mechanisms of bone balance and destruction, as well as associated with tumor growth in odontogenic myxomas. Key words:Odontogenic myxoma, dental follicle, RANK, RANK-L, OPG, osteoclastogenesis. PMID:29680857

  6. Validation of an image-based technique to assess the perceptual quality of clinical chest radiographs with an observer study

    NASA Astrophysics Data System (ADS)

    Lin, Yuan; Choudhury, Kingshuk R.; McAdams, H. Page; Foos, David H.; Samei, Ehsan

    2014-03-01

    We previously proposed a novel image-based quality assessment technique1 to assess the perceptual quality of clinical chest radiographs. In this paper, an observer study was designed and conducted to systematically validate this technique. Ten metrics were involved in the observer study, i.e., lung grey level, lung detail, lung noise, riblung contrast, rib sharpness, mediastinum detail, mediastinum noise, mediastinum alignment, subdiaphragm-lung contrast, and subdiaphragm area. For each metric, three tasks were successively presented to the observers. In each task, six ROI images were randomly presented in a row and observers were asked to rank the images only based on a designated quality and disregard the other qualities. A range slider on the top of the images was used for observers to indicate the acceptable range based on the corresponding perceptual attribute. Five boardcertificated radiologists from Duke participated in this observer study on a DICOM calibrated diagnostic display workstation and under low ambient lighting conditions. The observer data were analyzed in terms of the correlations between the observer ranking orders and the algorithmic ranking orders. Based on the collected acceptable ranges, quality consistency ranges were statistically derived. The observer study showed that, for each metric, the averaged ranking orders of the participated observers were strongly correlated with the algorithmic orders. For the lung grey level, the observer ranking orders completely accorded with the algorithmic ranking orders. The quality consistency ranges derived from this observer study were close to these derived from our previous study. The observer study indicates that the proposed image-based quality assessment technique provides a robust reflection of the perceptual image quality of the clinical chest radiographs. The derived quality consistency ranges can be used to automatically predict the acceptability of a clinical chest radiograph.

  7. ArraySolver: an algorithm for colour-coded graphical display and Wilcoxon signed-rank statistics for comparing microarray gene expression data.

    PubMed

    Khan, Haseeb Ahmad

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to other. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann-Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-groups comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets. Whereas the former program appeared to be more accurate for 25 or fewer pairs (n < or = 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform.

  8. ArraySolver: An Algorithm for Colour-Coded Graphical Display and Wilcoxon Signed-Rank Statistics for Comparing Microarray Gene Expression Data

    PubMed Central

    2004-01-01

    The massive surge in the production of microarray data poses a great challenge for proper analysis and interpretation. In recent years numerous computational tools have been developed to extract meaningful interpretation of microarray gene expression data. However, a convenient tool for two-groups comparison of microarray data is still lacking and users have to rely on commercial statistical packages that might be costly and require special skills, in addition to extra time and effort for transferring data from one platform to other. Various statistical methods, including the t-test, analysis of variance, Pearson test and Mann–Whitney U test, have been reported for comparing microarray data, whereas the utilization of the Wilcoxon signed-rank test, which is an appropriate test for two-groups comparison of gene expression data, has largely been neglected in microarray studies. The aim of this investigation was to build an integrated tool, ArraySolver, for colour-coded graphical display and comparison of gene expression data using the Wilcoxon signed-rank test. The results of software validation showed similar outputs with ArraySolver and SPSS for large datasets. Whereas the former program appeared to be more accurate for 25 or fewer pairs (n ≤ 25), suggesting its potential application in analysing molecular signatures that usually contain small numbers of genes. The main advantages of ArraySolver are easy data selection, convenient report format, accurate statistics and the familiar Excel platform. PMID:18629036

  9. Deaths: leading causes for 2005.

    PubMed

    Heron, Melonie; Tejada-Vera, Betzaida

    2009-12-23

    This report presents final 2005 data on the 10 leading causes of death in the United States by age, race, sex, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements the annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2005. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD-10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2005, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Cerebrovascular diseases; Chronic lower respiratory diseases; Accidents (unintentional injuries); Diabetes mellitus; Alzheimer's disease; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Septicemia. They accounted for about 77 percent of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2005 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birthweight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Newborn affected by complications of placenta, cord and membranes; Accidents (unintentional injuries); Respiratory distress of newborn; Bacterial sepsis of newborn; Neonatal hemorrhage; and Necrotizing enterocolitis of newborn. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods.

  10. Learning and understanding the Kruskal-Wallis one-way analysis-of-variance-by-ranks test for differences among three or more independent groups.

    PubMed

    Chan, Y; Walmsley, R P

    1997-12-01

    When several treatment methods are available for the same problem, many clinicians are faced with the task of deciding which treatment to use. Many clinicians may have conducted informal "mini-experiments" on their own to determine which treatment is best suited for the problem. These results are usually not documented or reported in a formal manner because many clinicians feel that they are "statistically challenged." Another reason may be because clinicians do not feel they have controlled enough test conditions to warrant analysis. In this update, a statistic is described that does not involve complicated statistical assumptions, making it a simple and easy-to-use statistical method. This update examines the use of two statistics and does not deal with other issues that could affect clinical research such as issues affecting credibility. For readers who want a more in-depth examination of this topic, references have been provided. The Kruskal-Wallis one-way analysis-of-variance-by-ranks test (or H test) is used to determine whether three or more independent groups are the same or different on some variable of interest when an ordinal level of data or an interval or ratio level of data is available. A hypothetical example will be presented to explain when and how to use this statistic, how to interpret results using the statistic, the advantages and disadvantages of the statistic, and what to look for in a written report. This hypothetical example will involve the use of ratio data to demonstrate how to choose between using the nonparametric H test and the more powerful parametric F test.

  11. Computer-Assisted Search Of Large Textual Data Bases

    NASA Technical Reports Server (NTRS)

    Driscoll, James R.

    1995-01-01

    "QA" denotes high-speed computer system for searching diverse collections of documents including (but not limited to) technical reference manuals, legal documents, medical documents, news releases, and patents. Incorporates previously available and emerging information-retrieval technology to help user intelligently and rapidly locate information found in large textual data bases. Technology includes provision for inquiries in natural language; statistical ranking of retrieved information; artificial-intelligence implementation of semantics, in which "surface level" knowledge found in text used to improve ranking of retrieved information; and relevance feedback, in which user's judgements of relevance of some retrieved documents used automatically to modify search for further information.

  12. TREATMENT SWITCHING: STATISTICAL AND DECISION-MAKING CHALLENGES AND APPROACHES.

    PubMed

    Latimer, Nicholas R; Henshall, Chris; Siebert, Uwe; Bell, Helen

    2016-01-01

    Treatment switching refers to the situation in a randomized controlled trial where patients switch from their randomly assigned treatment onto an alternative. Often, switching is from the control group onto the experimental treatment. In this instance, a standard intention-to-treat analysis does not identify the true comparative effectiveness of the treatments under investigation. We aim to describe statistical methods for adjusting for treatment switching in a comprehensible way for nonstatisticians, and to summarize views on these methods expressed by stakeholders at the 2014 Adelaide International Workshop on Treatment Switching in Clinical Trials. We describe three statistical methods used to adjust for treatment switching: marginal structural models, two-stage adjustment, and rank preserving structural failure time models. We draw upon discussion heard at the Adelaide International Workshop to explore the views of stakeholders on the acceptability of these methods. Stakeholders noted that adjustment methods are based on assumptions, the validity of which may often be questionable. There was disagreement on the acceptability of adjustment methods, but consensus that when these are used, they should be justified rigorously. The utility of adjustment methods depends upon the decision being made and the processes used by the decision-maker. Treatment switching makes estimating the true comparative effect of a new treatment challenging. However, many decision-makers have reservations with adjustment methods. These, and how they affect the utility of adjustment methods, require further exploration. Further technical work is required to develop adjustment methods to meet real world needs, to enhance their acceptability to decision-makers.

  13. Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data

    PubMed Central

    Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark

    2010-01-01

    Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development, however this approach suffers from several disadvantages including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski’s F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski’s F-test for rank estimation of in vivo training sets allowed for noise to be more accurately removed. Malinowski’s F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the majority noise carrier of in vivo training sets while dopamine prediction was more sensitive to noise. PMID:20527815

  14. The rank correlated FSK model for prediction of gas radiation in non-uniform media, and its relationship to the rank correlated SLW model

    NASA Astrophysics Data System (ADS)

    Solovjov, Vladimir P.; Webb, Brent W.; Andre, Frederic

    2018-07-01

    Following previous theoretical development based on the assumption of a rank correlated spectrum, the Rank Correlated Full Spectrum k-distribution (RC-FSK) method is proposed. The method proves advantageous in modeling radiation transfer in high temperature gases in non-uniform media in two important ways. First, and perhaps most importantly, the method requires no specification of a reference gas thermodynamic state. Second, the spectral construction of the RC-FSK model is simpler than original correlated FSK models, requiring only two cumulative k-distributions. Further, although not exhaustive, example problems presented here suggest that the method may also yield improved accuracy relative to prior methods, and may exhibit less sensitivity to the blackbody source temperature used in the model predictions. This paper outlines the theoretical development of the RC-FSK method, comparing the spectral construction with prior correlated spectrum FSK method formulations. Further the RC-FSK model's relationship to the Rank Correlated Spectral Line Weighted-sum-of-gray-gases (RC-SLW) model is defined. The work presents predictions using the Rank Correlated FSK method and previous FSK methods in three different example problems. Line-by-line benchmark predictions are used to assess the accuracy.

  15. CSmetaPred: a consensus method for prediction of catalytic residues.

    PubMed

    Choudhary, Preeti; Kumar, Shailesh; Bachhawat, Anand Kumar; Pandit, Shashi Bhushan

    2017-12-22

    Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is ranked positions of putative catalytic residues among all ranked residues. In order to improve ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach based method CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in meta-predictor, CSmetaPred_poc. Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that meta-predictors outperform their constituent methods and CSmetaPred_poc is the best of evaluated methods. For instance, on CSAMAC dataset CSmetaPred_poc (CSmetaPred) achieves highest Mean Average Specificity (MAS), a scalar measure for ROC curve, of 0.97 (0.96). Importantly, median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 classified as true positive in binary classification, CSmetaPred_poc achieves prediction accuracy of 0.94 on CSAMAC dataset. Moreover, on the same dataset CSmetaPred_poc predicts all catalytic residues within top 20 ranks for ~73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than only sequence based predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. The benchmarking studies showed that employing meta-approach in combining residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalist to prioritize residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as webserver at: http://14.139.227.206/csmetapred/ .

  16. Beyond job security and money: driving factors of motivation for government doctors in India

    PubMed Central

    2014-01-01

    Background Despite many efforts from government to address the shortage of medical officers (MOs) in rural areas, rural health centres continue to suffer from severe shortage of MOs. Lack of motivation to join and continue service in rural areas is a major reason for such shortage. In the present study, we aimed to assess and rank the driving factors of motivation important for in-service MOs in their current job. Methods The study participants included ninety two in-service government MOs from three states in India. The study participants were required to rank 14 factors of motivation important for them in their current job. The factors for the study were selected using Herzberg’s two-factor theory of motivation and the data were collected using an instrument that has an established reliability and validity. Test of Kendall’s coefficient of concordance (W) was carried out to assess the agreement in ranks assigned by participants to various motivation factors. Next, we studied the distributions of ranks of different motivating factors using standard descriptive statistics and box plots, which gave us interesting insights into the strength of agreement of the MOs in assigning ranks to various factors. And finally to assess whether MOs are more intrinsically motivated or extrinsically motivated, we used Kolmogorov-Smirnov test. Results The (W) test indicated statistically significant (P < 0.01) agreement of the participants in assigning ranks. The Kolmogorov-Smirnov test indicated that from policy perspectives, MOs place significantly more motivational importance to intrinsic factors than to extrinsic factors. The study results indicate that job security was the most important factor related to motivation, closely followed by interesting work and respect and recognition. Among the top five preferred factors, three were intrinsic factors indicating a great importance given by MOs to factors beyond money and job security. Conclusion To address the issue of motivation, the health departments need to pay close attention to devising management strategies that address not only extrinsic but also intrinsic factors of motivation. The study results may be useful to understand the complicated issue of work motivation and can give some useful insights to design comprehensive management strategies that are based on motivational needs of MOs. PMID:24555787

  17. Seeking Health Information Online: Does Wikipedia Matter?

    PubMed Central

    Laurent, Michaël R.; Vickers, Tim J.

    2009-01-01

    Objective To determine the significance of the English Wikipedia as a source of online health information. Design The authors measured Wikipedia's ranking on general Internet search engines by entering keywords from MedlinePlus, NHS Direct Online, and the National Organization of Rare Diseases as queries into search engine optimization software. We assessed whether article quality influenced this ranking. The authors tested whether traffic to Wikipedia coincided with epidemiological trends and news of emerging health concerns, and how it compares to MedlinePlus. Measurements Cumulative incidence and average position of Wikipedia® compared to other Web sites among the first 20 results on general Internet search engines (Google®, Google UK®, Yahoo®, and MSN®), and page view statistics for selected Wikipedia articles and MedlinePlus pages. Results Wikipedia ranked among the first ten results in 71–85% of search engines and keywords tested. Wikipedia surpassed MedlinePlus and NHS Direct Online (except for queries from the latter on Google UK), and ranked higher with quality articles. Wikipedia ranked highest for rare diseases, although its incidence in several categories decreased. Page views increased parallel to the occurrence of 20 seasonal disorders and news of three emerging health concerns. Wikipedia articles were viewed more often than MedlinePlus Topic (p = 0.001) but for MedlinePlus Encyclopedia pages, the trend was not significant (p = 0.07–0.10). Conclusions Based on its search engine ranking and page view statistics, the English Wikipedia is a prominent source of online health information compared to the other online health information providers studied. PMID:19390105

  18. Ranking of predictor variables based on effect size criterion provides an accurate means of automatically classifying opinion column articles

    NASA Astrophysics Data System (ADS)

    Legara, Erika Fille; Monterola, Christopher; Abundo, Cheryl

    2011-01-01

    We demonstrate an accurate procedure based on linear discriminant analysis that allows automatic authorship classification of opinion column articles. First, we extract the following stylometric features of 157 column articles from four authors: statistics on high frequency words, number of words per sentence, and number of sentences per paragraph. Then, by systematically ranking these features based on an effect size criterion, we show that we can achieve an average classification accuracy of 93% for the test set. In comparison, frequency size based ranking has an average accuracy of 80%. The highest possible average classification accuracy of our data merely relying on chance is ∼31%. By carrying out sensitivity analysis, we show that the effect size criterion is superior than frequency ranking because there exist low frequency words that significantly contribute to successful author discrimination. Consistent results are seen when the procedure is applied in classifying the undisputed Federalist papers of Alexander Hamilton and James Madison. To the best of our knowledge, the work is the first attempt in classifying opinion column articles, that by virtue of being shorter in length (as compared to novels or short stories), are more prone to over-fitting issues. The near perfect classification for the longer papers supports this claim. Our results provide an important insight on authorship attribution that has been overlooked in previous studies: that ranking discriminant variables based on word frequency counts is not necessarily an optimal procedure.

  19. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods.

    PubMed

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J Sunil

    2014-08-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called "Patient Recursive Survival Peeling" is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called "combined" cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication.

  20. Cross-Validation of Survival Bump Hunting by Recursive Peeling Methods

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    We introduce a survival/risk bump hunting framework to build a bump hunting model with a possibly censored time-to-event type of response and to validate model estimates. First, we describe the use of adequate survival peeling criteria to build a survival/risk bump hunting model based on recursive peeling methods. Our method called “Patient Recursive Survival Peeling” is a rule-induction method that makes use of specific peeling criteria such as hazard ratio or log-rank statistics. Second, to validate our model estimates and improve survival prediction accuracy, we describe a resampling-based validation technique specifically designed for the joint task of decision rule making by recursive peeling (i.e. decision-box) and survival estimation. This alternative technique, called “combined” cross-validation is done by combining test samples over the cross-validation loops, a design allowing for bump hunting by recursive peeling in a survival setting. We provide empirical results showing the importance of cross-validation and replication. PMID:26997922

  1. Assessment of technological level of stem cell research using principal component analysis.

    PubMed

    Do Cho, Sung; Hwan Hyun, Byung; Kim, Jae Kyeom

    2016-01-01

    In general, technological levels have been assessed based on specialist's opinion through the methods such as Delphi. But in such cases, results could be significantly biased per study design and individual expert. In this study, therefore scientific literatures and patents were selected by means of analytic indexes for statistic approach and technical assessment of stem cell fields. The analytic indexes, numbers and impact indexes of scientific literatures and patents, were weighted based on principal component analysis, and then, were summated into the single value. Technological obsolescence was calculated through the cited half-life of patents issued by the United States Patents and Trademark Office and was reflected in technological level assessment. As results, ranks of each nation's in reference to the technology level were rated by the proposed method. Furthermore we were able to evaluate strengthens and weaknesses thereof. Although our empirical research presents faithful results, in the further study, there is a need to compare the existing methods and the suggested method.

  2. The Trail Making test: a study of its ability to predict falls in the acute neurological in-patient population.

    PubMed

    Mateen, Bilal Akhter; Bussas, Matthias; Doogan, Catherine; Waller, Denise; Saverino, Alessia; Király, Franz J; Playford, E Diane

    2018-05-01

    To determine whether tests of cognitive function and patient-reported outcome measures of motor function can be used to create a machine learning-based predictive tool for falls. Prospective cohort study. Tertiary neurological and neurosurgical center. In all, 337 in-patients receiving neurosurgical, neurological, or neurorehabilitation-based care. Binary (Y/N) for falling during the in-patient episode, the Trail Making Test (a measure of attention and executive function) and the Walk-12 (a patient-reported measure of physical function). The principal outcome was a fall during the in-patient stay ( n = 54). The Trail test was identified as the best predictor of falls. Moreover, addition of other variables, did not improve the prediction (Wilcoxon signed-rank P < 0.001). Classical linear statistical modeling methods were then compared with more recent machine learning based strategies, for example, random forests, neural networks, support vector machines. The random forest was the best modeling strategy when utilizing just the Trail Making Test data (Wilcoxon signed-rank P < 0.001) with 68% (± 7.7) sensitivity, and 90% (± 2.3) specificity. This study identifies a simple yet powerful machine learning (Random Forest) based predictive model for an in-patient neurological population, utilizing a single neuropsychological test of cognitive function, the Trail Making test.

  3. A sampling-based method for ranking protein structural models by integrating multiple scores and features.

    PubMed

    Shi, Xiaohu; Zhang, Jingfen; He, Zhiquan; Shang, Yi; Xu, Dong

    2011-09-01

    One of the major challenges in protein tertiary structure prediction is structure quality assessment. In many cases, protein structure prediction tools generate good structural models, but fail to select the best models from a huge number of candidates as the final output. In this study, we developed a sampling-based machine-learning method to rank protein structural models by integrating multiple scores and features. First, features such as predicted secondary structure, solvent accessibility and residue-residue contact information are integrated by two Radial Basis Function (RBF) models trained from different datasets. Then, the two RBF scores and five selected scoring functions developed by others, i.e., Opus-CA, Opus-PSP, DFIRE, RAPDF, and Cheng Score are synthesized by a sampling method. At last, another integrated RBF model ranks the structural models according to the features of sampling distribution. We tested the proposed method by using two different datasets, including the CASP server prediction models of all CASP8 targets and a set of models generated by our in-house software MUFOLD. The test result shows that our method outperforms any individual scoring function on both best model selection, and overall correlation between the predicted ranking and the actual ranking of structural quality.

  4. Incorporating linguistic knowledge for learning distributed word representations.

    PubMed

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate these prior knowledge for learning distributed word representations. Experiment results demonstrate that our estimated word representation achieves better performance in task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.

  5. Incorporating Linguistic Knowledge for Learning Distributed Word Representations

    PubMed Central

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate these prior knowledge for learning distributed word representations. Experiment results demonstrate that our estimated word representation achieves better performance in task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining. PMID:25874581

  6. Fast Low-Rank Bayesian Matrix Completion With Hierarchical Gaussian Prior Models

    NASA Astrophysics Data System (ADS)

    Yang, Linxiao; Fang, Jun; Duan, Huiping; Li, Hongbin; Zeng, Bing

    2018-06-01

    The problem of low rank matrix completion is considered in this paper. To exploit the underlying low-rank structure of the data matrix, we propose a hierarchical Gaussian prior model, where columns of the low-rank matrix are assumed to follow a Gaussian distribution with zero mean and a common precision matrix, and a Wishart distribution is specified as a hyperprior over the precision matrix. We show that such a hierarchical Gaussian prior has the potential to encourage a low-rank solution. Based on the proposed hierarchical prior model, a variational Bayesian method is developed for matrix completion, where the generalized approximate massage passing (GAMP) technique is embedded into the variational Bayesian inference in order to circumvent cumbersome matrix inverse operations. Simulation results show that our proposed method demonstrates superiority over existing state-of-the-art matrix completion methods.

  7. Time evolution of Wikipedia network ranking

    NASA Astrophysics Data System (ADS)

    Eom, Young-Ho; Frahm, Klaus M.; Benczúr, András; Shepelyansky, Dima L.

    2013-12-01

    We study the time evolution of ranking and spectral properties of the Google matrix of English Wikipedia hyperlink network during years 2003-2011. The statistical properties of ranking of Wikipedia articles via PageRank and CheiRank probabilities, as well as the matrix spectrum, are shown to be stabilized for 2007-2011. A special emphasis is done on ranking of Wikipedia personalities and universities. We show that PageRank selection is dominated by politicians while 2DRank, which combines PageRank and CheiRank, gives more accent on personalities of arts. The Wikipedia PageRank of universities recovers 80% of top universities of Shanghai ranking during the considered time period.

  8. Signal detection on spontaneous reports of adverse events following immunisation: a comparison of the performance of a disproportionality-based algorithm and a time-to-onset-based algorithm

    PubMed Central

    van Holle, Lionel; Bauchau, Vincent

    2014-01-01

    Purpose Disproportionality methods measure how unexpected the observed number of adverse events is. Time-to-onset (TTO) methods measure how unexpected the TTO distribution of a vaccine-event pair is compared with what is expected from other vaccines and events. Our purpose is to compare the performance associated with each method. Methods For the disproportionality algorithms, we defined 336 combinations of stratification factors (sex, age, region and year) and threshold values of the multi-item gamma Poisson shrinker (MGPS). For the TTO algorithms, we defined 18 combinations of significance level and time windows. We used spontaneous reports of adverse events recorded for eight vaccines. The vaccine product labels were used as proxies for true safety signals. Algorithms were ranked according to their positive predictive value (PPV) for each vaccine separately; amedian rank was attributed to each algorithm across vaccines. Results The algorithm with the highest median rank was based on TTO with a significance level of 0.01 and a time window of 60 days after immunisation. It had an overall PPV 2.5 times higher than for the highest-ranked MGPS algorithm, 16th rank overall, which was fully stratified and had a threshold value of 0.8. A TTO algorithm with roughly the same sensitivity as the highest-ranked MGPS had better specificity but longer time-to-detection. Conclusions Within the scope of this study, the majority of the TTO algorithms presented a higher PPV than for any MGPS algorithm. Considering the complementarity of TTO and disproportionality methods, a signal detection strategy combining them merits further investigation. PMID:24038719

  9. Using volcano plots and regularized-chi statistics in genetic association studies.

    PubMed

    Li, Wentian; Freudenberg, Jan; Suh, Young Ju; Yang, Yaning

    2014-02-01

    Labor intensive experiments are typically required to identify the causal disease variants from a list of disease associated variants in the genome. For designing such experiments, candidate variants are ranked by their strength of genetic association with the disease. However, the two commonly used measures of genetic association, the odds-ratio (OR) and p-value may rank variants in different order. To integrate these two measures into a single analysis, here we transfer the volcano plot methodology from gene expression analysis to genetic association studies. In its original setting, volcano plots are scatter plots of fold-change and t-test statistic (or -log of the p-value), with the latter being more sensitive to sample size. In genetic association studies, the OR and Pearson's chi-square statistic (or equivalently its square root, chi; or the standardized log(OR)) can be analogously used in a volcano plot, allowing for their visual inspection. Moreover, the geometric interpretation of these plots leads to an intuitive method for filtering results by a combination of both OR and chi-square statistic, which we term "regularized-chi". This method selects associated markers by a smooth curve in the volcano plot instead of the right-angled lines which corresponds to independent cutoffs for OR and chi-square statistic. The regularized-chi incorporates relatively more signals from variants with lower minor-allele-frequencies than chi-square test statistic. As rare variants tend to have stronger functional effects, regularized-chi is better suited to the task of prioritization of candidate genes. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. RECONSTRUCTION OF DYNAMIC IMAGE SERIES FROM UNDERSAMPLED MRI DATA USING DATA-DRIVEN MODEL CONSISTENCY CONDITION (MOCCO)

    PubMed Central

    Velikina, Julia V.; Samsonov, Alexey A.

    2014-01-01

    Purpose To accelerate dynamic MR imaging through development of a novel image reconstruction technique using low-rank temporal signal models pre-estimated from training data. Theory We introduce the MOdel Consistency COndition (MOCCO) technique that utilizes temporal models to regularize the reconstruction without constraining the solution to be low-rank as performed in related techniques. This is achieved by using a data-driven model to design a transform for compressed sensing-type regularization. The enforcement of general compliance with the model without excessively penalizing deviating signal allows recovery of a full-rank solution. Methods Our method was compared to standard low-rank approach utilizing model-based dimensionality reduction in phantoms and patient examinations for time-resolved contrast-enhanced angiography (CE MRA) and cardiac CINE imaging. We studied sensitivity of all methods to rank-reduction and temporal subspace modeling errors. Results MOCCO demonstrated reduced sensitivity to modeling errors compared to the standard approach. Full-rank MOCCO solutions showed significantly improved preservation of temporal fidelity and aliasing/noise suppression in highly accelerated CE MRA (acceleration up to 27) and cardiac CINE (acceleration up to 15) data. Conclusions MOCCO overcomes several important deficiencies of previously proposed methods based on pre-estimated temporal models and allows high quality image restoration from highly undersampled CE-MRA and cardiac CINE data. PMID:25399724

  11. PageRank as a method to rank biomedical literature by importance.

    PubMed

    Yates, Elliot J; Dixon, Louise C

    2015-01-01

    Optimal ranking of literature importance is vital in overcoming article overload. Existing ranking methods are typically based on raw citation counts, giving a sum of 'inbound' links with no consideration of citation importance. PageRank, an algorithm originally developed for ranking webpages at the search engine, Google, could potentially be adapted to bibliometrics to quantify the relative importance weightings of a citation network. This article seeks to validate such an approach on the freely available, PubMed Central open access subset (PMC-OAS) of biomedical literature. On-demand cloud computing infrastructure was used to extract a citation network from over 600,000 full-text PMC-OAS articles. PageRanks and citation counts were calculated for each node in this network. PageRank is highly correlated with citation count (R = 0.905, P < 0.01) and we thus validate the former as a surrogate of literature importance. Furthermore, the algorithm can be run in trivial time on cheap, commodity cluster hardware, lowering the barrier of entry for resource-limited open access organisations. PageRank can be trivially computed on commodity cluster hardware and is linearly correlated with citation count. Given its putative benefits in quantifying relative importance, we suggest it may enrich the citation network, thereby overcoming the existing inadequacy of citation counts alone. We thus suggest PageRank as a feasible supplement to, or replacement of, existing bibliometric ranking methods.

  12. A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities.

    PubMed

    Soneson, Charlotte; Fontes, Magnus

    2012-01-01

    Analysis of multivariate data sets from, for example, microarray studies frequently results in lists of genes which are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper, we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward comparison between any 2 lists. It can also be used to generate new more stable gene rankings incorporating more information from the experimental data. Using 2 microarray data sets, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance of the rankings.

  13. Bayesian Inference of Natural Rankings in Incomplete Competition Networks

    PubMed Central

    Park, Juyong; Yook, Soon-Hyung

    2014-01-01

    Competition between a complex system's constituents and a corresponding reward mechanism based on it have profound influence on the functioning, stability, and evolution of the system. But determining the dominance hierarchy or ranking among the constituent parts from the strongest to the weakest – essential in determining reward and penalty – is frequently an ambiguous task due to the incomplete (partially filled) nature of competition networks. Here we introduce the “Natural Ranking,” an unambiguous ranking method applicable to a round robin tournament, and formulate an analytical model based on the Bayesian formula for inferring the expected mean and error of the natural ranking of nodes from an incomplete network. We investigate its potential and uses in resolving important issues of ranking by applying it to real-world competition networks. PMID:25163528

  14. Bayesian Inference of Natural Rankings in Incomplete Competition Networks

    NASA Astrophysics Data System (ADS)

    Park, Juyong; Yook, Soon-Hyung

    2014-08-01

    Competition between a complex system's constituents and a corresponding reward mechanism based on it have profound influence on the functioning, stability, and evolution of the system. But determining the dominance hierarchy or ranking among the constituent parts from the strongest to the weakest - essential in determining reward and penalty - is frequently an ambiguous task due to the incomplete (partially filled) nature of competition networks. Here we introduce the ``Natural Ranking,'' an unambiguous ranking method applicable to a round robin tournament, and formulate an analytical model based on the Bayesian formula for inferring the expected mean and error of the natural ranking of nodes from an incomplete network. We investigate its potential and uses in resolving important issues of ranking by applying it to real-world competition networks.

  15. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization.

    PubMed

    Jia, Zhilong; Zhang, Xiang; Guan, Naiyang; Bo, Xiaochen; Barnes, Michael R; Luo, Zhigang

    2015-01-01

    RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes, however with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher's discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes' weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked as a descending sequence according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher's criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed DNMF outweighs others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.

  16. Network-based ranking methods for prediction of novel disease associated microRNAs.

    PubMed

    Le, Duc-Hau

    2015-10-01

    Many studies have shown roles of microRNAs on human disease and a number of computational methods have been proposed to predict such associations by ranking candidate microRNAs according to their relevance to a disease. Among them, machine learning-based methods usually have a limitation in specifying non-disease microRNAs as negative training samples. Meanwhile, network-based methods are becoming dominant since they well exploit a "disease module" principle in microRNA functional similarity networks. Of which, random walk with restart (RWR) algorithm-based method is currently state-of-the-art. The use of this algorithm was inspired from its success in predicting disease gene because the "disease module" principle also exists in protein interaction networks. Besides, many algorithms designed for webpage ranking have been successfully applied in ranking disease candidate genes because web networks share topological properties with protein interaction networks. However, these algorithms have not yet been utilized for disease microRNA prediction. We constructed microRNA functional similarity networks based on shared targets of microRNAs, and then we integrated them with a microRNA functional synergistic network, which was recently identified. After analyzing topological properties of these networks, in addition to RWR, we assessed the performance of (i) PRINCE (PRIoritizatioN and Complex Elucidation), which was proposed for disease gene prediction; (ii) PageRank with Priors (PRP) and K-Step Markov (KSM), which were used for studying web networks; and (iii) a neighborhood-based algorithm. Analyses on topological properties showed that all microRNA functional similarity networks are small-worldness and scale-free. The performance of each algorithm was assessed based on average AUC values on 35 disease phenotypes and average rankings of newly discovered disease microRNAs. As a result, the performance on the integrated network was better than that on individual ones. In addition, the performance of PRINCE, PRP and KSM was comparable with that of RWR, whereas it was worst for the neighborhood-based algorithm. Moreover, all the algorithms were stable with the change of parameters. Final, using the integrated network, we predicted six novel miRNAs (i.e., hsa-miR-101, hsa-miR-181d, hsa-miR-192, hsa-miR-423-3p, hsa-miR-484 and hsa-miR-98) associated with breast cancer. Network-based ranking algorithms, which were successfully applied for either disease gene prediction or for studying social/web networks, can be also used effectively for disease microRNA prediction. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Evaluation of statistical and rainfall-runoff models for predicting historical daily streamflow time series in the Des Moines and Iowa River watersheds

    USGS Publications Warehouse

    Farmer, William H.; Knight, Rodney R.; Eash, David A.; Kasey J. Hutchinson,; Linhart, S. Mike; Christiansen, Daniel E.; Archfield, Stacey A.; Over, Thomas M.; Kiang, Julie E.

    2015-08-24

    Daily records of streamflow are essential to understanding hydrologic systems and managing the interactions between human and natural systems. Many watersheds and locations lack streamgages to provide accurate and reliable records of daily streamflow. In such ungaged watersheds, statistical tools and rainfall-runoff models are used to estimate daily streamflow. Previous work compared 19 different techniques for predicting daily streamflow records in the southeastern United States. Here, five of the better-performing methods are compared in a different hydroclimatic region of the United States, in Iowa. The methods fall into three classes: (1) drainage-area ratio methods, (2) nonlinear spatial interpolations using flow duration curves, and (3) mechanistic rainfall-runoff models. The first two classes are each applied with nearest-neighbor and map-correlated index streamgages. Using a threefold validation and robust rank-based evaluation, the methods are assessed for overall goodness of fit of the hydrograph of daily streamflow, the ability to reproduce a daily, no-fail storage-yield curve, and the ability to reproduce key streamflow statistics. As in the Southeast study, a nonlinear spatial interpolation of daily streamflow using flow duration curves is found to be a method with the best predictive accuracy. Comparisons with previous work in Iowa show that the accuracy of mechanistic models with at-site calibration is substantially degraded in the ungaged framework.

  18. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients.

    PubMed

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-09-29

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients.

  19. Gene network inherent in genomic big data improves the accuracy of prognostic prediction for cancer patients

    PubMed Central

    Kim, Yun Hak; Jeong, Dae Cheon; Pak, Kyoungjune; Goh, Tae Sik; Lee, Chi-Seung; Han, Myoung-Eun; Kim, Ji-Young; Liangwen, Liu; Kim, Chi Dae; Jang, Jeon Yeob; Cha, Wonjae; Oh, Sae-Ock

    2017-01-01

    Accurate prediction of prognosis is critical for therapeutic decisions regarding cancer patients. Many previously developed prognostic scoring systems have limitations in reflecting recent progress in the field of cancer biology such as microarray, next-generation sequencing, and signaling pathways. To develop a new prognostic scoring system for cancer patients, we used mRNA expression and clinical data in various independent breast cancer cohorts (n=1214) from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and Gene Expression Omnibus (GEO). A new prognostic score that reflects gene network inherent in genomic big data was calculated using Network-Regularized high-dimensional Cox-regression (Net-score). We compared its discriminatory power with those of two previously used statistical methods: stepwise variable selection via univariate Cox regression (Uni-score) and Cox regression via Elastic net (Enet-score). The Net scoring system showed better discriminatory power in prediction of disease-specific survival (DSS) than other statistical methods (p=0 in METABRIC training cohort, p=0.000331, 4.58e-06 in two METABRIC validation cohorts) when accuracy was examined by log-rank test. Notably, comparison of C-index and AUC values in receiver operating characteristic analysis at 5 years showed fewer differences between training and validation cohorts with the Net scoring system than other statistical methods, suggesting minimal overfitting. The Net-based scoring system also successfully predicted prognosis in various independent GEO cohorts with high discriminatory power. In conclusion, the Net-based scoring system showed better discriminative power than previous statistical methods in prognostic prediction for breast cancer patients. This new system will mark a new era in prognosis prediction for cancer patients. PMID:29100405

  20. Specific Features of Executive Dysfunction in Alzheimer-Type Mild Dementia Based on Computerized Cambridge Neuropsychological Test Automated Battery (CANTAB) Test Results.

    PubMed

    Kuzmickienė, Jurgita; Kaubrys, Gintaras

    2016-10-08

    BACKGROUND The primary manifestation of Alzheimer's disease (AD) is decline in memory. Dysexecutive symptoms have tremendous impact on functional activities and quality of life. Data regarding frontal-executive dysfunction in mild AD are controversial. The aim of this study was to assess the presence and specific features of executive dysfunction in mild AD based on Cambridge Neuropsychological Test Automated Battery (CANTAB) results. MATERIAL AND METHODS Fifty newly diagnosed, treatment-naïve, mild, late-onset AD patients (MMSE ≥20, AD group) and 25 control subjects (CG group) were recruited in this prospective, cross-sectional study. The CANTAB tests CRT, SOC, PAL, SWM were used for in-depth cognitive assessment. Comparisons were performed using the t test or Mann-Whitney U test, as appropriate. Correlations were evaluated by Pearson r or Spearman R. Statistical significance was set at p<0.05. RESULTS AD and CG groups did not differ according to age, education, gender, or depression. Few differences were found between groups in the SOC test for performance measures: Mean moves (minimum 3 moves): AD (Rank Sum=2227), CG (Rank Sum=623), p<0.001. However, all SOC test time measures differed significantly between groups: SOC Mean subsequent thinking time (4 moves): AD (Rank Sum=2406), CG (Rank Sum=444), p<0.001. Correlations were weak between executive function (SOC) and episodic/working memory (PAL, SWM) (R=0.01-0.38) or attention/psychomotor speed (CRT) (R=0.02-0.37). CONCLUSIONS Frontal-executive functions are impaired in mild AD patients. Executive dysfunction is highly prominent in time measures, but minimal in performance measures. Executive disorders do not correlate with a decline in episodic and working memory or psychomotor speed in mild AD.

  1. Identification of curriculum content for a renewable energy graduate degree program

    NASA Astrophysics Data System (ADS)

    Haughery, John R.

    There currently exists a disconnect between renewable energy industry workforce needs and academic program proficiencies. This is evidenced by an absence of clear curriculum content on renewable energy graduate program websites. The purpose of this study was to identify a set of curriculum content for graduate degrees in renewable energy. At the conclusion, a clear list of 42 content items was identified and statistically ranked. The content items identified were based on a review of literature from government initiatives, professional society's body of knowledge, and related research studies. Leaders and experts in the field of renewable energy and sustainability were surveyed, using a five-point Likert-Scale model. This allowed each item's importance level to be analyzed and prioritized based on non-parametric statistical analysis methods. The study found seven competency items to be very important , 30 to be important, and five to be somewhat important. The results were also appropriate for use as a framework in developing or improving renewable energy graduate programs.

  2. A Comparative Approach for Ranking Contaminated Sites Based on the Risk Assessment Paradigm Using Fuzzy PROMETHEE

    NASA Astrophysics Data System (ADS)

    Zhang, Kejiang; Kluck, Cheryl; Achari, Gopal

    2009-11-01

    A ranking system for contaminated sites based on comparative risk methodology using fuzzy Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) was developed in this article. It combines the concepts of fuzzy sets to represent uncertain site information with the PROMETHEE, a subgroup of Multi-Criteria Decision Making (MCDM) methods. Criteria are identified based on a combination of the attributes (toxicity, exposure, and receptors) associated with the potential human health and ecological risks posed by contaminated sites, chemical properties, site geology and hydrogeology and contaminant transport phenomena. Original site data are directly used avoiding the subjective assignment of scores to site attributes. When the input data are numeric and crisp the PROMETHEE method can be used. The Fuzzy PROMETHEE method is preferred when substantial uncertainties and subjectivities exist in site information. The PROMETHEE and fuzzy PROMETHEE methods are both used in this research to compare the sites. The case study shows that this methodology provides reasonable results.

  3. A comparative approach for ranking contaminated sites based on the risk assessment paradigm using fuzzy PROMETHEE.

    PubMed

    Zhang, Kejiang; Kluck, Cheryl; Achari, Gopal

    2009-11-01

    A ranking system for contaminated sites based on comparative risk methodology using fuzzy Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) was developed in this article. It combines the concepts of fuzzy sets to represent uncertain site information with the PROMETHEE, a subgroup of Multi-Criteria Decision Making (MCDM) methods. Criteria are identified based on a combination of the attributes (toxicity, exposure, and receptors) associated with the potential human health and ecological risks posed by contaminated sites, chemical properties, site geology and hydrogeology and contaminant transport phenomena. Original site data are directly used avoiding the subjective assignment of scores to site attributes. When the input data are numeric and crisp the PROMETHEE method can be used. The Fuzzy PROMETHEE method is preferred when substantial uncertainties and subjectivities exist in site information. The PROMETHEE and fuzzy PROMETHEE methods are both used in this research to compare the sites. The case study shows that this methodology provides reasonable results.

  4. Alternative methods for determining shrinkage in restorative resin composites.

    PubMed

    de Melo Monteiro, Gabriela Queiroz; Montes, Marcos Antonio Japiassú Resende; Rolim, Tiago Vieira; de Oliveira Mota, Cláudia Cristina Brainer; de Barros Correia Kyotoku, Bernardo; Gomes, Anderson Stevens Leônidas; de Freitas, Anderson Zanardi

    2011-08-01

    The purpose of this study was to evaluate polymerization shrinkage of resin composites using a coordinate measuring machine, optical coherence tomography and a more widely known method, such as Archimedes Principle. Two null hypothesis were tested: (1) there are no differences between the materials tested; (2) there are no differences between the methods used for polymerization shrinkage measurements. Polymerization shrinkage of seven resin-based dental composites (Filtek Z250™, Filtek Z350™, Filtek P90™/3M ESPE, Esthet-X™, TPH Spectrum™/Dentsply 4 Seasons™, Tetric Ceram™/Ivoclar-Vivadent) was measured. For coordinate measuring machine measurements, composites were applied to a cylindrical Teflon mold (7 mm × 2 mm), polymerized and removed from the mold. The difference between the volume of the mold and the volume of the specimen was calculated as a percentage. Optical coherence tomography was also used for linear shrinkage evaluations. The thickness of the specimens was measured before and after photoactivation. Polymerization shrinkage was also measured using Archimedes Principle of buoyancy (n=5). Statistical analysis of the data was performed with ANOVA and the Games-Howell test. The results show that polymerization shrinkage values vary with the method used. Despite numerical differences the ranking of the resins was very similar with Filtek P90 presenting the lowest shrinkage values. Because of the variations in the results, reported values could only be used to compare materials within the same method. However, it is possible rank composites for polymerization shrinkage and to relate these data from different test methods. Independently of the method used, reduced polymerization shrinkage was found for silorane resin-based composite. Copyright © 2011 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  5. Augmenting the Deliberative Method for Ranking Risks.

    PubMed

    Susel, Irving; Lasley, Trace; Montezemolo, Mark; Piper, Joel

    2016-01-01

    The Department of Homeland Security (DHS) characterized and prioritized the physical cross-border threats and hazards to the nation stemming from terrorism, market-driven illicit flows of people and goods (illegal immigration, narcotics, funds, counterfeits, and weaponry), and other nonmarket concerns (movement of diseases, pests, and invasive species). These threats and hazards pose a wide diversity of consequences with very different combinations of magnitudes and likelihoods, making it very challenging to prioritize them. This article presents the approach that was used at DHS to arrive at a consensus regarding the threats and hazards that stand out from the rest based on the overall risk they pose. Due to time constraints for the decision analysis, it was not feasible to apply multiattribute methodologies like multiattribute utility theory or the analytic hierarchy process. Using a holistic approach was considered, such as the deliberative method for ranking risks first published in this journal. However, an ordinal ranking alone does not indicate relative or absolute magnitude differences among the risks. Therefore, the use of the deliberative method for ranking risks is not sufficient for deciding whether there is a material difference between the top-ranked and bottom-ranked risks, let alone deciding what the stand-out risks are. To address this limitation of ordinal rankings, the deliberative method for ranking risks was augmented by adding an additional step to transform the ordinal ranking into a ratio scale ranking. This additional step enabled the selection of stand-out risks to help prioritize further analysis. © 2015 Society for Risk Analysis.

  6. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance between committing type 1 errors (false positives) and type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (random clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increase the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.

  7. Fast Modeling of Binding Affinities by Means of Superposing Significant Interaction Rules (SSIR) Method

    PubMed Central

    Besalú, Emili

    2016-01-01

    The Superposing Significant Interaction Rules (SSIR) method is described. It is a general combinatorial and symbolic procedure able to rank compounds belonging to combinatorial analogue series. The procedure generates structure-activity relationship (SAR) models and also serves as an inverse SAR tool. The method is fast and can deal with large databases. SSIR operates from statistical significances calculated from the available library of compounds and according to the previously attached molecular labels of interest or non-interest. The required symbolic codification allows dealing with almost any combinatorial data set, even in a confidential manner, if desired. The application example categorizes molecules as binding or non-binding, and consensus ranking SAR models are generated from training and two distinct cross-validation methods: leave-one-out and balanced leave-two-out (BL2O), the latter being suited for the treatment of binary properties. PMID:27240346

  8. Testing Multivariate Adaptive Regression Splines (MARS) as a Method of Land Cover Classification of TERRA-ASTER Satellite Images.

    PubMed

    Quirós, Elia; Felicísimo, Angel M; Cuartero, Aurora

    2009-01-01

    This work proposes a new method to classify multi-spectral satellite images based on multivariate adaptive regression splines (MARS) and compares this classification system with the more common parallelepiped and maximum likelihood (ML) methods. We apply the classification methods to the land cover classification of a test zone located in southwestern Spain. The basis of the MARS method and its associated procedures are explained in detail, and the area under the ROC curve (AUC) is compared for the three methods. The results show that the MARS method provides better results than the parallelepiped method in all cases, and it provides better results than the maximum likelihood method in 13 cases out of 17. These results demonstrate that the MARS method can be used in isolation or in combination with other methods to improve the accuracy of soil cover classification. The improvement is statistically significant according to the Wilcoxon signed rank test.

  9. A statistical approach to bioclimatic trend detection in the airborne pollen records of Catalonia (NE Spain)

    NASA Astrophysics Data System (ADS)

    Fernández-Llamazares, Álvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción

    2014-04-01

    Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices for the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed for detecting monotonic trends in time series data of the selected airborne pollen types and we have observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well-modeled by a normal distribution, it is better to apply non-parametric statistical methods to aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities are being liberated to the atmosphere in the last years, specially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could exert an impact in terms of public health.

  10. A novel method for identifying disease associated protein complexes based on functional similarity protein complex networks.

    PubMed

    Le, Duc-Hau

    2015-01-01

    Protein complexes formed by non-covalent interaction among proteins play important roles in cellular functions. Computational and purification methods have been used to identify many protein complexes and their cellular functions. However, their roles in terms of causing disease have not been well discovered yet. There exist only a few studies for the identification of disease-associated protein complexes. However, they mostly utilize complicated heterogeneous networks which are constructed based on an out-of-date database of phenotype similarity network collected from literature. In addition, they only apply for diseases for which tissue-specific data exist. In this study, we propose a method to identify novel disease-protein complex associations. First, we introduce a framework to construct functional similarity protein complex networks where two protein complexes are functionally connected by either shared protein elements, shared annotating GO terms or based on protein interactions between elements in each protein complex. Second, we propose a simple but effective neighborhood-based algorithm, which yields a local similarity measure, to rank disease candidate protein complexes. Comparing the predictive performance of our proposed algorithm with that of two state-of-the-art network propagation algorithms including one we used in our previous study, we found that it performed statistically significantly better than that of these two algorithms for all the constructed functional similarity protein complex networks. In addition, it ran about 32 times faster than these two algorithms. Moreover, our proposed method always achieved high performance in terms of AUC values irrespective of the ways to construct the functional similarity protein complex networks and the used algorithms. The performance of our method was also higher than that reported in some existing methods which were based on complicated heterogeneous networks. Finally, we also tested our method with prostate cancer and selected the top 100 highly ranked candidate protein complexes. Interestingly, 69 of them were evidenced since at least one of their protein elements are known to be associated with prostate cancer. Our proposed method, including the framework to construct functional similarity protein complex networks and the neighborhood-based algorithm on these networks, could be used for identification of novel disease-protein complex associations.

  11. Integrated Low-Rank-Based Discriminative Feature Learning for Recognition.

    PubMed

    Zhou, Pan; Lin, Zhouchen; Zhang, Chao

    2016-05-01

    Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in many applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines feature learning with classification, so that the regulated classification error is minimized. In this way, the extracted features are more discriminative for the recognition tasks. Our approach benefits from a recent discovery on the closed-form solutions to noiseless LatLRR. When there is noise, a robust Principal Component Analysis (PCA)-based denoising step can be added as preprocessing. When the scale of a problem is large, we utilize a fast randomized algorithm to speed up the computation of robust PCA. Extensive experimental results demonstrate the effectiveness and robustness of our method.

  12. “URM Candidates Are Encouraged to Apply”: A National Study to Identify Effective Strategies to Enhance Racial and Ethnic Faculty Diversity in Academic Departments of Medicine

    PubMed Central

    Peek, Monica E.; Kim, Karen E.; Johnson, Julie K.; Vela, Monica B.

    2016-01-01

    Purpose There is little evidence regarding which factors and strategies are associated with high proportions of underrepresented minority (URM) faculty in academic medicine. The authors conducted a national study of U.S. academic medicine departments to better understand the challenges, successful strategies, and predictive factors for enhancing racial and ethnic diversity among faculty (i.e., physicians with an academic position or rank). Method This was a mixed-methods study using quantitative and qualitative methods. The authors conducted a cross-sectional study of eligible departments of medicine in 125 accredited U.S. medical schools, dichotomized into low-URM (bottom 50%) versus high-URM rank (top 50%). They used t tests and chi-squared tests to compare departments by geographic region, academic school rank, city type, and composite measures of “diversity best practices.” The authors also conducted semistructured in-depth interviews with a subsample from the highest-and lowest-quartile medical schools in terms of URM rank. Results Eighty-two medical schools responded (66%). Geographic region and academic rank were statistically associated with URM rank, but not city type or composite measures of diversity best practices. Key themes emerged from interviews regarding successful strategies for URM faculty recruitment and retention including institutional leadership, the use of human capital and social relationships and strategic deployment of institutional resources. Conclusions Departments of medicine with high proportions of URM faculty employ a number of successful strategies and programs for recruitment and retention. More research is warranted to identify new successful strategies and to determine the impact of specific strategies on establishing and maintaining workforce diversity. PMID:23348090

  13. Efficient Tensor Completion for Color Image and Video Recovery: Low-Rank Tensor Train.

    PubMed

    Bengua, Johann A; Phien, Ho N; Tuan, Hoang Duong; Do, Minh N

    2017-05-01

    This paper proposes a novel approach to tensor completion, which recovers missing entries of data represented by tensors. The approach is based on the tensor train (TT) rank, which is able to capture hidden information from tensors thanks to its definition from a well-balanced matricization scheme. Accordingly, new optimization formulations for tensor completion are proposed as well as two new algorithms for their solution. The first one called simple low-rank tensor completion via TT (SiLRTC-TT) is intimately related to minimizing a nuclear norm based on TT rank. The second one is from a multilinear matrix factorization model to approximate the TT rank of a tensor, and is called tensor completion by parallel matrix factorization via TT (TMac-TT). A tensor augmentation scheme of transforming a low-order tensor to higher orders is also proposed to enhance the effectiveness of SiLRTC-TT and TMac-TT. Simulation results for color image and video recovery show the clear advantage of our method over all other methods.

  14. A new physical method to assess handle properties of fabrics made from wood-based fibers

    NASA Astrophysics Data System (ADS)

    Abu-Rous, M.; Liftinger, E.; Innerlohinger, J.; Malengier, B.; Vasile, S.

    2017-10-01

    In this work, the handfeel of fabrics made of wood-based fibers such as viscose, modal and Lyocell was investigated in relation to cotton fabrics applying the Tissue Softness Analyzer (TSA) method in comparison to other classical methods. Two different construction groups of textile were investigated. The validity of TSA in assessing textile softness of these constructions was tested. TSA results were compared to human hand evaluation as well as to classical physical measurements like drape coefficient, ring pull-through and Handle-o-meter, as well as a newer device, the Fabric Touch Tester (FTT). Physical methods as well as human hand assessments mostly agreed on the softest and smoothest range, but showed different rankings in the harder/rougher side fabrics. TSA ranking of softness and smoothness corresponded to the rankings by other physical methods as well as with human hand feel for the basic textile constructions.

  15. Search by photo methodology for signature properties assessment by human observers

    NASA Astrophysics Data System (ADS)

    Selj, Gorm K.; Heinrich, Daniela H.

    2015-05-01

    Reliable, low-cost and simple methods for assessment of signature properties for military purposes are very important. In this paper we present such an approach that uses human observers in a search by photo assessment of signature properties of generic test targets. The method was carried out by logging a large number of detection times of targets recorded in relevant terrain backgrounds. The detection times were harvested by using human observers searching for targets in scene images shown by a high definition pc screen. All targets were identically located in each "search image", allowing relative comparisons (and not just rank by order) of targets. To avoid biased detections, each observer only searched for one target per scene. Statistical analyses were carried out for the detection times data. Analysis of variance was chosen if detection times distribution associated with all targets satisfied normality, and non-parametric tests, such as Wilcoxon's rank test, if otherwise. The new methodology allows assessment of signature properties in a reproducible, rapid and reliable setting. Such assessments are very complex as they must sort out what is of relevance in a signature test, but not loose information of value. We believe that choosing detection times as the primary variable for a comparison of signature properties, allows a careful and necessary inspection of observer data as the variable is continuous rather than discrete. Our method thus stands in opposition to approaches based on detections by subsequent, stepwise reductions in distance to target, or based on probability of detection.

  16. Current Interview Trail Metrics in the Otolaryngology Match.

    PubMed

    Cabrera-Muffly, Cristina; Chang, C W David; Puscas, Liana

    2017-06-01

    Objectives To identify how applicants to otolaryngology residency determine how to apply to, interview with, and rank programs on the interview trail and to determine the extent of the financial burden of the otolaryngology interview trail. Study Design Web-based survey distributed in March and April 2016. Setting Otolaryngology residency applicants throughout the United States. Subjects and Methods Applicants to otolaryngology residency during the 2016 match cycle and current otolaryngology residents were surveyed. Results Median number of applications, interview offers, interviews attended, and programs ranked was not different during the 2016 match and the previous 5 match years. The most important factor affecting the number of applications was the need to apply widely to ensure sufficient interview offers. The most common reason for declining an interview offer was scheduling conflict. Applicants during the 2016 match spent a median of $5400 applying and interviewing for otolaryngology residency. Conclusions Median number of applications, interview offers, interviews attended, and programs ranked has not changed. The most cited reason for applying to many programs was to increase the chances of matching, but this is not statistically likely to increase match success. We advocate for continued attempts to make the otolaryngology match process more transparent for both applicants and resident selection committees, but recognize that applicants are likely to continue to overapply for otolaryngology residency positions.

  17. Geographic profiling as a novel spatial tool for targeting infectious disease control

    PubMed Central

    2011-01-01

    Background Geographic profiling is a statistical tool originally developed in criminology to prioritise large lists of suspects in cases of serial crime. Here, we use two data sets - one historical and one modern - to show how it can be used to locate the sources of infectious disease. Results First, we re-analyse data from a classic epidemiological study, the 1854 London cholera outbreak. Using 321 disease sites as input, we evaluate the locations of 13 neighbourhood water pumps. The Broad Street pump - the outbreak's source- ranks first, situated in the top 0.2% of the geoprofile. We extend our study with an analysis of reported malaria cases in Cairo, Egypt, using 139 disease case locations to rank 59 mosquitogenic local water sources, seven of which tested positive for the vector Anopheles sergentii. Geographic profiling ranks six of these seven sites in positions 1-6, all in the top 2% of the geoprofile. In both analyses the method outperformed other measures of spatial central tendency. Conclusions We suggest that geographic profiling could form a useful component of integrated control strategies relating to a wide variety of infectious diseases, since evidence-based targeting of interventions is more efficient, environmentally friendly and cost-effective than untargeted intervention. PMID:21592339

  18. Machine learning for the New York City power grid.

    PubMed

    Rudin, Cynthia; Waltz, David; Anderson, Roger N; Boulanger, Albert; Salleb-Aouissi, Ansaf; Chow, Maggie; Dutta, Haimonti; Gross, Philip N; Huang, Bert; Ierome, Steve; Isaac, Delfina F; Kressner, Arthur; Passonneau, Rebecca J; Radeva, Axinia; Wu, Leon

    2012-02-01

    Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk of failures for components and systems. These models can be used directly by power companies to assist with prioritization of maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint, terminator, and transformer rankings, 3) feeder Mean Time Between Failure (MTBF) estimates, and 4) manhole events vulnerability rankings. The process in its most general form can handle diverse, noisy, sources that are historical (static), semi-real-time, or realtime, incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to assist in maintaining New York City’s electrical grid.

  19. Risk Driven Outcome-Based Command and Control (C2) Assessment

    DTIC Science & Technology

    2000-01-01

    shaping the risk ranking scores into more interpretable and statistically sound risk measures. Regression analysis was applied to determine what...Architecture Framework Implementation, AFCEA Coursebook 503J, February 8-11, 2000, San Diego, California. [Morgan and Henrion, 1990] M. Granger Morgan and

  20. Analyses of the Microbial Diversity across the Human Microbiome

    PubMed Central

    Li, Kelvin; Bihan, Monika; Yooseph, Shibu; Methé, Barbara A.

    2012-01-01

    Analysis of human body microbial diversity is fundamental to understanding community structure, biology and ecology. The National Institutes of Health Human Microbiome Project (HMP) has provided an unprecedented opportunity to examine microbial diversity within and across body habitats and individuals through pyrosequencing-based profiling of 16 S rRNA gene sequences (16 S) from habits of the oral, skin, distal gut, and vaginal body regions from over 200 healthy individuals enabling the application of statistical techniques. In this study, two approaches were applied to elucidate the nature and extent of human microbiome diversity. First, bootstrap and parametric curve fitting techniques were evaluated to estimate the maximum number of unique taxa, Smax, and taxa discovery rate for habitats across individuals. Next, our results demonstrated that the variation of diversity within low abundant taxa across habitats and individuals was not sufficiently quantified with standard ecological diversity indices. This impact from low abundant taxa motivated us to introduce a novel rank-based diversity measure, the Tail statistic, (“τ”), based on the standard deviation of the rank abundance curve if made symmetric by reflection around the most abundant taxon. Due to τ’s greater sensitivity to low abundant taxa, its application to diversity estimation of taxonomic units using taxonomic dependent and independent methods revealed a greater range of values recovered between individuals versus body habitats, and different patterns of diversity within habitats. The greatest range of τ values within and across individuals was found in stool, which also exhibited the most undiscovered taxa. Oral and skin habitats revealed variable diversity patterns, while vaginal habitats were consistently the least diverse. Collectively, these results demonstrate the importance, and motivate the introduction, of several visualization and analysis methods tuned specifically for next-generation sequence data, further revealing that low abundant taxa serve as an important reservoir of genetic diversity in the human microbiome. PMID:22719823

  1. An analysis of an optimal selection process for characteristics and technical performance of baseball pitchers.

    PubMed

    Lin, Wen-Bin; Tung, I-Wu; Chen, Mei-Jung; Chen, Mei-Yen

    2011-08-01

    Selection of a qualified pitcher has relied previously on qualitative indices; here, both quantitative and qualitative indices including pitching statistics, defense, mental skills, experience, and managers' recognition were collected, and an analytic hierarchy process was used to rank baseball pitchers. The participants were 8 experts who ranked characteristics and statistics of 15 baseball pitchers who comprised the first round of potential representatives for the Chinese Taipei National Baseball team. The results indicated a selection rate that was 91% consistent with the official national team roster, as 11 pitchers with the highest scores who were recommended as optimal choices to be official members of the Chinese Tai-pei National Baseball team actually participated in the 2009 Baseball World Cup. An analytic hierarchy can aid in selection of qualified pitchers, depending on situational and practical needs; the method could be extended to other sports and team-selection situations.

  2. Selection of the open pit mining cut-off grade strategy under price uncertainty using a risk based multi-criteria ranking system / Wybór strategii określania warunku opłacalności wydobycia w kopalniach odkrywkowych w warunkach niepewności cen w oparciu o wielokryterialny system rankingowy z uwzględnieniem czynników ryzyka

    NASA Astrophysics Data System (ADS)

    Azimi, Yousue; Osanloo, Montza; Esfahanipour, Akbar

    2012-12-01

    Cut-off Grade Strategy (COGS) is a concept that directly influences the financial, technical, economic, environmental, and legal issues in relation to exploitation of a mineral resource. A decision making system is proposed to select the best technically feasible COGS under price uncertainty. In the proposed system both the conventional discounted cash flow and modern simulation based real option valuations are used to evaluate the alternative strategies. Then the conventional expected value criterion and a multiple criteria ranking system were used to rank the strategies based on the two valuation methods. In the multiple criteria ranking system besides the expected value other stochastic orders expressing abilities of strategies in producing extra profits, minimizing losses and achieving the predefined goals of the exploitation strategy are considered. Finally, the best strategy is selected based on the overall average rank of strategies through all ranking systems. The proposed system was examined using the data of Sungun Copper Mine. To assess the merits of the alternatives better, ranking process was done at both high (prevailing economic condition) and low price conditions. Ranking results revealed that at different price conditions and valuation methods, different results would be obtained. It is concluded that these differences are due to the different behavior of the embedded option to close the mine early, which is more likely to be exercised under low price condition rather than high price condition. The proposed system would enhance the quality of decision making process by providing a more informative and certain platform for project evaluation.

  3. ReliefSeq: A Gene-Wise Adaptive-K Nearest-Neighbor Feature Selection Tool for Finding Gene-Gene Interactions and Main Effects in mRNA-Seq Gene Expression Data

    PubMed Central

    McKinney, Brett A.; White, Bill C.; Grill, Diane E.; Li, Peter W.; Kennedy, Richard B.; Poland, Gregory A.; Oberg, Ann L.

    2013-01-01

    Relief-F is a nonparametric, nearest-neighbor machine learning method that has been successfully used to identify relevant variables that may interact in complex multivariate models to explain phenotypic variation. While several tools have been developed for assessing differential expression in sequence-based transcriptomics, the detection of statistical interactions between transcripts has received less attention in the area of RNA-seq analysis. We describe a new extension and assessment of Relief-F for feature selection in RNA-seq data. The ReliefSeq implementation adapts the number of nearest neighbors (k) for each gene to optimize the Relief-F test statistics (importance scores) for finding both main effects and interactions. We compare this gene-wise adaptive-k (gwak) Relief-F method with standard RNA-seq feature selection tools, such as DESeq and edgeR, and with the popular machine learning method Random Forests. We demonstrate performance on a panel of simulated data that have a range of distributional properties reflected in real mRNA-seq data including multiple transcripts with varying sizes of main effects and interaction effects. For simulated main effects, gwak-Relief-F feature selection performs comparably to standard tools DESeq and edgeR for ranking relevant transcripts. For gene-gene interactions, gwak-Relief-F outperforms all comparison methods at ranking relevant genes in all but the highest fold change/highest signal situations where it performs similarly. The gwak-Relief-F algorithm outperforms Random Forests for detecting relevant genes in all simulation experiments. In addition, Relief-F is comparable to the other methods based on computational time. We also apply ReliefSeq to an RNA-Seq study of smallpox vaccine to identify gene expression changes between vaccinia virus-stimulated and unstimulated samples. ReliefSeq is an attractive tool for inclusion in the suite of tools used for analysis of mRNA-Seq data; it has power to detect both main effects and interaction effects. Software Availability: http://insilico.utulsa.edu/ReliefSeq.php. PMID:24339943

  4. Important Literature in Endocrinology: Citation Analysis and Historial Methodology.

    ERIC Educational Resources Information Center

    Hurt, C. D.

    1982-01-01

    Results of a study comparing two approaches to the identification of important literature in endocrinology reveals that association between ranking of cited items using the two methods is not statistically significant and use of citation or historical analysis alone will not result in same set of literature. Forty-two sources are appended. (EJS)

  5. Selection of Marine Corps Drill Instructors

    DTIC Science & Technology

    1980-03-01

    8 4. ., ey- Construction and Cross-Validation Statistics for Drill Instructor School Performance Success Keys...Race, and School Attrition ........... ............................. ... 15 13. Key- Construction and Cross-Validation Statistics for Drill... constructed form, the Alternation Ranking of Series Drill Instruc- tors. In this form, DIs in a Series are ranked from highest to lowest in terms of their

  6. Potential Environmental Justice (EJ) areas in Region 2 based on 2000 Census [EPA.EJAREAS_2000

    EPA Pesticide Factsheets

    Potential Environmental Justice (EJ) areas in Region 2 . This dataset was derived from 2000 census data and based on the criteria setforth in the Region 2 Interim Environmental Justice Policy. The two criteria for Region 2's EJ demographic analysis are percent poverty and percent minority. The percent minority and percent poverty numbers for each blockgroup are compared to the benchmark value for the state. Census blockgroups with percent poverty or percent minority higher than the state threshold are considered potential EJ areas. The cutoffs for each state were derived by using the statistical method - cluster analysis.Cluster analysis was chosen as the most objective way of evaluating the demographic data and determining cutoff values for minority and low income. With cluster analysis, data are divided into two distinct groups (e.g., minority and non-minority, and low income and non-low income). Cluster analysis examines natural breaks of the data. Separate analyses were conducted for minority and low income, respectively, for each State. All census block groups within a State were ranked in descending order according to the demographic factor under evaluation. This resulted in a ranking for percent minority by block group and a separate ranking for percent low income by block group. An iterative process was employed where the data were (1) split into two groups; (2) the means for each of the two groups were calculated; (3) the difference between the

  7. An Investigation of the Variety and Complexity of Statistical Methods Used in Current Internal Medicine Literature.

    PubMed

    Narayanan, Roshni; Nugent, Rebecca; Nugent, Kenneth

    2015-10-01

    Accreditation Council for Graduate Medical Education guidelines require internal medicine residents to develop skills in the interpretation of medical literature and to understand the principles of research. A necessary component is the ability to understand the statistical methods used and their results, material that is not an in-depth focus of most medical school curricula and residency programs. Given the breadth and depth of the current medical literature and an increasing emphasis on complex, sophisticated statistical analyses, the statistical foundation and education necessary for residents are uncertain. We reviewed the statistical methods and terms used in 49 articles discussed at the journal club in the Department of Internal Medicine residency program at Texas Tech University between January 1, 2013 and June 30, 2013. We collected information on the study type and on the statistical methods used for summarizing and comparing samples, determining the relations between independent variables and dependent variables, and estimating models. We then identified the typical statistics education level at which each term or method is learned. A total of 14 articles came from the Journal of the American Medical Association Internal Medicine, 11 from the New England Journal of Medicine, 6 from the Annals of Internal Medicine, 5 from the Journal of the American Medical Association, and 13 from other journals. Twenty reported randomized controlled trials. Summary statistics included mean values (39 articles), category counts (38), and medians (28). Group comparisons were based on t tests (14 articles), χ2 tests (21), and nonparametric ranking tests (10). The relations between dependent and independent variables were analyzed with simple regression (6 articles), multivariate regression (11), and logistic regression (8). Nine studies reported odds ratios with 95% confidence intervals, and seven analyzed test performance using sensitivity and specificity calculations. These papers used 128 statistical terms and context-defined concepts, including some from data analysis (56), epidemiology-biostatistics (31), modeling (24), data collection (12), and meta-analysis (5). Ten different software programs were used in these articles. Based on usual undergraduate and graduate statistics curricula, 64.3% of the concepts and methods used in these papers required at least a master's degree-level statistics education. The interpretation of the current medical literature can require an extensive background in statistical methods at an education level exceeding the material and resources provided to most medical students and residents. Given the complexity and time pressure of medical education, these deficiencies will be hard to correct, but this project can serve as a basis for developing a curriculum in study design and statistical methods needed by physicians-in-training.

  8. Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths

    PubMed Central

    Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T

    2014-01-01

    How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method. PMID:26146586

  9. Ranking of Prokaryotic Genomes Based on Maximization of Sortedness of Gene Lengths.

    PubMed

    Bolshoy, A; Salih, B; Cohen, I; Tatarinova, T

    How variations of gene lengths (some genes become longer than their predecessors, while other genes become shorter and the sizes of these factions are randomly different from organism to organism) depend on organismal evolution and adaptation is still an open question. We propose to rank the genomes according to lengths of their genes, and then find association between the genome rank and variousproperties, such as growth temperature, nucleotide composition, and pathogenicity. This approach reveals evolutionary driving factors. The main purpose of this study is to test effectiveness and robustness of several ranking methods. The selected method of evaluation is measuring of overall sortedness of the data. We have demonstrated that all considered methods give consistent results and Bubble Sort and Simulated Annealing achieve the highest sortedness. Also, Bubble Sort is considerably faster than the Simulated Annealing method.

  10. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights

    PubMed Central

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-01

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher’s exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO’s usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher. PMID:26750448

  11. LEGO: a novel method for gene set over-representation analysis by incorporating network-based gene weights.

    PubMed

    Dong, Xinran; Hao, Yun; Wang, Xiao; Tian, Weidong

    2016-01-11

    Pathway or gene set over-representation analysis (ORA) has become a routine task in functional genomics studies. However, currently widely used ORA tools employ statistical methods such as Fisher's exact test that reduce a pathway into a list of genes, ignoring the constitutive functional non-equivalent roles of genes and the complex gene-gene interactions. Here, we develop a novel method named LEGO (functional Link Enrichment of Gene Ontology or gene sets) that takes into consideration these two types of information by incorporating network-based gene weights in ORA analysis. In three benchmarks, LEGO achieves better performance than Fisher and three other network-based methods. To further evaluate LEGO's usefulness, we compare LEGO with five gene expression-based and three pathway topology-based methods using a benchmark of 34 disease gene expression datasets compiled by a recent publication, and show that LEGO is among the top-ranked methods in terms of both sensitivity and prioritization for detecting target KEGG pathways. In addition, we develop a cluster-and-filter approach to reduce the redundancy among the enriched gene sets, making the results more interpretable to biologists. Finally, we apply LEGO to two lists of autism genes, and identify relevant gene sets to autism that could not be found by Fisher.

  12. Recursive feature selection with significant variables of support vectors.

    PubMed

    Tsai, Chen-An; Huang, Chien-Hsun; Chang, Ching-Wei; Chen, Chun-Houh

    2012-01-01

    The development of DNA microarray makes researchers screen thousands of genes simultaneously and it also helps determine high- and low-expression level genes in normal and disease tissues. Selecting relevant genes for cancer classification is an important issue. Most of the gene selection methods use univariate ranking criteria and arbitrarily choose a threshold to choose genes. However, the parameter setting may not be compatible to the selected classification algorithms. In this paper, we propose a new gene selection method (SVM-t) based on the use of t-statistics embedded in support vector machine. We compared the performance to two similar SVM-based methods: SVM recursive feature elimination (SVMRFE) and recursive support vector machine (RSVM). The three methods were compared based on extensive simulation experiments and analyses of two published microarray datasets. In the simulation experiments, we found that the proposed method is more robust in selecting informative genes than SVMRFE and RSVM and capable to attain good classification performance when the variations of informative and noninformative genes are different. In the analysis of two microarray datasets, the proposed method yields better performance in identifying fewer genes with good prediction accuracy, compared to SVMRFE and RSVM.

  13. Global aesthetic surgery statistics: a closer look.

    PubMed

    Heidekrueger, Paul I; Juran, S; Ehrl, D; Aung, T; Tanna, N; Broer, P Niclas

    2017-08-01

    Obtaining quality global statistics about surgical procedures remains an important yet challenging task. The International Society of Aesthetic Plastic Surgery (ISAPS) reports the total number of surgical and non-surgical procedures performed worldwide on a yearly basis. While providing valuable insight, ISAPS' statistics leave two important factors unaccounted for: (1) the underlying base population, and (2) the number of surgeons performing the procedures. Statistics of the published ISAPS' 'International Survey on Aesthetic/Cosmetic Surgery' were analysed by country, taking into account the underlying national base population according to the official United Nations population estimates. Further, the number of surgeons per country was used to calculate the number of surgeries performed per surgeon. In 2014, based on ISAPS statistics, national surgical procedures ranked in the following order: 1st USA, 2nd Brazil, 3rd South Korea, 4th Mexico, 5th Japan, 6th Germany, 7th Colombia, and 8th France. When considering the size of the underlying national populations, the demand for surgical procedures per 100,000 people changes the overall ranking substantially. It was also found that the rate of surgical procedures per surgeon shows great variation between the responding countries. While the US and Brazil are often quoted as the countries with the highest demand for plastic surgery, according to the presented analysis, other countries surpass these countries in surgical procedures per capita. While data acquisition and quality should be improved in the future, valuable insight regarding the demand for surgical procedures can be gained by taking specific demographic and geographic factors into consideration.

  14. SU-F-J-96: Comparison of Frame-Based and Mutual Information Registration Techniques for CT and MR Image Sets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Popple, R; Bredel, M; Brezovich, I

    Purpose: To compare the accuracy of CT-MR registration using a mutual information method with registration using a frame-based localizer box. Methods: Ten patients having the Leksell head frame and scanned with a modality specific localizer box were imported into the treatment planning system. The fiducial rods of the localizer box were contoured on both the MR and CT scans. The skull was contoured on the CT images. The MR and CT images were registered by two methods. The frame-based method used the transformation that minimized the mean square distance of the centroids of the contours of the fiducial rods frommore » a mathematical model of the localizer. The mutual information method used automated image registration tools in the TPS and was restricted to a volume-of-interest defined by the skull contours with a 5 mm margin. For each case, the two registrations were adjusted by two evaluation teams, each comprised of an experienced radiation oncologist and neurosurgeon, to optimize alignment in the region of the brainstem. The teams were blinded to the registration method. Results: The mean adjustment was 0.4 mm (range 0 to 2 mm) and 0.2 mm (range 0 to 1 mm) for the frame and mutual information methods, respectively. The median difference between the frame and mutual information registrations was 0.3 mm, but was not statistically significant using the Wilcoxon signed rank test (p=0.37). Conclusion: The difference between frame and mutual information registration techniques was neither statistically significant nor, for most applications, clinically important. These results suggest that mutual information is equivalent to frame-based image registration for radiosurgery. Work is ongoing to add additional evaluators and to assess the differences between evaluators.« less

  15. Ranking and averaging independent component analysis by reproducibility (RAICAR).

    PubMed

    Yang, Zhi; LaConte, Stephen; Weng, Xuchu; Hu, Xiaoping

    2008-06-01

    Independent component analysis (ICA) is a data-driven approach that has exhibited great utility for functional magnetic resonance imaging (fMRI). Standard ICA implementations, however, do not provide the number and relative importance of the resulting components. In addition, ICA algorithms utilizing gradient-based optimization give decompositions that are dependent on initialization values, which can lead to dramatically different results. In this work, a new method, RAICAR (Ranking and Averaging Independent Component Analysis by Reproducibility), is introduced to address these issues for spatial ICA applied to fMRI. RAICAR utilizes repeated ICA realizations and relies on the reproducibility between them to rank and select components. Different realizations are aligned based on correlations, leading to aligned components. Each component is ranked and thresholded based on between-realization correlations. Furthermore, different realizations of each aligned component are selectively averaged to generate the final estimate of the given component. Reliability and accuracy of this method are demonstrated with both simulated and experimental fMRI data. Copyright 2007 Wiley-Liss, Inc.

  16. An Optimization-Based Method for Feature Ranking in Nonlinear Regression Problems.

    PubMed

    Bravi, Luca; Piccialli, Veronica; Sciandrone, Marco

    2017-04-01

    In this paper, we consider the feature ranking problem, where, given a set of training instances, the task is to associate a score with the features in order to assess their relevance. Feature ranking is a very important tool for decision support systems, and may be used as an auxiliary step of feature selection to reduce the high dimensionality of real-world data. We focus on regression problems by assuming that the process underlying the generated data can be approximated by a continuous function (for instance, a feedforward neural network). We formally state the notion of relevance of a feature by introducing a minimum zero-norm inversion problem of a neural network, which is a nonsmooth, constrained optimization problem. We employ a concave approximation of the zero-norm function, and we define a smooth, global optimization problem to be solved in order to assess the relevance of the features. We present the new feature ranking method based on the solution of instances of the global optimization problem depending on the available training data. Computational experiments on both artificial and real data sets are performed, and point out that the proposed feature ranking method is a valid alternative to existing methods in terms of effectiveness. The obtained results also show that the method is costly in terms of CPU time, and this may be a limitation in the solution of large-dimensional problems.

  17. Learning to rank using user clicks and visual features for image retrieval.

    PubMed

    Yu, Jun; Tao, Dacheng; Wang, Meng; Rui, Yong

    2015-04-01

    The inconsistency between textual features and visual contents can cause poor image search results. To solve this problem, click features, which are more reliable than textual information in justifying the relevance between a query and clicked images, are adopted in image ranking model. However, the existing ranking model cannot integrate visual features, which are efficient in refining the click-based search results. In this paper, we propose a novel ranking model based on the learning to rank framework. Visual features and click features are simultaneously utilized to obtain the ranking model. Specifically, the proposed approach is based on large margin structured output learning and the visual consistency is integrated with the click features through a hypergraph regularizer term. In accordance with the fast alternating linearization method, we design a novel algorithm to optimize the objective function. This algorithm alternately minimizes two different approximations of the original objective function by keeping one function unchanged and linearizing the other. We conduct experiments on a large-scale dataset collected from the Microsoft Bing image search engine, and the results demonstrate that the proposed learning to rank models based on visual features and user clicks outperforms state-of-the-art algorithms.

  18. Comparisons of methods for determining dominance rank in male and female prairie voles (Microtus ochrogastor)

    USGS Publications Warehouse

    Lanctot, Richard B.; Best, Louis B.

    2000-01-01

    Dominance ranks in male and female prairie voles (Microtus ochrogaster) were determined from 6 measurements that mimicked environmental situations that might be encountered by prairie voles in communal groups, including agonistic interactions resulting from competition for food and water and encounters in burrows. Male and female groups of 6 individuals each were tested against one another in pairwise encounters (i.e., dyads) for 5 of the measurements and together as a group in a 6th measurement. Two types of response variables, aggressive behaviors and possession time of a limiting resource, were collected during trials, and those data were used to determine cardinal ranks and principal component ranks for all animals within each group. Cardinal ranks and principal component ranks seldom yielded similar rankings for each animal across measurements. However, dominance measurements that were conducted in similar environmental contexts, regardless of the response variable recorded, ranked animals similarly. Our results suggest that individual dominance measurements assessed situation- or resource-specific responses. Our study demonstrates problems inherent in determining dominance rankings of individuals within groups, including choosing measurements, response variables, and statistical techniques. Researchers should avoid using a single measurement to represent social dominance until they have first demonstrated that a dominance relationship between 2 individuals has been learned (i.e., subsequent interactions show a reduced response rather than an escalation), that this relationship is relatively constant through time, and that the relationship is not context dependent. Such assessments of dominance status between all dyads then can be used to generate dominance rankings within social groups.

  19. A Residual Mass Ballistic Testing Method to Compare Armor Materials or Components (Residual Mass Ballistic Testing Method)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Benjamin Langhorst; Thomas M Lillo; Henry S Chu

    2014-05-01

    A statistics based ballistic test method is presented for use when comparing multiple groups of test articles of unknown relative ballistic perforation resistance. The method is intended to be more efficient than many traditional methods for research and development testing. To establish the validity of the method, it is employed in this study to compare test groups of known relative ballistic performance. Multiple groups of test articles were perforated using consistent projectiles and impact conditions. Test groups were made of rolled homogeneous armor (RHA) plates and differed in thickness. After perforation, each residual projectile was captured behind the target andmore » its mass was measured. The residual masses measured for each test group were analyzed to provide ballistic performance rankings with associated confidence levels. When compared to traditional V50 methods, the residual mass (RM) method was found to require fewer test events and be more tolerant of variations in impact conditions.« less

  20. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844

  1. Spectral methods in machine learning and new strategies for very large datasets

    PubMed Central

    Belabbas, Mohamed-Ali; Wolfe, Patrick J.

    2009-01-01

    Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here 2 new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first of these—based on sampling—leads to a randomized algorithm whereupon the kernel induces a probability distribution on its set of partitions, whereas the latter approach—based on sorting—provides for the selection of a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods. PMID:19129490

  2. Risk management of key issues of FPSO

    NASA Astrophysics Data System (ADS)

    Sun, Liping; Sun, Hai

    2012-12-01

    Risk analysis of key systems have become a growing topic late of because of the development of offshore structures. Equipment failures of offloading system and fire accidents were analyzed based on the floating production, storage and offloading (FPSO) features. Fault tree analysis (FTA), and failure modes and effects analysis (FMEA) methods were examined based on information already researched on modules of relex reliability studio (RRS). Equipment failures were also analyzed qualitatively by establishing a fault tree and Boolean structure function based on the shortage of failure cases, statistical data, and risk control measures examined. Failure modes of fire accident were classified according to the different areas of fire occurrences during the FMEA process, using risk priority number (RPN) methods to evaluate their severity rank. The qualitative analysis of FTA gave the basic insight of forming the failure modes of FPSO offloading, and the fire FMEA gave the priorities and suggested processes. The research has practical importance for the security analysis problems of FPSO.

  3. Quantile rank maps: a new tool for understanding individual brain development.

    PubMed

    Chen, Huaihou; Kelly, Clare; Castellanos, F Xavier; He, Ye; Zuo, Xi-Nian; Reiss, Philip T

    2015-05-01

    We propose a novel method for neurodevelopmental brain mapping that displays how an individual's values for a quantity of interest compare with age-specific norms. By estimating smoothly age-varying distributions at a set of brain regions of interest, we derive age-dependent region-wise quantile ranks for a given individual, which can be presented in the form of a brain map. Such quantile rank maps could potentially be used for clinical screening. Bootstrap-based confidence intervals are proposed for the quantile rank estimates. We also propose a recalibrated Kolmogorov-Smirnov test for detecting group differences in the age-varying distribution. This test is shown to be more robust to model misspecification than a linear regression-based test. The proposed methods are applied to brain imaging data from the Nathan Kline Institute Rockland Sample and from the Autism Brain Imaging Data Exchange (ABIDE) sample. Copyright © 2015 Elsevier Inc. All rights reserved.

  4. A Ranking Analysis of the Management Schools in Greater China (2000-2010): Evidence from the SSCI Database

    ERIC Educational Resources Information Center

    Hou, Mingjun; Fan, Peihua; Liu, Heng

    2014-01-01

    The authors rank the management schools in Greater China (including Mainland China, Hong Kong, Taiwan, and Macau) based on their academic publications in the Social Sciences Citation Index management and business journals from 2000 to 2010. Following K. Ritzberger's (2008) and X. Yu and Z. Gao's (2010) ranking method, the authors develop six…

  5. Dual Low-Rank Pursuit: Learning Salient Features for Saliency Detection.

    PubMed

    Lang, Congyan; Feng, Jiashi; Feng, Songhe; Wang, Jingdong; Yan, Shuicheng

    2016-06-01

    Saliency detection is an important procedure for machines to understand visual world as humans do. In this paper, we consider a specific saliency detection problem of predicting human eye fixations when they freely view natural images, and propose a novel dual low-rank pursuit (DLRP) method. DLRP learns saliency-aware feature transformations by utilizing available supervision information and constructs discriminative bases for effectively detecting human fixation points under the popular low-rank and sparsity-pursuit framework. Benefiting from the embedded high-level information in the supervised learning process, DLRP is able to predict fixations accurately without performing the expensive object segmentation as in the previous works. Comprehensive experiments clearly show the superiority of the proposed DLRP method over the established state-of-the-art methods. We also empirically demonstrate that DLRP provides stronger generalization performance across different data sets and inherits the advantages of both the bottom-up- and top-down-based saliency detection methods.

  6. Performance of CMIP3 and CMIP5 GCMs to simulate observed rainfall characteristics over the Western Himalayan region

    NASA Astrophysics Data System (ADS)

    Meher, J. K.; Das, L.

    2017-12-01

    The Western Himalayan Region (WHR) was subject to a significant negative trend in the annual and monsoon rainfall during 1902-2005. Annual and seasonal rainfall change over WHR of India was estimated using 22 rain gauge station rainfall data from the India Meteorological Department. The performance of 13 global climate models (GCMs) from the coupled model intercomparison project phase 3 (CMIP3) and 42 GCMs from CMIP5 was evaluated through multiple analysis: the evaluation of the mean annual cycle, annual cycles of interannual variability, spatial patterns, trends and signal-to-noise ratio. In general, CMIP5 GCMs were more skillful in terms of simulating the annual cycle of interannual variability compared to CMIP3 GCMs. The CMIP3 GCMs failed to reproduce the observed trend whereas 50% of the CMIP5 GCMs reproduced the statistical distribution of short-term (30-years) trend-estimates than for the longer term (99-years). GCMs from both CMIP3 and CMIP5 were able to simulate the spatial distribution of observed rainfall in pre-monsoon and winter months. Based on performance, each model of CMIP3 and CMIP5 was given an overall rank, which puts the high resolution version of the MIROC3.2 model (MIROC3.2 hires) and MIROC5 at the top in CMIP3 and CMIP5 respectively. Robustness of the ranking was judged through a sensitivity analysis, which indicated that ranks were independent during the process of adding or removing any individual method. It also revealed that trend analysis was not a robust method of judging performances of the model as compared to other methods.

  7. Optoelectronic scanning system upgrade by energy center localization methods

    NASA Astrophysics Data System (ADS)

    Flores-Fuentes, W.; Sergiyenko, O.; Rodriguez-Quiñonez, J. C.; Rivas-López, M.; Hernández-Balbuena, D.; Básaca-Preciado, L. C.; Lindner, L.; González-Navarro, F. F.

    2016-11-01

    A problem of upgrading an optoelectronic scanning system with digital post-processing of the signal based on adequate methods of energy center localization is considered. An improved dynamic triangulation analysis technique is proposed by an example of industrial infrastructure damage detection. A modification of our previously published method aimed at searching for the energy center of an optoelectronic signal is described. Application of the artificial intelligence algorithm of compensation for the error of determining the angular coordinate in calculating the spatial coordinate through dynamic triangulation is demonstrated. Five energy center localization methods are developed and tested to select the best method. After implementation of these methods, digital compensation for the measurement error, and statistical data analysis, a non-parametric behavior of the data is identified. The Wilcoxon signed rank test is applied to improve the result further. For optical scanning systems, it is necessary to detect a light emitter mounted on the infrastructure being investigated to calculate its spatial coordinate by the energy center localization method.

  8. Low-rank structure learning via nonconvex heuristic recovery.

    PubMed

    Deng, Yue; Dai, Qionghai; Liu, Risheng; Zhang, Zengke; Hu, Sanqing

    2013-03-01

    In this paper, we propose a nonconvex framework to learn the essential low-rank structure from corrupted data. Different from traditional approaches, which directly utilizes convex norms to measure the sparseness, our method introduces more reasonable nonconvex measurements to enhance the sparsity in both the intrinsic low-rank structure and the sparse corruptions. We will, respectively, introduce how to combine the widely used ℓp norm (0 < p < 1) and log-sum term into the framework of low-rank structure learning. Although the proposed optimization is no longer convex, it still can be effectively solved by a majorization-minimization (MM)-type algorithm, with which the nonconvex objective function is iteratively replaced by its convex surrogate and the nonconvex problem finally falls into the general framework of reweighed approaches. We prove that the MM-type algorithm can converge to a stationary point after successive iterations. The proposed model is applied to solve two typical problems: robust principal component analysis and low-rank representation. Experimental results on low-rank structure learning demonstrate that our nonconvex heuristic methods, especially the log-sum heuristic recovery algorithm, generally perform much better than the convex-norm-based method (0 < p < 1) for both data with higher rank and with denser corruptions.

  9. Statistical inference methods for two crossing survival curves: a comparison of methods.

    PubMed

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman's smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér-von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman's smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests.

  10. Statistical Inference Methods for Two Crossing Survival Curves: A Comparison of Methods

    PubMed Central

    Li, Huimin; Han, Dong; Hou, Yawen; Chen, Huilin; Chen, Zheng

    2015-01-01

    A common problem that is encountered in medical applications is the overall homogeneity of survival distributions when two survival curves cross each other. A survey demonstrated that under this condition, which was an obvious violation of the assumption of proportional hazard rates, the log-rank test was still used in 70% of studies. Several statistical methods have been proposed to solve this problem. However, in many applications, it is difficult to specify the types of survival differences and choose an appropriate method prior to analysis. Thus, we conducted an extensive series of Monte Carlo simulations to investigate the power and type I error rate of these procedures under various patterns of crossing survival curves with different censoring rates and distribution parameters. Our objective was to evaluate the strengths and weaknesses of tests in different situations and for various censoring rates and to recommend an appropriate test that will not fail for a wide range of applications. Simulation studies demonstrated that adaptive Neyman’s smooth tests and the two-stage procedure offer higher power and greater stability than other methods when the survival distributions cross at early, middle or late times. Even for proportional hazards, both methods maintain acceptable power compared with the log-rank test. In terms of the type I error rate, Renyi and Cramér—von Mises tests are relatively conservative, whereas the statistics of the Lin-Xu test exhibit apparent inflation as the censoring rate increases. Other tests produce results close to the nominal 0.05 level. In conclusion, adaptive Neyman’s smooth tests and the two-stage procedure are found to be the most stable and feasible approaches for a variety of situations and censoring rates. Therefore, they are applicable to a wider spectrum of alternatives compared with other tests. PMID:25615624

  11. Reconstruction of dynamic image series from undersampled MRI data using data-driven model consistency condition (MOCCO).

    PubMed

    Velikina, Julia V; Samsonov, Alexey A

    2015-11-01

    To accelerate dynamic MR imaging through development of a novel image reconstruction technique using low-rank temporal signal models preestimated from training data. We introduce the model consistency condition (MOCCO) technique, which utilizes temporal models to regularize reconstruction without constraining the solution to be low-rank, as is performed in related techniques. This is achieved by using a data-driven model to design a transform for compressed sensing-type regularization. The enforcement of general compliance with the model without excessively penalizing deviating signal allows recovery of a full-rank solution. Our method was compared with a standard low-rank approach utilizing model-based dimensionality reduction in phantoms and patient examinations for time-resolved contrast-enhanced angiography (CE-MRA) and cardiac CINE imaging. We studied the sensitivity of all methods to rank reduction and temporal subspace modeling errors. MOCCO demonstrated reduced sensitivity to modeling errors compared with the standard approach. Full-rank MOCCO solutions showed significantly improved preservation of temporal fidelity and aliasing/noise suppression in highly accelerated CE-MRA (acceleration up to 27) and cardiac CINE (acceleration up to 15) data. MOCCO overcomes several important deficiencies of previously proposed methods based on pre-estimated temporal models and allows high quality image restoration from highly undersampled CE-MRA and cardiac CINE data. © 2014 Wiley Periodicals, Inc.

  12. A benchmark for statistical microarray data analysis that preserves actual biological and technical variance.

    PubMed

    De Hertogh, Benoît; De Meulder, Bertrand; Berger, Fabrice; Pierre, Michael; Bareke, Eric; Gaigneaux, Anthoula; Depiereux, Eric

    2010-01-11

    Recent reanalysis of spike-in datasets underscored the need for new and more accurate benchmark datasets for statistical microarray analysis. We present here a fresh method using biologically-relevant data to evaluate the performance of statistical methods. Our novel method ranks the probesets from a dataset composed of publicly-available biological microarray data and extracts subset matrices with precise information/noise ratios. Our method can be used to determine the capability of different methods to better estimate variance for a given number of replicates. The mean-variance and mean-fold change relationships of the matrices revealed a closer approximation of biological reality. Performance analysis refined the results from benchmarks published previously.We show that the Shrinkage t test (close to Limma) was the best of the methods tested, except when two replicates were examined, where the Regularized t test and the Window t test performed slightly better. The R scripts used for the analysis are available at http://urbm-cluster.urbm.fundp.ac.be/~bdemeulder/.

  13. Hazard perception of Dutch farmers and veterinarians related to dairy young stock rearing.

    PubMed

    Boersema, J S C; Noordhuizen, J P T M; Lievaart, J J

    2013-08-01

    A group of 110 dairy farmers and 26 bovine veterinarians participated in a web-based questionnaire using the adaptive conjoint analysis technique to rank their perception regarding several hazards during 6 subsequent periods of the process of dairy young stock rearing. The method applied only involved selected respondents with a high consistency in their answering (correlation >30%). For the ranking, answers were first transformed into a utility score (US) for each hazard. The final ranking for each of the 6 periods was based on the US per hazard separately for farmers and veterinarians. Besides the ranking, the absolute values and the US itself were also compared between farmers and veterinarians to determine any statistically significant differences between the levels of the score despite the ranking. The overall conclusion is that, for almost every designated period, the ranking of the hazards differed between farmers and veterinarians. Only 1 period was observed (period IV, Pregnancy period until 4 weeks before calving) where veterinarians and farmers had the same top 3 ranking of the hazards, namely "Mastitis," "Abortion," and "Poor growth rate of the pregnant heifer." Major differences between farmers and veterinarians were seen during period II (feeding milk until weaning) for the hazard "Diarrhea in older calf," which was considered less important by farmers compared to veterinarians, and period number III (weaning until insemination) for "Over-condition," which, again, was seen as the most important hazard by veterinarians, but only ranked as number 5 by farmers. Besides the ranking, significant differences in absolute US values between veterinarians and farmers were seen in "Infection with Johne's disease" (14.5 vs. 7.8), "Diarrhea in newborn calf" (18.2 vs. 12.2), and "Insufficient feed intake" (16.2 vs. 8.4) in period I (colostrum until transition to milk replacer). Lameness represented the most important significant difference in absolute values in period III (weaning until insemination; 6.3 vs. 14.3), which was again significant in period V (4 wks before calving until calving; 7.4 vs. 12.1). The outcome of this study shows that hazard perception of veterinarians and farmers differs for most rearing periods (in ranking and absolute values). The outcome of this study can be used for 2 purposes: first, to improve communication between farmers and their consulting veterinarian about hazards and hazard perception in young stock rearing; and second, the US scores can be used to select top priority hazards which should at least be integrated into management advisory programs to improve dairy young stock rearing. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  14. Two-Dimensional Hermite Filters Simplify the Description of High-Order Statistics of Natural Images.

    PubMed

    Hu, Qin; Victor, Jonathan D

    2016-09-01

    Natural image statistics play a crucial role in shaping biological visual systems, understanding their function and design principles, and designing effective computer-vision algorithms. High-order statistics are critical for conveying local features, but they are challenging to study - largely because their number and variety is large. Here, via the use of two-dimensional Hermite (TDH) functions, we identify a covert symmetry in high-order statistics of natural images that simplifies this task. This emerges from the structure of TDH functions, which are an orthogonal set of functions that are organized into a hierarchy of ranks. Specifically, we find that the shape (skewness and kurtosis) of the distribution of filter coefficients depends only on the projection of the function onto a 1-dimensional subspace specific to each rank. The characterization of natural image statistics provided by TDH filter coefficients reflects both their phase and amplitude structure, and we suggest an intuitive interpretation for the special subspace within each rank.

  15. SHOP: scaffold HOPping by GRID-based similarity searches.

    PubMed

    Bergmann, Rikke; Linusson, Anna; Zamora, Ismael

    2007-05-31

    A new GRID-based method for scaffold hopping (SHOP) is presented. In a fully automatic manner, scaffolds were identified in a database based on three types of 3D-descriptors. SHOP's ability to recover scaffolds was assessed and validated by searching a database spiked with fragments of known ligands of three different protein targets relevant for drug discovery using a rational approach based on statistical experimental design. Five out of eight and seven out of eight thrombin scaffolds and all seven HIV protease scaffolds were recovered within the top 10 and 31 out of 31 neuraminidase scaffolds were in the 31 top-ranked scaffolds. SHOP also identified new scaffolds with substantially different chemotypes from the queries. Docking analysis indicated that the new scaffolds would have similar binding modes to those of the respective query scaffolds observed in X-ray structures. The databases contained scaffolds from published combinatorial libraries to ensure that identified scaffolds could be feasibly synthesized.

  16. Management Ratios 1. For Colleges & Universities.

    ERIC Educational Resources Information Center

    Minter, John, Ed.

    Ratios that enable colleges and universities to select other institutions for comparison are presented. The ratios and underlying data also enable colleges to rank order institutions and to calculate means, quartiles, and ranges for these groups. The data are based on FY 1983 U.S. Department of Education Statistics. The ratios summarize the…

  17. Statistical Summary of Missouri Higher Education, 1996-1997.

    ERIC Educational Resources Information Center

    Missouri Coordinating Board for Higher Education, Jefferson City.

    Extensive data tables on higher education in Missouri present information on: the academic preparation of college freshmen (fall 1996), including distribution of American College Testing (ACT) scores and high school rankings; tuition, fees, and financial aid (state and federal, by aid type, including merit-based scholarships) and trends;…

  18. Missouri Higher Education: 1995-1996 Statistical Summary.

    ERIC Educational Resources Information Center

    Missouri State Coordinating Board for Higher Education, Jefferson City.

    Extensive data tables on higher education in Missouri present information on: the academic preparation of college freshmen (fall 1995), including distribution of American College Testing (ACT) scores and high school rankings; tuition, fees, and financial aid (state and federal, by aid type, including merit-based scholarships) and trends;…

  19. Comparative evaluation of topographical data of dental implant surfaces applying optical interferometry and scanning electron microscopy.

    PubMed

    Kournetas, N; Spintzyk, S; Schweizer, E; Sawada, T; Said, F; Schmid, P; Geis-Gerstorfer, J; Eliades, G; Rupp, F

    2017-08-01

    Comparability of topographical data of implant surfaces in literature is low and their clinical relevance often equivocal. The aim of this study was to investigate the ability of scanning electron microscopy and optical interferometry to assess statistically similar 3-dimensional roughness parameter results and to evaluate these data based on predefined criteria regarded relevant for a favorable biological response. Four different commercial dental screw-type implants (NanoTite Certain Prevail, TiUnite Brånemark Mk III, XiVE S Plus and SLA Standard Plus) were analyzed by stereo scanning electron microscopy and white light interferometry. Surface height, spatial and hybrid roughness parameters (Sa, Sz, Ssk, Sku, Sal, Str, Sdr) were assessed from raw and filtered data (Gaussian 50μm and 5μm cut-off-filters), respectively. Data were statistically compared by one-way ANOVA and Tukey-Kramer post-hoc test. For a clinically relevant interpretation, a categorizing evaluation approach was used based on predefined threshold criteria for each roughness parameter. The two methods exhibited predominantly statistical differences. Dependent on roughness parameters and filter settings, both methods showed variations in rankings of the implant surfaces and differed in their ability to discriminate the different topographies. Overall, the analyses revealed scale-dependent roughness data. Compared to the pure statistical approach, the categorizing evaluation resulted in much more similarities between the two methods. This study suggests to reconsider current approaches for the topographical evaluation of implant surfaces and to further seek after proper experimental settings. Furthermore, the specific role of different roughness parameters for the bioresponse has to be studied in detail in order to better define clinically relevant, scale-dependent and parameter-specific thresholds and ranges. Copyright © 2017 The Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  20. Characteristics of genomic signatures derived using univariate methods and mechanistically anchored functional descriptors for predicting drug- and xenobiotic-induced nephrotoxicity.

    PubMed

    Shi, Weiwei; Bugrim, Andrej; Nikolsky, Yuri; Nikolskya, Tatiana; Brennan, Richard J

    2008-01-01

    ABSTRACT The ideal toxicity biomarker is composed of the properties of prediction (is detected prior to traditional pathological signs of injury), accuracy (high sensitivity and specificity), and mechanistic relationships to the endpoint measured (biological relevance). Gene expression-based toxicity biomarkers ("signatures") have shown good predictive power and accuracy, but are difficult to interpret biologically. We have compared different statistical methods of feature selection with knowledge-based approaches, using GeneGo's database of canonical pathway maps, to generate gene sets for the classification of renal tubule toxicity. The gene set selection algorithms include four univariate analyses: t-statistics, fold-change, B-statistics, and RankProd, and their combination and overlap for the identification of differentially expressed probes. Enrichment analysis following the results of the four univariate analyses, Hotelling T-square test, and, finally out-of-bag selection, a variant of cross-validation, were used to identify canonical pathway maps-sets of genes coordinately involved in key biological processes-with classification power. Differentially expressed genes identified by the different statistical univariate analyses all generated reasonably performing classifiers of tubule toxicity. Maps identified by enrichment analysis or Hotelling T-square had lower classification power, but highlighted perturbed lipid homeostasis as a common discriminator of nephrotoxic treatments. The out-of-bag method yielded the best functionally integrated classifier. The map "ephrins signaling" performed comparably to a classifier derived using sparse linear programming, a machine learning algorithm, and represents a signaling network specifically involved in renal tubule development and integrity. Such functional descriptors of toxicity promise to better integrate predictive toxicogenomics with mechanistic analysis, facilitating the interpretation and risk assessment of predictive genomic investigations.

  1. SU-G-IeP1-13: Sub-Nyquist Dynamic MRI Via Prior Rank, Intensity and Sparsity Model (PRISM)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jiang, B; Gao, H

    Purpose: Accelerated dynamic MRI is important for MRI guided radiotherapy. Inspired by compressive sensing (CS), sub-Nyquist dynamic MRI has been an active research area, i.e., sparse sampling in k-t space for accelerated dynamic MRI. This work is to investigate sub-Nyquist dynamic MRI via a previously developed CS model, namely Prior Rank, Intensity and Sparsity Model (PRISM). Methods: The proposed method utilizes PRISM with rank minimization and incoherent sampling patterns for sub-Nyquist reconstruction. In PRISM, the low-rank background image, which is automatically calculated by rank minimization, is excluded from the L1 minimization step of the CS reconstruction to further sparsify themore » residual image, thus allowing for higher acceleration rates. Furthermore, the sampling pattern in k-t space is made more incoherent by sampling a different set of k-space points at different temporal frames. Results: Reconstruction results from L1-sparsity method and PRISM method with 30% undersampled data and 15% undersampled data are compared to demonstrate the power of PRISM for dynamic MRI. Conclusion: A sub- Nyquist MRI reconstruction method based on PRISM is developed with improved image quality from the L1-sparsity method.« less

  2. Rank-based decompositions of morphological templates.

    PubMed

    Sussner, P; Ritter, G X

    2000-01-01

    Methods for matrix decomposition have found numerous applications in image processing, in particular for the problem of template decomposition. Since existing matrix decomposition techniques are mainly concerned with the linear domain, we consider it timely to investigate matrix decomposition techniques in the nonlinear domain with applications in image processing. The mathematical basis for these investigations is the new theory of rank within minimax algebra. Thus far, only minimax decompositions of rank 1 and rank 2 matrices into outer product expansions are known to the image processing community. We derive a heuristic algorithm for the decomposition of matrices having arbitrary rank.

  3. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆

    PubMed Central

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2014-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion. PMID:25422534

  4. Speaker-sensitive emotion recognition via ranking: Studies on acted and spontaneous speech☆

    PubMed

    Cao, Houwei; Verma, Ragini; Nenkova, Ani

    2015-01-01

    We introduce a ranking approach for emotion recognition which naturally incorporates information about the general expressivity of speakers. We demonstrate that our approach leads to substantial gains in accuracy compared to conventional approaches. We train ranking SVMs for individual emotions, treating the data from each speaker as a separate query, and combine the predictions from all rankers to perform multi-class prediction. The ranking method provides two natural benefits. It captures speaker specific information even in speaker-independent training/testing conditions. It also incorporates the intuition that each utterance can express a mix of possible emotion and that considering the degree to which each emotion is expressed can be productively exploited to identify the dominant emotion. We compare the performance of the rankers and their combination to standard SVM classification approaches on two publicly available datasets of acted emotional speech, Berlin and LDC, as well as on spontaneous emotional data from the FAU Aibo dataset. On acted data, ranking approaches exhibit significantly better performance compared to SVM classification both in distinguishing a specific emotion from all others and in multi-class prediction. On the spontaneous data, which contains mostly neutral utterances with a relatively small portion of less intense emotional utterances, ranking-based classifiers again achieve much higher precision in identifying emotional utterances than conventional SVM classifiers. In addition, we discuss the complementarity of conventional SVM and ranking-based classifiers. On all three datasets we find dramatically higher accuracy for the test items on whose prediction the two methods agree compared to the accuracy of individual methods. Furthermore on the spontaneous data the ranking and standard classification are complementary and we obtain marked improvement when we combine the two classifiers by late-stage fusion.

  5. Nexus Between Protein–Ligand Affinity Rank-Ordering, Biophysical Approaches, and Drug Discovery

    PubMed Central

    2013-01-01

    The confluence of computational and biophysical methods to accurately rank-order the binding affinities of small molecules and determine structures of macromolecular complexes is a potentially transformative advance in the work flow of drug discovery. This viewpoint explores the impact that advanced computational methods may have on the efficacy of small molecule drug discovery and optimization, particularly with respect to emerging fragment-based methods. PMID:24900579

  6. Hierarchical models and bayesian analysis of bird survey information

    Treesearch

    John R. Sauer; William A. Link; J. Andrew Royle

    2005-01-01

    Summary of bird survey information is a critical component of conservation activities, but often our summaries rely on statistical methods that do not accommodate the limitations of the information. Prioritization of species requires ranking and analysis of species by magnitude of population trend, but often magnitude of trend is a misleading measure of actual decline...

  7. Rank score and permutation testing alternatives for regression quantile estimates

    USGS Publications Warehouse

    Cade, B.S.; Richards, J.D.; Mielke, P.W.

    2006-01-01

    Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) were evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and hetero-geneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic distributed as χ2 random variable with q degrees of freedom (where q parameters are constrained by H 0:) and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvements in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles. Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application relating trout densities to stream channel width:depth.

  8. Quality of Public Hospitals Websites: A Cross-Sectional Analytical Study in Iran.

    PubMed

    Salarvand, Shahin; Samadbeik, Mahnaz; Tarrahi, Mohammad Javad; Salarvand, Hamed

    2016-04-01

    Nowadays, hospitals have turned increasingly towards the Internet and develop their own web presence. Hospital Websites could be operating as effective web resources of information and interactive communication mediums to enhance hospital services to the public. Therefore, the aim of this study was to assess the quality of websites in Tehran's public hospitals. This cross-sectional analysis involved all public hospitals in Iran's capital city, Tehran, with a working website or subsites between April and June, 2014 (N=59). The websites were evaluated using three validated instruments: a localized checklist, Google page rank, and the Alexa traffic ranking. The mentioned checklist consisted of 112 items divided into five sections: technical characteristics, hospital information and facilities, medical services, interactive on-line services and external activities. Data were analyzed using descriptive and analytical statistics. The mean website evaluation score was 45.7 out of 224 for selected public hospitals. All the studied websites were in the weak category based on the earned quality scores. There was no statistically significant association between the website evaluation score with Google page rank (P=0.092), Alexa global traffic rank and Alexa traffic rank in Iran (P>0.05). The hospital websites had a lower quality score in the interactive online services and external activities criteria in comparing to other criteria. Due to the low quality level of the studied websites and the importance of hospital portals in providing information and services on the Internet, the authorities should do precise planning for the appreciable improvement in the quality of hospital websites.

  9. Detecting determinism with improved sensitivity in time series: rank-based nonlinear predictability score.

    PubMed

    Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G

    2014-09-01

    The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).

  10. Detecting determinism with improved sensitivity in time series: Rank-based nonlinear predictability score

    NASA Astrophysics Data System (ADS)

    Naro, Daniel; Rummel, Christian; Schindler, Kaspar; Andrzejak, Ralph G.

    2014-09-01

    The rank-based nonlinear predictability score was recently introduced as a test for determinism in point processes. We here adapt this measure to time series sampled from time-continuous flows. We use noisy Lorenz signals to compare this approach against a classical amplitude-based nonlinear prediction error. Both measures show an almost identical robustness against Gaussian white noise. In contrast, when the amplitude distribution of the noise has a narrower central peak and heavier tails than the normal distribution, the rank-based nonlinear predictability score outperforms the amplitude-based nonlinear prediction error. For this type of noise, the nonlinear predictability score has a higher sensitivity for deterministic structure in noisy signals. It also yields a higher statistical power in a surrogate test of the null hypothesis of linear stochastic correlated signals. We show the high relevance of this improved performance in an application to electroencephalographic (EEG) recordings from epilepsy patients. Here the nonlinear predictability score again appears of higher sensitivity to nonrandomness. Importantly, it yields an improved contrast between signals recorded from brain areas where the first ictal EEG signal changes were detected (focal EEG signals) versus signals recorded from brain areas that were not involved at seizure onset (nonfocal EEG signals).

  11. Risk-based prioritization method for the classification of groundwater pesticide pollution from agricultural regions.

    PubMed

    Yang, Yu; Lian, Xin-Ying; Jiang, Yong-Hai; Xi, Bei-Dou; He, Xiao-Song

    2017-11-01

    Agricultural regions are a significant source of groundwater pesticide pollution. To ensure that agricultural regions with a significantly high risk of groundwater pesticide contamination are properly managed, a risk-based ranking method related to groundwater pesticide contamination is needed. In the present paper, a risk-based prioritization method for the classification of groundwater pesticide pollution from agricultural regions was established. The method encompasses 3 phases, including indicator selection, characterization, and classification. In the risk ranking index system employed here, 17 indicators involving the physicochemical properties, environmental behavior characteristics, pesticide application methods, and inherent vulnerability of groundwater in the agricultural region were selected. The boundary of each indicator was determined using K-means cluster analysis based on a survey of a typical agricultural region and the physical and chemical properties of 300 typical pesticides. The total risk characterization was calculated by multiplying the risk value of each indicator, which could effectively avoid the subjectivity of index weight calculation and identify the main factors associated with the risk. The results indicated that the risk for groundwater pesticide contamination from agriculture in a region could be ranked into 4 classes from low to high risk. This method was applied to an agricultural region in Jiangsu Province, China, and it showed that this region had a relatively high risk for groundwater contamination from pesticides, and that the pesticide application method was the primary factor contributing to the relatively high risk. The risk ranking method was determined to be feasible, valid, and able to provide reference data related to the risk management of groundwater pesticide pollution from agricultural regions. Integr Environ Assess Manag 2017;13:1052-1059. © 2017 SETAC. © 2017 SETAC.

  12. Microarray image analysis: background estimation using quantile and morphological filters.

    PubMed

    Bengtsson, Anders; Bengtsson, Henrik

    2006-02-28

    In a microarray experiment the difference in expression between genes on the same slide is up to 103 fold or more. At low expression, even a small error in the estimate will have great influence on the final test and reference ratios. In addition to the true spot intensity the scanned signal consists of different kinds of noise referred to as background. In order to assess the true spot intensity background must be subtracted. The standard approach to estimate background intensities is to assume they are equal to the intensity levels between spots. In the literature, morphological opening is suggested to be one of the best methods for estimating background this way. This paper examines fundamental properties of rank and quantile filters, which include morphological filters at the extremes, with focus on their ability to estimate between-spot intensity levels. The bias and variance of these filter estimates are driven by the number of background pixels used and their distributions. A new rank-filter algorithm is implemented and compared to methods available in Spot by CSIRO and GenePix Pro by Axon Instruments. Spot's morphological opening has a mean bias between -47 and -248 compared to a bias between 2 and -2 for the rank filter and the variability of the morphological opening estimate is 3 times higher than for the rank filter. The mean bias of Spot's second method, morph.close.open, is between -5 and -16 and the variability is approximately the same as for morphological opening. The variability of GenePix Pro's region-based estimate is more than ten times higher than the variability of the rank-filter estimate and with slightly more bias. The large variability is because the size of the background window changes with spot size. To overcome this, a non-adaptive region-based method is implemented. Its bias and variability are comparable to that of the rank filter. The performance of more advanced rank filters is equal to the best region-based methods. However, in order to get unbiased estimates these filters have to be implemented with great care. The performance of morphological opening is in general poor with a substantial spatial-dependent bias.

  13. Understanding perceived availability and importance of tobacco control interventions to inform European adoption of a UK economic model: a cross-sectional study.

    PubMed

    Kulchaitanaroaj, Puttarin; Kaló, Zoltán; West, Robert; Cheung, Kei Long; Evers, Silvia; Vokó, Zoltán; Hiligsmann, Mickael; de Vries, Hein; Owen, Lesley; Trapero-Bertran, Marta; Leidl, Reiner; Pokhrel, Subhash

    2018-02-14

    The evidence on the extent to which stakeholders in different European countries agree with availability and importance of tobacco-control interventions is limited. This study assessed and compared stakeholders' views from five European countries and compared the perceived ranking of interventions with evidence-based ranking using cost-effectiveness data. An interview survey (face-to-face, by phone or Skype) was conducted between April and July 2014 with five categories of stakeholders - decision makers, service purchasers, service providers, evidence generators and health promotion advocates - from Germany, Hungary, the Netherlands, Spain, and the United Kingdom. A list of potential stakeholders drawn from the research team's contacts and snowballing served as the sampling frame. An email invitation was sent to all stakeholders in this list and recruitment was based on positive replies. Respondents were asked to rate availability and importance of 30 tobacco control interventions. Kappa coefficients assessed agreement of stakeholders' views. A mean importance score for each intervention was used to rank the interventions. This ranking was compared with the ranking based on cost-effectiveness data from a published review. Ninety-three stakeholders (55.7% response rate) completed the survey: 18.3% were from Germany, 17.2% from Hungary, 30.1% from the Netherlands, 19.4% from Spain, and 15.1% from the UK. Of those, 31.2% were decision makers, 26.9% evidence generators, 19.4% service providers, 15.1% health-promotion advocates, and 7.5% purchasers of services/pharmaceutical products. Smoking restrictions in public areas were rated as the most important intervention (mean score = 1.89). The agreement on availability of interventions between the stakeholders was very low (kappa = 0.098; 95% CI = [0.085, 0.111] but the agreement on the importance of the interventions was fair (kappa = 0.239; 95% CI = [0.208, 0.253]). A correlation was found between availability and importance rankings for stage-based interventions. The importance ranking was not statistically concordant with the ranking based on published cost-effectiveness data (Kendall rank correlation coefficient = 0.40; p-value = 0.11; 95% CI = [- 0.09, 0.89]). The intrinsic differences in stakeholder views must be addressed while transferring economic evidence Europe-wide. Strong engagement with stakeholders, focussing on better communication, has a potential to mitigate this challenge.

  14. User embracement with risk classification in an emergency care unit: an evaluative study.

    PubMed

    Hermida, Patrícia Madalena Vieira; Nascimento, Eliane Regina Pereira do; Echevarría-Guanilo, Maria Elena; Brüggemann, Odaléa Maria; Malfussi, Luciana Bihain Hagemann de

    2018-01-01

    Objective Describing the evaluation of the Structure, Process and Outcome of User Embracement with Risk Classification of an Emergency Care Unit from the perspective of physicians and nurses. Method An evaluative, descriptive, quantitative study developed in Santa Catarina. Data were collected using a validated and adapted instrument consisting of 21 items distributed in the dimensions of Structure (facilities), Process (activities and relationships in providing care) and Outcome (care effects). In the analysis, descriptive statistics and the Mean Ranking and Mean Score calculations were applied. Results The sample consisted of 37 participants. From the 21 evaluated items, 11 (52.4%) had a Mean Ranking between 3 and 4, and none of them reached the maximum ranking (5 points). "Prioritization of severe cases" and "Primary care according to the severity of the case" reached a higher Mean Ranking (4.5), while "Flowchart discussion" had the lowest Ranking (2.1). The dimensions of Structure, Process and Outcome reached mean scores of 23.9, 21.9 and 25.5, respectively, indicating a Precarious evaluation (17.5 to 26.1 points). Conclusion User Embracement with Risk Classification is precarious, especially regarding the Process which obtained a lower satisfaction level from the participants.

  15. Combining evidence and values in priority setting: testing the balance sheet method in a low-income country.

    PubMed

    Makundi, Emmanuel; Kapiriri, Lydia; Norheim, Ole Frithjof

    2007-09-24

    Procedures for priority setting need to incorporate both scientific evidence and public values. The aim of this study was to test out a model for priority setting which incorporates both scientific evidence and public values, and to explore use of evidence by a selection of stakeholders and to study reasons for the relative ranking of health care interventions in a setting of extreme resource scarcity. Systematic search for and assessment of relevant evidence for priority setting in a low-income country. Development of a balance sheet according to Eddy's explicit method. Eight group interviews (n-85), using a modified nominal group technique for eliciting individual and group rankings of a given set of health interventions. The study procedure made it possible to compare the groups' ranking before and after all the evidence was provided to participants. A rank deviation is significant if the rank order of the same intervention differed by two or more points on the ordinal scale. A comparison between the initial rank and the final rank (before deliberation) showed a rank deviation of 67%. The difference between the initial rank and the final rank after discussion and voting gave a rank deviation of 78%. Evidence-based and deliberative decision-making does change priorities significantly in an experimental setting. Our use of the balance sheet method was meant as a demonstration project, but could if properly developed be feasible for health planners, experts and health workers, although more work is needed before it can be used for laypersons.

  16. Rank-k modification methods for recursive least squares problems

    NASA Astrophysics Data System (ADS)

    Olszanskyj, Serge; Lebak, James; Bojanczyk, Adam

    1994-09-01

    In least squares problems, it is often desired to solve the same problem repeatedly but with several rows of the data either added, deleted, or both. Methods for quickly solving a problem after adding or deleting one row of data at a time are known. In this paper we introduce fundamental rank-k updating and downdating methods and show how extensions of rank-1 downdating methods based on LINPACK, Corrected Semi-Normal Equations (CSNE), and Gram-Schmidt factorizations, as well as new rank-k downdating methods, can all be derived from these fundamental results. We then analyze the cost of each new algorithm and make comparisons tok applications of the corresponding rank-1 algorithms. We provide experimental results comparing the numerical accuracy of the various algorithms, paying particular attention to the downdating methods, due to their potential numerical difficulties for ill-conditioned problems. We then discuss the computation involved for each downdating method, measured in terms of operation counts and BLAS calls. Finally, we provide serial execution timing results for these algorithms, noting preferable points for improvement and optimization. From our experiments we conclude that the Gram-Schmidt methods perform best in terms of numerical accuracy, but may be too costly for serial execution for large problems.

  17. Jackknife Variance Estimator for Two Sample Linear Rank Statistics

    DTIC Science & Technology

    1988-11-01

    Accesion For - - ,NTIS GPA&I "TIC TAB Unann c, nc .. [d Keywords: strong consistency; linear rank test’ influence function . i , at L By S- )Distribut...reverse if necessary and identify by block number) FIELD IGROUP SUB-GROUP Strong consistency; linear rank test; influence function . 19. ABSTRACT

  18. Association between tumour infiltrating lymphocytes, histotype and clinical outcome in epithelial ovarian cancer.

    PubMed

    James, Fiona R; Jiminez-Linan, Mercedes; Alsop, Jennifer; Mack, Marie; Song, Honglin; Brenton, James D; Pharoah, Paul D P; Ali, H Raza

    2017-09-20

    There is evidence that some ovarian tumours evoke an immune response, which can be assessed by tumour infiltrating lymphocytes (TILs). To facilitate adoption of TILs as a clinical biomarker, a standardised method for their H&E visual evaluation has been validated in breast cancer. We sought to investigate the prognostic significance of TILs in a study of 953 invasive epithelial ovarian cancer tumour samples, both primary and metastatic, from 707 patients from the prospective population-based SEARCH study. TILs were analysed using a standardised method based on H&E staining producing a percentage score for stromal and intratumoral compartments. We used Cox regression to estimate hazard ratios of the association between TILs and survival. The extent of stromal and intra-tumoral TILs were correlated in the primary tumours (n = 679, Spearman's rank correlation = 0.60, P < 0.001) with a similar correlation in secondary tumours (n = 224, Spearman's rank correlation = 0.62, P < 0.001). There was a weak correlation between stromal TIL levels in primary and secondary tumour samples (Spearman's rank correlation = 0.29, P < 0.001) and intra-tumoral TIL levels in primary and secondary tumour samples (Spearman's rank correlation = 0.19, P = 0.0094). The extent of stromal TILs differed between histotypes (Pearson chi2 (12d.f.) 54.1, P < 0.0001) with higher levels of stromal infiltration in the high-grade serous and endometriod cases. A significant association was observed for higher intratumoral TIL levels and a favourable prognosis (HR 0.74 95% CI 0.55-1.00 p = 0.047). This study is the largest collection of epithelial ovarian tumour samples evaluated for TILs. We have shown that stromal and intratumoral TIL levels are correlated and that their levels correlate with clinical variables such as tumour histological subtype. We have also shown that increased levels of both intratumoral and stromal TILs are associated with a better prognosis; however, this is only statistically significant for intratumoral TILs. This study suggests that a clinically useful immune prognostic indicator in epithelial ovarian cancer could be developed using this technique.

  19. Implementation of preference ranking organization method for enrichment evaluation (Promethee) on selection system of student’s achievement

    NASA Astrophysics Data System (ADS)

    Karlitasari, L.; Suhartini, D.; Nurrosikawati, L.

    2018-03-01

    Selection of Student Achievement is conducted every year, starting from the level of Study Program, Faculty, to University, which then rank one will be sent to Kopertis level. The criteria made for the selection are Academic and Rich Scientific, Organizational, Personality, and English. In order for the selection of Student Achievement is Objective, then in addition to the presence of the jury is expected to use methods that support the decision to be more optimal in determining the Student Achievement. One method used is the Promethee Method. Preference Ranking Organization Method for Enrichment Evaluation (Promethee) is a method of ranking in Multi Criteria Decision Making (MCDM). PROMETHEE has the advantage that there is a preference type against the criteria that can take into account alternatives with other alternatives on the same criteria. The conjecture of alternate dominance over a criterion used in PROMETHEE is the use of values in the relationships between alternative ranking values. Based on the calculation result, from 7 applicants between Manual and Promethee Matrices, rank 1, 2, and 3, did not change, only 4 to 7 positions were changed. However, after the sensitivity test, almost all criteria experience a high level of sensitivity. Although it does not affect the students who will be sent to the next level, but can bring psychological impact on prospective student’s achievement

  20. Leveraging Multiactions to Improve Medical Personalized Ranking for Collaborative Filtering.

    PubMed

    Gao, Shan; Guo, Guibing; Li, Runzhi; Wang, Zongmin

    2017-01-01

    Nowadays, providing high-quality recommendation services to users is an essential component in web applications, including shopping, making friends, and healthcare. This can be regarded either as a problem of estimating users' preference by exploiting explicit feedbacks (numerical ratings), or as a problem of collaborative ranking with implicit feedback (e.g., purchases, views, and clicks). Previous works for solving this issue include pointwise regression methods and pairwise ranking methods. The emerging healthcare websites and online medical databases impose a new challenge for medical service recommendation. In this paper, we develop a model, MBPR (Medical Bayesian Personalized Ranking over multiple users' actions), based on the simple observation that users tend to assign higher ranks to some kind of healthcare services that are meanwhile preferred in users' other actions. Experimental results on the real-world datasets demonstrate that MBPR achieves more accurate recommendations than several state-of-the-art methods and shows its generality and scalability via experiments on the datasets from one mobile shopping app.

  1. Leveraging Multiactions to Improve Medical Personalized Ranking for Collaborative Filtering

    PubMed Central

    2017-01-01

    Nowadays, providing high-quality recommendation services to users is an essential component in web applications, including shopping, making friends, and healthcare. This can be regarded either as a problem of estimating users' preference by exploiting explicit feedbacks (numerical ratings), or as a problem of collaborative ranking with implicit feedback (e.g., purchases, views, and clicks). Previous works for solving this issue include pointwise regression methods and pairwise ranking methods. The emerging healthcare websites and online medical databases impose a new challenge for medical service recommendation. In this paper, we develop a model, MBPR (Medical Bayesian Personalized Ranking over multiple users' actions), based on the simple observation that users tend to assign higher ranks to some kind of healthcare services that are meanwhile preferred in users' other actions. Experimental results on the real-world datasets demonstrate that MBPR achieves more accurate recommendations than several state-of-the-art methods and shows its generality and scalability via experiments on the datasets from one mobile shopping app. PMID:29118963

  2. Trace Norm Regularized CANDECOMP/PARAFAC Decomposition With Missing Data.

    PubMed

    Liu, Yuanyuan; Shang, Fanhua; Jiao, Licheng; Cheng, James; Cheng, Hong

    2015-11-01

    In recent years, low-rank tensor completion (LRTC) problems have received a significant amount of attention in computer vision, data mining, and signal processing. The existing trace norm minimization algorithms for iteratively solving LRTC problems involve multiple singular value decompositions of very large matrices at each iteration. Therefore, they suffer from high computational cost. In this paper, we propose a novel trace norm regularized CANDECOMP/PARAFAC decomposition (TNCP) method for simultaneous tensor decomposition and completion. We first formulate a factor matrix rank minimization model by deducing the relation between the rank of each factor matrix and the mode- n rank of a tensor. Then, we introduce a tractable relaxation of our rank function, and then achieve a convex combination problem of much smaller-scale matrix trace norm minimization. Finally, we develop an efficient algorithm based on alternating direction method of multipliers to solve our problem. The promising experimental results on synthetic and real-world data validate the effectiveness of our TNCP method. Moreover, TNCP is significantly faster than the state-of-the-art methods and scales to larger problems.

  3. A Truncated Nuclear Norm Regularization Method Based on Weighted Residual Error for Matrix Completion.

    PubMed

    Qing Liu; Zhihui Lai; Zongwei Zhou; Fangjun Kuang; Zhong Jin

    2016-01-01

    Low-rank matrix completion aims to recover a matrix from a small subset of its entries and has received much attention in the field of computer vision. Most existing methods formulate the task as a low-rank matrix approximation problem. A truncated nuclear norm has recently been proposed as a better approximation to the rank of matrix than a nuclear norm. The corresponding optimization method, truncated nuclear norm regularization (TNNR), converges better than the nuclear norm minimization-based methods. However, it is not robust to the number of subtracted singular values and requires a large number of iterations to converge. In this paper, a TNNR method based on weighted residual error (TNNR-WRE) for matrix completion and its extension model (ETNNR-WRE) are proposed. TNNR-WRE assigns different weights to the rows of the residual error matrix in an augmented Lagrange function to accelerate the convergence of the TNNR method. The ETNNR-WRE is much more robust to the number of subtracted singular values than the TNNR-WRE, TNNR alternating direction method of multipliers, and TNNR accelerated proximal gradient with Line search methods. Experimental results using both synthetic and real visual data sets show that the proposed TNNR-WRE and ETNNR-WRE methods perform better than TNNR and Iteratively Reweighted Nuclear Norm (IRNN) methods.

  4. Qualitative Analysis of Surveyed Emergency Responders and the Identified Factors That Affect First Stage of Primary Triage Decision-Making of Mass Casualty Incidents

    PubMed Central

    Klein, Kelly R.; Burkle Jr., Frederick M.; Swienton, Raymond; King, Richard V.; Lehman, Thomas; North, Carol S.

    2016-01-01

    Introduction: After all large-scale disasters multiple papers are published describing the shortcomings of the triage methods utilized. This paper uses medical provider input to help describe attributes and patient characteristics that impact triage decisions. Methods: A survey distributed electronically to medical providers with and without disaster experience. Questions asked included what disaster experiences they had, and to rank six attributes in order of importance regarding triage. Results: 403 unique completed surveys were analyzed. 92% practiced a structural triage approach with the rest reporting they used “gestalt”.(gut feeling) Twelve per cent were identified as having placed patients in an expectant category during triage. Respiratory status, ability to speak, perfusion/pulse were all ranked in the top three. Gut feeling regardless of statistical analysis was fourth. Supplies were ranked in the top four when analyzed for those who had placed patients in the expectant category. Conclusion: Primary triage decisions in a mass casualty scenario are multifactorial and encompass patient mobility, life saving interventions, situational instincts, and logistics. PMID:27651979

  5. Learning to Select Supplier Portfolios for Service Supply Chain

    PubMed Central

    Zhang, Rui; Li, Jingfei; Wu, Shaoyu; Meng, Dabin

    2016-01-01

    The research on service supply chain has attracted more and more focus from both academia and industrial community. In a service supply chain, the selection of supplier portfolio is an important and difficult problem due to the fact that a supplier portfolio may include multiple suppliers from a variety of fields. To address this problem, we propose a novel supplier portfolio selection method based on a well known machine learning approach, i.e., Ranking Neural Network (RankNet). In the proposed method, we regard the problem of supplier portfolio selection as a ranking problem, which integrates a large scale of decision making features into a ranking neural network. Extensive simulation experiments are conducted, which demonstrate the feasibility and effectiveness of the proposed method. The proposed supplier portfolio selection model can be applied in a real corporation easily in the future. PMID:27195756

  6. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises

    PubMed Central

    Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise. PMID:28692667

  7. Optimal Weights Mixed Filter for removing mixture of Gaussian and impulse noises.

    PubMed

    Jin, Qiyu; Grama, Ion; Liu, Quansheng

    2017-01-01

    In this paper we consider the problem of restoration of a image contaminated by a mixture of Gaussian and impulse noises. We propose a new statistic called ROADGI which improves the well-known Rank-Ordered Absolute Differences (ROAD) statistic for detecting points contaminated with the impulse noise in this context. Combining ROADGI statistic with the method of weights optimization we obtain a new algorithm called Optimal Weights Mixed Filter (OWMF) to deal with the mixed noise. Our simulation results show that the proposed filter is effective for mixed noises, as well as for single impulse noise and for single Gaussian noise.

  8. Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

    NASA Astrophysics Data System (ADS)

    Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

    2015-03-01

    Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.

  9. A comparative analysis of swarm intelligence techniques for feature selection in cancer classification.

    PubMed

    Gunavathi, Chellamuthu; Premalatha, Kandasamy

    2014-01-01

    Feature selection in cancer classification is a central area of research in the field of bioinformatics and used to select the informative genes from thousands of genes of the microarray. The genes are ranked based on T-statistics, signal-to-noise ratio (SNR), and F-test values. The swarm intelligence (SI) technique finds the informative genes from the top-m ranked genes. These selected genes are used for classification. In this paper the shuffled frog leaping with Lévy flight (SFLLF) is proposed for feature selection. In SFLLF, the Lévy flight is included to avoid premature convergence of shuffled frog leaping (SFL) algorithm. The SI techniques such as particle swarm optimization (PSO), cuckoo search (CS), SFL, and SFLLF are used for feature selection which identifies informative genes for classification. The k-nearest neighbour (k-NN) technique is used to classify the samples. The proposed work is applied on 10 different benchmark datasets and examined with SI techniques. The experimental results show that the results obtained from k-NN classifier through SFLLF feature selection method outperform PSO, CS, and SFL.

  10. Self-management improvement program combined with community involvement in Thai hypertensive population: an action research.

    PubMed

    Srichairattanakull, Jeamjai; Kaewpan, Wonpen; Powattana, Arpaporn; Pichayapinyo, Panan

    2014-04-01

    To investigate the effectiveness of a program that utilizes community involvement to improve the self-management strategies among people living with hypertension. Forty-four subjects, aged 35 to 59-year-old, with hypertension in Nakhon Pathom Province, Thailand, were randomly allocated to either an experimental group (n = 22) or a control group (n = 20). The experimental group attended a program to improve self-management methods based on social cognitive theory (SCT). The program lasted 12 weeks, consisted of 1 1/2 hours meeting once a week, including group meetings and home visit monitoring. Mann-Whitney U test and Friedman test were employed to analyze the program's effectiveness. After the program, the mean rank of the perceived self-efficacy for the self-management strategies was statistically different between the two groups (p = 0.023). In the experimental group, after the twelve week, the mean rank of perceived self-efficacy and outcome expectancy increased and diastolic blood pressure decreased after the eight week. The program applied social cognitive theory (SCT) to promote self-management techniques, increased the health promoting behavior among hypertensive people.

  11. Resolvent-based modeling of passive scalar dynamics in wall-bounded turbulence

    NASA Astrophysics Data System (ADS)

    Dawson, Scott; Saxton-Fox, Theresa; McKeon, Beverley

    2017-11-01

    The resolvent formulation of the Navier-Stokes equations expresses the system state as the output of a linear (resolvent) operator acting upon a nonlinear forcing. Previous studies have demonstrated that a low-rank approximation of this linear operator predicts many known features of incompressible wall-bounded turbulence. In this work, this resolvent model for wall-bounded turbulence is extended to include a passive scalar field. This formulation allows for a number of additional simplifications that reduce model complexity. Firstly, it is shown that the effect of changing scalar diffusivity can be approximated through a transformation of spatial wavenumbers and temporal frequencies. Secondly, passive scalar dynamics may be studied through the low-rank approximation of a passive scalar resolvent operator, which is decoupled from velocity response modes. Thirdly, this passive scalar resolvent operator is amenable to approximation by semi-analytic methods. We investigate the extent to which this resulting hierarchy of models can describe and predict passive scalar dynamics and statistics in wall-bounded turbulence. The support of AFOSR under Grant Numbers FA9550-16-1-0232 and FA9550-16-1-0361 is gratefully acknowledged.

  12. Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification.

    PubMed

    Jowkar, Gholam-Hossein; Mansoori, Eghbal G

    2016-10-01

    Identification of disease genes, using computational methods, is an important issue in biomedical and bioinformatics research. According to observations that diseases with the same or similar phenotype have the same biological characteristics, researchers have tried to identify genes by using machine learning tools. In recent attempts, some semi-supervised learning methods, called positive-unlabeled learning, is used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes are extracted using co-training schema. Then, the similarity graph of genes is built using metric learning by concentrating on multi-rank-walk method to perform inference from labeled genes. At last, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data through choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) noise robustness characteristic of PEGPUL via using multilevel schema. In order to assess PEGPUL, we have applied it on 12950 disease genes with 949 positive genes from six class of diseases and 12001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance. Copyright © 2016 Elsevier Ltd. All rights reserved.

  13. A Novel Image Recuperation Approach for Diagnosing and Ranking Retinopathy Disease Level Using Diabetic Fundus Image

    PubMed Central

    2015-01-01

    Retinal fundus images are widely used in diagnosing and providing treatment for several eye diseases. Prior works using retinal fundus images detected the presence of exudation with the aid of publicly available dataset using extensive segmentation process. Though it was proved to be computationally efficient, it failed to create a diabetic retinopathy feature selection system for transparently diagnosing the disease state. Also the diagnosis of diseases did not employ machine learning methods to categorize candidate fundus images into true positive and true negative ratio. Several candidate fundus images did not include more detailed feature selection technique for diabetic retinopathy. To apply machine learning methods and classify the candidate fundus images on the basis of sliding window a method called, Diabetic Fundus Image Recuperation (DFIR) is designed in this paper. The initial phase of DFIR method select the feature of optic cup in digital retinal fundus images based on Sliding Window Approach. With this, the disease state for diabetic retinopathy is assessed. The feature selection in DFIR method uses collection of sliding windows to obtain the features based on the histogram value. The histogram based feature selection with the aid of Group Sparsity Non-overlapping function provides more detailed information of features. Using Support Vector Model in the second phase, the DFIR method based on Spiral Basis Function effectively ranks the diabetic retinopathy diseases. The ranking of disease level for each candidate set provides a much promising result for developing practically automated diabetic retinopathy diagnosis system. Experimental work on digital fundus images using the DFIR method performs research on the factors such as sensitivity, specificity rate, ranking efficiency and feature selection time. PMID:25974230

  14. Influences for Gender Disparity in Academic Psychiatry in the United States.

    PubMed

    Sheikh, Muhammad H; Chaudhary, Amna Mohyud Din; Khan, Anum S; Tahir, Muhammad A; Yahya, Hafiz A; Naveed, Sadiq; Khosa, Faisal

    2018-04-22

    Introduction Academic undertakings, including research, lead to career progression. However, the career paths of female psychiatrists appear to diverge significantly from that of their male counterparts. This article reviews the pervasiveness of the trend of women being less likely to pursue active research in psychiatry. In addition, we examine the correlation between academic rank and research productivity. Methods We searched the American Medical Association's (AMA) Fellowship and Residency Electronic Interactive Database (FREIDA) to identify training programs for psychiatry. A total of 5234 psychiatrists met our inclusion criteria. The gender, academic rank, research work, and h-index of faculty members were compared. The ratio of women reaching senior ranks as compared to men was also calculated. The Scopus database was used to determine the h-index of the individuals included in this study. Data analysis was done with SPSS 22.0 Release 2013 (IBM SPSS Statistics for Windows, IBM, Armonk, NY, USA). Kruskal-Wallis and Mann-Whitney U tests were used where required, with the P-value set at less than 0.05. Results In our study sample, 2181 (42%) of the psychiatrists were women. However, according to the information obtained from the websites of 23 programs, few women reached higher ranks, full professorship, or positions such as the chairperson of a program, and only 9% of women achieved the designation of chairperson of the psychiatry department, with men representing the other 91%. Higher academic rank correlated with higher h-index. A statistically-significant difference between the genders in terms of h-index was found for the assistant professor rank as well. However, this difference was not observed at the level of an associate professor. Conclusions Despite adequate representation of women in the academic workforce in psychiatry, there appears to be a discrepancy in the research productivity of the two genders. This study highlights the need for targeted interventions to address gender disparities in academic psychiatry.

  15. OPATs: Omnibus P-value association tests.

    PubMed

    Chen, Chia-Wei; Yang, Hsin-Chou

    2017-07-10

    Combining statistical significances (P-values) from a set of single-locus association tests in genome-wide association studies is a proof-of-principle method for identifying disease-associated genomic segments, functional genes and biological pathways. We review P-value combinations for genome-wide association studies and introduce an integrated analysis tool, Omnibus P-value Association Tests (OPATs), which provides popular analysis methods of P-value combinations. The software OPATs programmed in R and R graphical user interface features a user-friendly interface. In addition to analysis modules for data quality control and single-locus association tests, OPATs provides three types of set-based association test: window-, gene- and biopathway-based association tests. P-value combinations with or without threshold and rank truncation are provided. The significance of a set-based association test is evaluated by using resampling procedures. Performance of the set-based association tests in OPATs has been evaluated by simulation studies and real data analyses. These set-based association tests help boost the statistical power, alleviate the multiple-testing problem, reduce the impact of genetic heterogeneity, increase the replication efficiency of association tests and facilitate the interpretation of association signals by streamlining the testing procedures and integrating the genetic effects of multiple variants in genomic regions of biological relevance. In summary, P-value combinations facilitate the identification of marker sets associated with disease susceptibility and uncover missing heritability in association studies, thereby establishing a foundation for the genetic dissection of complex diseases and traits. OPATs provides an easy-to-use and statistically powerful analysis tool for P-value combinations. OPATs, examples, and user guide can be downloaded from http://www.stat.sinica.edu.tw/hsinchou/genetics/association/OPATs.htm. © The Author 2017. Published by Oxford University Press.

  16. Deaths: Leading Causes for 2013.

    PubMed

    Heron, Melonie

    2016-02-16

    This report presents final 2013 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements "Deaths: Final Data for 2013," the National Center for Health Statistics’ annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2013. Causes of death classified by the International Classification of Diseases, Tenth Revision (ICD–10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2013, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Accidents (unintentional injuries); Cerebrovascular diseases; Alzheimer’s disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). They accounted for 74% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2013 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Newborn affected by maternal complications of pregnancy; Sudden infant death syndrome; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

  17. Deaths: Leading Causes for 2011.

    PubMed

    Heron, Melonie

    2015-07-27

    This report presents final 2011 data on the 10 leading causes of death in the United States by age, sex, race, and Hispanic origin. Leading causes of infant, neonatal, and postneonatal death are also presented. This report supplements ‘‘Deaths: Final Data for 2011,’’ the National Center for Health Statistics’ annual report of final mortality statistics. Data in this report are based on information from all death certificates filed in the 50 states and the District of Columbia in 2011. Causes of death classified by the International Classification of Diseases, 10th Revision (ICD–10) are ranked according to the number of deaths assigned to rankable causes. Cause-of-death statistics are based on the underlying cause of death. In 2011, the 10 leading causes of death were, in rank order: Diseases of heart; Malignant neoplasms; Chronic lower respiratory diseases; Cerebrovascular diseases; Accidents (unintentional injuries); Alzheimer’s disease; Diabetes mellitus; Influenza and pneumonia; Nephritis, nephrotic syndrome and nephrosis; and Intentional self-harm (suicide). They accounted for 74% of all deaths occurring in the United States. Differences in the rankings are evident by age, sex, race, and Hispanic origin. Leading causes of infant death for 2011 were, in rank order: Congenital malformations, deformations and chromosomal abnormalities; Disorders related to short gestation and low birth weight, not elsewhere classified; Sudden infant death syndrome; Newborn affected by maternal complications of pregnancy; Accidents (unintentional injuries); Newborn affected by complications of placenta, cord and membranes; Bacterial sepsis of newborn; Respiratory distress of newborn; Diseases of the circulatory system; and Neonatal hemorrhage. Important variations in the leading causes of infant death are noted for the neonatal and postneonatal periods. All material appearing in this report is in the public domain and may be reproduced or copied without permission; citation as to source, however, is appreciated.

  18. MO-DE-207A-05: Dictionary Learning Based Reconstruction with Low-Rank Constraint for Low-Dose Spectral CT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Q; Stanford University School of Medicine, Stanford, CA; Liu, H

    Purpose: Spectral CT enabled by an energy-resolved photon-counting detector outperforms conventional CT in terms of material discrimination, contrast resolution, etc. One reconstruction method for spectral CT is to generate a color image from a reconstructed component in each energy channel. However, given the radiation dose, the number of photons in each channel is limited, which will result in strong noise in each channel and affect the final color reconstruction. Here we propose a novel dictionary learning method for spectral CT that combines dictionary-based sparse representation method and the patch based low-rank constraint to simultaneously improve the reconstruction in each channelmore » and to address the inter-channel correlations to further improve the reconstruction. Methods: The proposed method has two important features: (1) guarantee of the patch based sparsity in each energy channel, which is the result of the dictionary based sparse representation constraint; (2) the explicit consideration of the correlations among different energy channels, which is realized by patch-by-patch nuclear norm-based low-rank constraint. For each channel, the dictionary consists of two sub-dictionaries. One is learned from the average of the images in all energy channels, and the other is learned from the average of the images in all energy channels except the current channel. With average operation to reduce noise, these two dictionaries can effectively preserve the structural details and get rid of artifacts caused by noise. Combining them together can express all structural information in current channel. Results: Dictionary learning based methods can obtain better results than FBP and the TV-based method. With low-rank constraint, the image quality can be further improved in the channel with more noise. The final color result by the proposed method has the best visual quality. Conclusion: The proposed method can effectively improve the image quality of low-dose spectral CT. This work is partially supported by the National Natural Science Foundation of China (No. 61302136), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2014JQ8317).« less

  19. MRMC analysis of agreement studies

    NASA Astrophysics Data System (ADS)

    Gallas, Brandon D.; Anam, Amrita; Chen, Weijie; Wunderlich, Adam; Zhang, Zhiwei

    2016-03-01

    The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.

  20. Structured Matrix Completion with Applications to Genomic Data Integration.

    PubMed

    Cai, Tianxi; Cai, T Tony; Zhang, Anru

    2016-01-01

    Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. Current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bound for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite sample under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with different extent of genomic measurements, which enables us to construct more accurate prediction rules for ovarian cancer survival.

  1. Ranking the spreading ability of nodes in network core

    NASA Astrophysics Data System (ADS)

    Tong, Xiao-Lei; Liu, Jian-Guo; Wang, Jiang-Pan; Guo, Qiang; Ni, Jing

    2015-11-01

    Ranking nodes by their spreading ability in complex networks is of vital significance to better understand the network structure and more efficiently spread information. The k-shell decomposition method could identify the most influential nodes, namely network core, with the same ks values regardless to their different spreading influence. In this paper, we present an improved method based on the k-shell decomposition method and closeness centrality (CC) to rank the node spreading influence of the network core. Experiment results on the data from the scientific collaboration network and U.S. aviation network show that the accuracy of the presented method could be increased by 31% and 45% than the one obtained by the degree k, 32% and 31% than the one by the betweenness.

  2. Identification of robust statistical downscaling methods based on a comprehensive suite of performance metrics for South Korea

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Cannon, A. J.

    2015-12-01

    Climate models are a key provider to investigate impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch of spatial resolution between GCMs and regional applications, in particular a region characterized by complex terrain such as Korean peninsula. Therefore, a downscaling procedure is an essential to assess regional impacts of climate change. Numerous statistical downscaling methods have been used mainly due to the computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. By split sampling scheme, all methods are calibrated with observational station data for 19 years from 1973 to 1991 are and tested for the recent 19 years from 1992 to 2010. To assess skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure an ability of reproducing temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and all methods lead to large improvements in representing all performance metrics. According to seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that such result is derived from CFSR output which is recognized as near perfect climate data in climate studies. Therefore, the ranking of this study may be changed when various GCMs are downscaled and evaluated. Nevertheless, it may be informative for end-users (i.e. modelers or water resources managers) to understand and select more suitable downscaling methods corresponding to priorities on regional applications.

  3. Comparison of statistical methods for detection of serum lipid biomarkers for mesothelioma and asbestos exposure.

    PubMed

    Xu, Rengyi; Mesaros, Clementina; Weng, Liwei; Snyder, Nathaniel W; Vachani, Anil; Blair, Ian A; Hwang, Wei-Ting

    2017-07-01

    We compared three statistical methods in selecting a panel of serum lipid biomarkers for mesothelioma and asbestos exposure. Serum samples from mesothelioma, asbestos-exposed subjects and controls (40 per group) were analyzed. Three variable selection methods were considered: top-ranked predictors from univariate model, stepwise and least absolute shrinkage and selection operator. Crossed-validated area under the receiver operating characteristic curve was used to compare the prediction performance. Lipids with high crossed-validated area under the curve were identified. Lipid with mass-to-charge ratio of 372.31 was selected by all three methods comparing mesothelioma versus control. Lipids with mass-to-charge ratio of 1464.80 and 329.21 were selected by two models for asbestos exposure versus control. Different methods selected a similar set of serum lipids. Combining candidate biomarkers can improve prediction.

  4. An auxiliary method to reduce potential adverse impacts of projected land developments: subwatershed prioritization.

    PubMed

    Kalin, Latif; Hantush, Mohamed M

    2009-02-01

    An index based method is developed that ranks the subwatersheds of a watershed based on their relative impacts on watershed response to anticipated land developments, and then applied to an urbanizing watershed in Eastern Pennsylvania. Simulations with a semi-distributed hydrologic model show that computed low- and high-flow frequencies at the main outlet increase significantly with the projected landscape changes in the watershed. The developed index is utilized to prioritize areas in the urbanizing watershed based on their contributions to alterations in the magnitude of selected flow characteristics at two spatial resolutions. The low-flow measure, 7Q10, rankings are shown to mimic the spatial trend of groundwater recharge rates, whereas average annual maximum daily flow, QAMAX, and average monthly median of daily flows, QMMED, rankings are influenced by both recharge and proximity to watershed outlet. Results indicate that, especially with the higher resolution, areas having quicker responses are not necessarily the more critical areas for high-flow scenarios. Subwatershed rankings are shown to vary slightly with the location of water quality/quantity criteria enforcement. It is also found that rankings of subwatersheds upstream from the site of interest, which could be the main outlet or any interior point in the watershed, may be influenced by the time scale of the hydrologic processes.

  5. Seismic noise attenuation using an online subspace tracking algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Yatong; Li, Shuhua; Zhang, Dong; Chen, Yangkang

    2018-02-01

    We propose a new low-rank based noise attenuation method using an efficient algorithm for tracking subspaces from highly corrupted seismic observations. The subspace tracking algorithm requires only basic linear algebraic manipulations. The algorithm is derived by analysing incremental gradient descent on the Grassmannian manifold of subspaces. When the multidimensional seismic data are mapped to a low-rank space, the subspace tracking algorithm can be directly applied to the input low-rank matrix to estimate the useful signals. Since the subspace tracking algorithm is an online algorithm, it is more robust to random noise than traditional truncated singular value decomposition (TSVD) based subspace tracking algorithm. Compared with the state-of-the-art algorithms, the proposed denoising method can obtain better performance. More specifically, the proposed method outperforms the TSVD-based singular spectrum analysis method in causing less residual noise and also in saving half of the computational cost. Several synthetic and field data examples with different levels of complexities demonstrate the effectiveness and robustness of the presented algorithm in rejecting different types of noise including random noise, spiky noise, blending noise, and coherent noise.

  6. Validation of Endogenous Internal Real-Time PCR Controls in Renal Tissues

    PubMed Central

    Cui, Xiangqin; Zhou, Juling; Qiu, Jing; Johnson, Martin R.; Mrug, Michal

    2009-01-01

    Background Endogenous internal controls (‘reference’ or ‘housekeeping’ genes) are widely used in real-time PCR (RT-PCR) analyses. Their use relies on the premise of consistently stable expression across studied experimental conditions. Unfortunately, none of these controls fulfills this premise across a wide range of experimental conditions; consequently, none of them can be recommended for universal use. Methods To determine which endogenous RT-PCR controls are suitable for analyses of renal tissues altered by kidney disease, we studied the expression of 16 commonly used ‘reference genes’ in 7 mildly and 7 severely affected whole kidney tissues from a well-characterized cystic kidney disease model. Expression levels of these 16 genes, determined by TaqMan® RT-PCR analyses and Affymetrix GeneChip® arrays, were normalized and tested for overall variance and equivalence of the means. Results Both statistical approaches and both TaqMan- and GeneChip-based methods converged on 3 out of the 4 top-ranked genes (Ppia, Gapdh and Pgk1) that had the most constant expression levels across the studied phenotypes. Conclusion A combination of the top-ranked genes will provide a suitable endogenous internal control for similar studies of kidney tissues across a wide range of disease severity. PMID:19729889

  7. Unreported links between trial registrations and published articles were identified using document similarity measures in a cross-sectional analysis of ClinicalTrials.gov.

    PubMed

    Dunn, Adam G; Coiera, Enrico; Bourgeois, Florence T

    2018-03-01

    Trial registries can be used to measure reporting biases and support systematic reviews, but 45% of registrations do not provide a link to the article reporting on the trial. We evaluated the use of document similarity methods to identify unreported links between ClinicalTrials.gov and PubMed. We extracted terms and concepts from a data set of 72,469 ClinicalTrials.gov registrations and 276,307 PubMed articles and tested methods for ranking articles across 16,005 reported links and 90 manually identified unreported links. Performance was measured by the median rank of matching articles and the proportion of unreported links that could be found by screening ranked candidate articles in order. The best-performing concept-based representation produced a median rank of 3 (interquartile range [IQR] 1-21) for reported links and 3 (IQR 1-19) for the manually identified unreported links, and term-based representations produced a median rank of 2 (1-20) for reported links and 2 (IQR 1-12) in unreported links. The matching article was ranked first for 40% of registrations, and screening 50 candidate articles per registration identified 86% of the unreported links. Leveraging the growth in the corpus of reported links between ClinicalTrials.gov and PubMed, we found that document similarity methods can assist in the identification of unreported links between trial registrations and corresponding articles. Copyright © 2017 Elsevier Inc. All rights reserved.

  8. Annual Statistical Report of the Public School of Arkansas and Education Service Cooperatives

    ERIC Educational Resources Information Center

    Arkansas Department of Education, 2016

    2016-01-01

    In compliance with the provisions of A.C.A.§§6-20-2201 et seq., the Annual Statistical Report of the Public Schools of Arkansas, Public Charter Schools, and Education Service Cooperatives, 2014-2015 Actual and 2015-2016 Budgeted, (ASR) is presented here. The Rankings of Selected Items of the Public Schools of Arkansas, 2014-2015 Actual, (Rankings)…

  9. [Establishment of simultaneous measurement method of 8 salivary components using urinary test paper and clinical evaluation of oral environment].

    PubMed

    Yuuki, Kenji; Tsukasaki, Hiroaki; Kawawa, Tadaharu; Shiba, Akihiko; Shiba, Kiyoko

    2008-07-01

    Clinical findings were compared with glucose, protein, albumin, bilirubin, creatinine, pH, occult blood, ketone body, nitrite, and white blood cells contained in whole saliva to investigate the components that most markedly reflect the periodontal condition. The subjects were staff of the Prosthodontics Department, Showa University, and patients who visited for dental treatments (57 subjects in total). At the first time, saliva samples were gargled with 1.5 ml of distilled water for 15 seconds and collected by spitting out into a paper cup. At the second time, saliva samples were collected by the same method. At the third time, saliva samples after chewing paraffin gum for 60 seconds were collected by spitting out into a paper cup. Thus whole saliva collecting that was divided on three times. After sampling, 8 mul of the saliva sample was dripped in reagent sticks for the 10 items of urinary test paper and the reflectance was measured using a specific reflectometer. In the periodontal tissue evaluation, the degree of alveolar bone resorption, probing value, and tooth mobility and the presence or absence of lesions in the root furcation were examined and classified into 4 ranks. The mean values in each periodontal disease rank and correlation between the periodontal disease ranks and the components were statistically analyzed. Bilirubin and ketone body were not measurable. The components density of the 8 items was increased as the periodontal disease rank increased. Regarding the correlation between the periodontal disease ranks and the components, high correlations were noted for protein, albumin, creatinine, pH, and white blood cells. The simultaneous measurement method of 8 salivary components using test paper may be very useful for the diagnosis of periodontal disease of abutment teeth.

  10. Multivariate bias adjustment of high-dimensional climate simulations: the Rank Resampling for Distributions and Dependences (R2D2) bias correction

    NASA Astrophysics Data System (ADS)

    Vrac, Mathieu

    2018-06-01

    Climate simulations often suffer from statistical biases with respect to observations or reanalyses. It is therefore common to correct (or adjust) those simulations before using them as inputs into impact models. However, most bias correction (BC) methods are univariate and so do not account for the statistical dependences linking the different locations and/or physical variables of interest. In addition, they are often deterministic, and stochasticity is frequently needed to investigate climate uncertainty and to add constrained randomness to climate simulations that do not possess a realistic variability. This study presents a multivariate method of rank resampling for distributions and dependences (R2D2) bias correction allowing one to adjust not only the univariate distributions but also their inter-variable and inter-site dependence structures. Moreover, the proposed R2D2 method provides some stochasticity since it can generate as many multivariate corrected outputs as the number of statistical dimensions (i.e., number of grid cell × number of climate variables) of the simulations to be corrected. It is based on an assumption of stability in time of the dependence structure - making it possible to deal with a high number of statistical dimensions - that lets the climate model drive the temporal properties and their changes in time. R2D2 is applied on temperature and precipitation reanalysis time series with respect to high-resolution reference data over the southeast of France (1506 grid cell). Bivariate, 1506-dimensional and 3012-dimensional versions of R2D2 are tested over a historical period and compared to a univariate BC. How the different BC methods behave in a climate change context is also illustrated with an application to regional climate simulations over the 2071-2100 period. The results indicate that the 1d-BC basically reproduces the climate model multivariate properties, 2d-R2D2 is only satisfying in the inter-variable context, 1506d-R2D2 strongly improves inter-site properties and 3012d-R2D2 is able to account for both. Applications of the proposed R2D2 method to various climate datasets are relevant for many impact studies. The perspectives of improvements are numerous, such as introducing stochasticity in the dependence itself, questioning its stability assumption, and accounting for temporal properties adjustment while including more physics in the adjustment procedures.

  11. Hierarchical Gene Selection and Genetic Fuzzy System for Cancer Microarray Data Classification

    PubMed Central

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows a great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in the real medical practice. PMID:25823003

  12. Hierarchical gene selection and genetic fuzzy system for cancer microarray data classification.

    PubMed

    Nguyen, Thanh; Khosravi, Abbas; Creighton, Douglas; Nahavandi, Saeid

    2015-01-01

    This paper introduces a novel approach to gene selection based on a substantial modification of analytic hierarchy process (AHP). The modified AHP systematically integrates outcomes of individual filter methods to select the most informative genes for microarray classification. Five individual ranking methods including t-test, entropy, receiver operating characteristic (ROC) curve, Wilcoxon and signal to noise ratio are employed to rank genes. These ranked genes are then considered as inputs for the modified AHP. Additionally, a method that uses fuzzy standard additive model (FSAM) for cancer classification based on genes selected by AHP is also proposed in this paper. Traditional FSAM learning is a hybrid process comprising unsupervised structure learning and supervised parameter tuning. Genetic algorithm (GA) is incorporated in-between unsupervised and supervised training to optimize the number of fuzzy rules. The integration of GA enables FSAM to deal with the high-dimensional-low-sample nature of microarray data and thus enhance the efficiency of the classification. Experiments are carried out on numerous microarray datasets. Results demonstrate the performance dominance of the AHP-based gene selection against the single ranking methods. Furthermore, the combination of AHP-FSAM shows a great accuracy in microarray data classification compared to various competing classifiers. The proposed approach therefore is useful for medical practitioners and clinicians as a decision support system that can be implemented in the real medical practice.

  13. MoleculaRnetworks: an integrated graph theoretic and data mining tool to explore solvent organization in molecular simulation.

    PubMed

    Mooney, Barbara Logan; Corrales, L René; Clark, Aurora E

    2012-03-30

    This work discusses scripts for processing molecular simulations data written using the software package R: A Language and Environment for Statistical Computing. These scripts, named moleculaRnetworks, are intended for the geometric and solvent network analysis of aqueous solutes and can be extended to other H-bonded solvents. New algorithms, several of which are based on graph theory, that interrogate the solvent environment about a solute are presented and described. This includes a novel method for identifying the geometric shape adopted by the solvent in the immediate vicinity of the solute and an exploratory approach for describing H-bonding, both based on the PageRank algorithm of Google search fame. The moleculaRnetworks codes include a preprocessor, which distills simulation trajectories into physicochemical data arrays, and an interactive analysis script that enables statistical, trend, and correlation analysis, and other data mining. The goal of these scripts is to increase access to the wealth of structural and dynamical information that can be obtained from molecular simulations. Copyright © 2012 Wiley Periodicals, Inc.

  14. Statistical methods for astronomical data with upper limits. II - Correlation and regression

    NASA Technical Reports Server (NTRS)

    Isobe, T.; Feigelson, E. D.; Nelson, P. I.

    1986-01-01

    Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.

  15. Zipf's law holds for phrases, not words.

    PubMed

    Williams, Jake Ryland; Lessard, Paul R; Desu, Suma; Clark, Eric M; Bagrow, James P; Danforth, Christopher M; Dodds, Peter Sheridan

    2015-08-11

    With Zipf's law being originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of rank magnitude. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed length phrases.

  16. Mapping remote and multidisciplinary learning barriers: lessons from challenge-based innovation at CERN

    NASA Astrophysics Data System (ADS)

    Jensen, Matilde Bisballe; Utriainen, Tuuli Maria; Steinert, Martin

    2018-01-01

    This paper presents the experienced difficulties of students participating in the multidisciplinary, remote collaborating engineering design course challenge-based innovation at CERN. This is with the aim to identify learning barriers and improve future learning experiences. We statistically analyse the rated differences between distinct design activities, educational background and remote vs. co-located collaboration. The analysis is based on a quantitative and qualitative questionnaire (N = 37). Our analysis found significant ranking differences between remote and co-located activities. This questions whether the remote factor might be a barrier for the originally intended learning goals. Further a correlation between analytical and converging design phases was identified. Hence, future facilitators are suggested to help students in the transition from one design phase to the next rather than only teaching methods in the individual design phases. Finally, we discuss how educators address the identified learning barriers when designing future courses including multidisciplinary or remote collaboration.

  17. Chimpanzee females queue but males compete for social status

    PubMed Central

    Foerster, Steffen; Franz, Mathias; Murray, Carson M.; Gilby, Ian C.; Feldblum, Joseph T.; Walker, Kara K.; Pusey, Anne E.

    2016-01-01

    Dominance hierarchies are widespread in animal social groups and often have measureable effects on individual health and reproductive success. Dominance ranks are not static individual attributes, however, but instead are influenced by two independent processes: 1) changes in hierarchy membership and 2) successful challenges of higher-ranking individuals. Understanding which of these processes dominates the dynamics of rank trajectories can provide insights into fitness benefits of within-sex competition. This question has yet to be examined systematically in a wide range of taxa due to the scarcity of long-term data and a lack of appropriate methodologies for distinguishing between alternative causes of rank changes over time. Here, we expand on recent work and develop a new likelihood-based Elo rating method that facilitates the systematic assessment of rank dynamics in animal social groups, even when interaction data are sparse. We apply this method to characterize long-term rank trajectories in wild eastern chimpanzees (Pan troglodytes schweinfurthii) and find remarkable sex differences in rank dynamics, indicating that females queue for social status while males actively challenge each other to rise in rank. Further, our results suggest that natal females obtain a head start in the rank queue if they avoid dispersal, with potential fitness benefits. PMID:27739527

  18. A social choice-based methodology for treated wastewater reuse in urban and suburban areas.

    PubMed

    Mahjouri, Najmeh; Pourmand, Ehsan

    2017-07-01

    Reusing treated wastewater for supplying water demands such as landscape and agricultural irrigation in urban and suburban areas has become a major water supply approach especially in regions struggling with water shortage. Due to limited available treated wastewater to satisfy all water demands, conflicts may arise in allocating treated wastewater to water users. Since there is usually more than one decision maker and more than one criterion to measure the impact of each water allocation scenario, effective tools are needed to combine individual preferences to reach a collective decision. In this paper, a new social choice (SC) method, which can consider some indifference thresholds for decision makers, is proposed for evaluating and ranking treated wastewater and urban runoff allocation scenarios to water users in urban and suburban areas. Some SC methods, namely plurality voting, Borda count, pairwise comparisons, Hare system, dictatorship, and approval voting, are applied for comparing and evaluating the results. Different scenarios are proposed for allocating treated wastewater and urban runoff to landscape irrigation, agricultural lands as well as artificial recharge of aquifer in the Tehran metropolitan Area, Iran. The main stakeholders rank the proposed scenarios based on their utilities using two different approaches. The proposed method suggests ranking of the scenarios based on the stakeholders' utilities and considering the scores they assigned to each scenario. Comparing the results of the proposed method with those of six different SC methods shows that the obtained ranks are mostly in compliance with the social welfare.

  19. Interval MULTIMOORA method with target values of attributes based on interval distance and preference degree: biomaterials selection

    NASA Astrophysics Data System (ADS)

    Hafezalkotob, Arian; Hafezalkotob, Ashkan

    2017-06-01

    A target-based MADM method covers beneficial and non-beneficial attributes besides target values for some attributes. Such techniques are considered as the comprehensive forms of MADM approaches. Target-based MADM methods can also be used in traditional decision-making problems in which beneficial and non-beneficial attributes only exist. In many practical selection problems, some attributes have given target values. The values of decision matrix and target-based attributes can be provided as intervals in some of such problems. Some target-based decision-making methods have recently been developed; however, a research gap exists in the area of MADM techniques with target-based attributes under uncertainty of information. We extend the MULTIMOORA method for solving practical material selection problems in which material properties and their target values are given as interval numbers. We employ various concepts of interval computations to reduce degeneration of uncertain data. In this regard, we use interval arithmetic and introduce innovative formula for interval distance of interval numbers to create interval target-based normalization technique. Furthermore, we use a pairwise preference matrix based on the concept of degree of preference of interval numbers to calculate the maximum, minimum, and ranking of these numbers. Two decision-making problems regarding biomaterials selection of hip and knee prostheses are discussed. Preference degree-based ranking lists for subordinate parts of the extended MULTIMOORA method are generated by calculating the relative degrees of preference for the arranged assessment values of the biomaterials. The resultant rankings for the problem are compared with the outcomes of other target-based models in the literature.

  20. Constrained Low-Rank Learning Using Least Squares-Based Regularization.

    PubMed

    Li, Ping; Yu, Jun; Wang, Meng; Zhang, Luming; Cai, Deng; Li, Xuelong

    2017-12-01

    Low-rank learning has attracted much attention recently due to its efficacy in a rich variety of real-world tasks, e.g., subspace segmentation and image categorization. Most low-rank methods are incapable of capturing low-dimensional subspace for supervised learning tasks, e.g., classification and regression. This paper aims to learn both the discriminant low-rank representation (LRR) and the robust projecting subspace in a supervised manner. To achieve this goal, we cast the problem into a constrained rank minimization framework by adopting the least squares regularization. Naturally, the data label structure tends to resemble that of the corresponding low-dimensional representation, which is derived from the robust subspace projection of clean data by low-rank learning. Moreover, the low-dimensional representation of original data can be paired with some informative structure by imposing an appropriate constraint, e.g., Laplacian regularizer. Therefore, we propose a novel constrained LRR method. The objective function is formulated as a constrained nuclear norm minimization problem, which can be solved by the inexact augmented Lagrange multiplier algorithm. Extensive experiments on image classification, human pose estimation, and robust face recovery have confirmed the superiority of our method.

  1. Fabric defect detection based on visual saliency using deep feature and low-rank recovery

    NASA Astrophysics Data System (ADS)

    Liu, Zhoufeng; Wang, Baorui; Li, Chunlei; Li, Bicao; Dong, Yan

    2018-04-01

    Fabric defect detection plays an important role in improving the quality of fabric product. In this paper, a novel fabric defect detection method based on visual saliency using deep feature and low-rank recovery was proposed. First, unsupervised training is carried out by the initial network parameters based on MNIST large datasets. The supervised fine-tuning of fabric image library based on Convolutional Neural Networks (CNNs) is implemented, and then more accurate deep neural network model is generated. Second, the fabric images are uniformly divided into the image block with the same size, then we extract their multi-layer deep features using the trained deep network. Thereafter, all the extracted features are concentrated into a feature matrix. Third, low-rank matrix recovery is adopted to divide the feature matrix into the low-rank matrix which indicates the background and the sparse matrix which indicates the salient defect. In the end, the iterative optimal threshold segmentation algorithm is utilized to segment the saliency maps generated by the sparse matrix to locate the fabric defect area. Experimental results demonstrate that the feature extracted by CNN is more suitable for characterizing the fabric texture than the traditional LBP, HOG and other hand-crafted features extraction method, and the proposed method can accurately detect the defect regions of various fabric defects, even for the image with complex texture.

  2. A Novel Riemannian Metric Based on Riemannian Structure and Scaling Information for Fixed Low-Rank Matrix Completion.

    PubMed

    Mao, Shasha; Xiong, Lin; Jiao, Licheng; Feng, Tian; Yeung, Sai-Kit

    2017-05-01

    Riemannian optimization has been widely used to deal with the fixed low-rank matrix completion problem, and Riemannian metric is a crucial factor of obtaining the search direction in Riemannian optimization. This paper proposes a new Riemannian metric via simultaneously considering the Riemannian geometry structure and the scaling information, which is smoothly varying and invariant along the equivalence class. The proposed metric can make a tradeoff between the Riemannian geometry structure and the scaling information effectively. Essentially, it can be viewed as a generalization of some existing metrics. Based on the proposed Riemanian metric, we also design a Riemannian nonlinear conjugate gradient algorithm, which can efficiently solve the fixed low-rank matrix completion problem. By experimenting on the fixed low-rank matrix completion, collaborative filtering, and image and video recovery, it illustrates that the proposed method is superior to the state-of-the-art methods on the convergence efficiency and the numerical performance.

  3. "URM candidates are encouraged to apply": a national study to identify effective strategies to enhance racial and ethnic faculty diversity in academic departments of medicine.

    PubMed

    Peek, Monica E; Kim, Karen E; Johnson, Julie K; Vela, Monica B

    2013-03-01

    There is little evidence regarding which factors and strategies are associated with high proportions of underrepresented minority (URM) faculty in academic medicine. The authors conducted a national study of U.S. academic medicine departments to better understand the challenges, successful strategies, and predictive factors for enhancing racial and ethnic diversity among faculty (i.e., physicians with an academic position or rank). This was a mixed-methods study using quantitative and qualitative methods. The authors conducted a cross-sectional study of eligible departments of medicine in 125 accredited U.S. medical schools, dichotomized into low-URM (bottom 50%) versus high-URM rank (top 50%). They used t tests and chi-squared tests to compare departments by geographic region, academic school rank, city type, and composite measures of "diversity best practices." The authors also conducted semistructured in-depth interviews with a subsample from the highest- and lowest-quartile medical schools in terms of URM rank. Eighty-two medical schools responded (66%). Geographic region and academic rank were statistically associated with URM rank, but not city type or composite measures of diversity best practices. Key themes emerged from interviews regarding successful strategies for URM faculty recruitment and retention, including institutional leadership, the use of human capital and social relationships, and strategic deployment of institutional resources. Departments of medicine with high proportions of URM faculty employ a number of successful strategies and programs for recruitment and retention. More research is warranted to identify new successful strategies and to determine the impact of specific strategies on establishing and maintaining workforce diversity.

  4. Finding minimum gene subsets with heuristic breadth-first search algorithm for robust tumor classification

    PubMed Central

    2012-01-01

    Background Previous studies on tumor classification based on gene expression profiles suggest that gene selection plays a key role in improving the classification performance. Moreover, finding important tumor-related genes with the highest accuracy is a very important task because these genes might serve as tumor biomarkers, which is of great benefit to not only tumor molecular diagnosis but also drug development. Results This paper proposes a novel gene selection method with rich biomedical meaning based on Heuristic Breadth-first Search Algorithm (HBSA) to find as many optimal gene subsets as possible. Due to the curse of dimensionality, this type of method could suffer from over-fitting and selection bias problems. To address these potential problems, a HBSA-based ensemble classifier is constructed using majority voting strategy from individual classifiers constructed by the selected gene subsets, and a novel HBSA-based gene ranking method is designed to find important tumor-related genes by measuring the significance of genes using their occurrence frequencies in the selected gene subsets. The experimental results on nine tumor datasets including three pairs of cross-platform datasets indicate that the proposed method can not only obtain better generalization performance but also find many important tumor-related genes. Conclusions It is found that the frequencies of the selected genes follow a power-law distribution, indicating that only a few top-ranked genes can be used as potential diagnosis biomarkers. Moreover, the top-ranked genes leading to very high prediction accuracy are closely related to specific tumor subtype and even hub genes. Compared with other related methods, the proposed method can achieve higher prediction accuracy with fewer genes. Moreover, they are further justified by analyzing the top-ranked genes in the context of individual gene function, biological pathway, and protein-protein interaction network. PMID:22830977

  5. Quantifying prognosis with risk predictions.

    PubMed

    Pace, Nathan L; Eberhart, Leopold H J; Kranke, Peter R

    2012-01-01

    Prognosis is a forecast, based on present observations in a patient, of their probable outcome from disease, surgery and so on. Research methods for the development of risk probabilities may not be familiar to some anaesthesiologists. We briefly describe methods for identifying risk factors and risk scores. A probability prediction rule assigns a risk probability to a patient for the occurrence of a specific event. Probability reflects the continuum between absolute certainty (Pi = 1) and certified impossibility (Pi = 0). Biomarkers and clinical covariates that modify risk are known as risk factors. The Pi as modified by risk factors can be estimated by identifying the risk factors and their weighting; these are usually obtained by stepwise logistic regression. The accuracy of probabilistic predictors can be separated into the concepts of 'overall performance', 'discrimination' and 'calibration'. Overall performance is the mathematical distance between predictions and outcomes. Discrimination is the ability of the predictor to rank order observations with different outcomes. Calibration is the correctness of prediction probabilities on an absolute scale. Statistical methods include the Brier score, coefficient of determination (Nagelkerke R2), C-statistic and regression calibration. External validation is the comparison of the actual outcomes to the predicted outcomes in a new and independent patient sample. External validation uses the statistical methods of overall performance, discrimination and calibration and is uniformly recommended before acceptance of the prediction model. Evidence from randomised controlled clinical trials should be obtained to show the effectiveness of risk scores for altering patient management and patient outcomes.

  6. Innovations in bonding to zirconia based ceramics: Part III. Phosphate monomer resin cements.

    PubMed

    Mirmohammadi, Hesam; Aboushelib, Moustafa N M; Salameh, Ziad; Feilzer, Albert J; Kleverlaan, Cornelis J

    2010-08-01

    To compare the bond strength values and the ranking order of three phosphate monomer containing resin cements using microtensile (microTBS) and microshear (microSBS) bond strength tests. Zirconia discs (Procera Zirconia) were bonded to resin composite discs (Filtek Z250) using three different cements (Panavia F 2.0, RelyX UniCem, and Multilink). Two bond strength tests were used to determine zirconia resin bond strength; microtensile bond strength test (microTBS) and microshear bond strength test (microSBS). Ten specimens were tested for each group (n=10). Two-way analysis of variance (ANOVA) was used to analyze the data (alpha=0.05). There were statistical significant differences in bond strength values and in the ranking order obtained using the two test methods. microTBS reported significant differences in bond strength values, whereas microSBS failed to detect such effect. Both Multilink and Panavia demonstrated basically cohesive failure in the resin cement while RelyX UniCem demonstrated interfacial failure. Based on the findings of this study, the data obtained using either microTBS or microSBS could not be directly compared. microTBS was more sensitive to material differences compared to microSBS which failed to detect such differences. Copyright 2010 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.

  7. SCD-HeFT: Use of RR Interval Statistics for Long-term Risk Stratification for Arrhythmic Sudden Cardiac Death

    PubMed Central

    Au-yeung, Wan-tai M.; Reinhall, Per; Poole, Jeanne E.; Anderson, Jill; Johnson, George; Fletcher, Ross D.; Moore, Hans J.; Mark, Daniel B.; Lee, Kerry L.; Bardy, Gust H.

    2015-01-01

    Background In the SCD-HeFT a significant fraction of the congestive heart failure (CHF) patients ultimately did not die suddenly from arrhythmic causes. CHF patients will benefit from better tools to identify if ICD therapy is needed. Objective To identify predictor variables from baseline SCD-HeFT patients’ RR intervals that correlate to arrhythmic sudden cardiac death (SCD) and mortality and to design an ICD therapy screening test. Methods Ten predictor variables were extracted from pre-randomization Holter data from 475 patients enrolled in the SCD-HeFT ICD arm using novel and traditional heart rate variability methods. All variables were correlated to SCD using Mann Whitney-Wilcoxon test and receiver operating characteristic analysis. ICD therapy screening tests were designed by minimizing the cost of false classifications. Survival analysis, including log-rank test and Cox models, was also performed. Results α1 and α2 from detrended fluctuation analysis, the ratio of low to high frequency power, the number of PVCs per hour and heart rate turbulence slope are all statistically significant for predicting the occurrences of SCD (p<0.001) and survival (log-rank p<0.01). The most powerful multivariate predictor tool using the Cox Proportional Hazards was α2 with a hazard ratio of 0.0465 (95% CI: 0.00528 – 0.409, p<0.01). Conclusion Predictor variables from RR intervals correlate to the occurrences of SCD and distinguish survival among SCD-HeFT ICD patients. We believe SCD prediction models should incorporate Holter based RR interval analysis to refine ICD patient selection especially in removing patients who are unlikely to benefit from ICD therapy. PMID:26096609

  8. Three-Dimensional Eyeball and Orbit Volume Modification After LeFort III Midface Distraction.

    PubMed

    Smektala, Tomasz; Nysjö, Johan; Thor, Andreas; Homik, Aleksandra; Sporniak-Tutak, Katarzyna; Safranow, Krzysztof; Dowgierd, Krzysztof; Olszewski, Raphael

    2015-07-01

    The aim of our study was to evaluate orbital volume modification with LeFort III midface distraction in patients with craniosynostosis and its influence on eyeball volume and axial diameter modification. Orbital volume was assessed by the semiautomatic segmentation method based on deformable surface models and on 3-dimensional (3D) interaction with haptics. The eyeball volumes and diameters were automatically calculated after manual segmentation of computed tomographic scans with 3D slicer software. The mean, minimal, and maximal differences as well as the standard deviation and intraclass correlation coefficient (ICC) for intraobserver and interobserver measurements reliability were calculated. The Wilcoxon signed rank test was used to compare measured values before and after surgery. P < 0.05 was considered statistically significant. Intraobserver and interobserver ICC for haptic-aided semiautomatic orbital volume measurements were 0.98 and 0.99, respectively. The intraobserver and interobserver ICC values for manual segmentation of the eyeball volume were 0.87 and 0.86, respectively. The orbital volume increased significantly after surgery: 30.32% (mean, 5.96  mL) for the left orbit and 31.04% (mean, 6.31  mL) for the right orbit. The mean increase in eyeball volume was 12.3%. The mean increases in the eyeball axial dimensions were 7.3%, 9.3%, and 4.4% for the X-, Y-, and Z-axes, respectively. The Wilcoxon signed rank test showed that preoperative and postoperative eyeball volumes, as well as the diameters along the X- and Y-axes, were statistically significant. Midface distraction in patients with syndromic craniostenosis results in a significant increase (P < 0.05) in the orbit and eyeball volumes. The 2 methods (haptic-aided semiautomatic segmentation and manual 3D slicer segmentation) are reproducible techniques for orbit and eyeball volume measurements.

  9. Randomization in cancer clinical trials: permutation test and development of a computer program.

    PubMed Central

    Ohashi, Y

    1990-01-01

    When analyzing cancer clinical trial data where the treatment allocation is done using dynamic balancing methods such as the minimization method for balancing the distribution of important prognostic factors in each arm, conservativeness occurs if such a randomization scheme is ignored and a simple unstratified analysis is carried out. In this paper, the above conservativeness is demonstrated by computer simulation, and the development of a computer program that carries out permutation tests of the log-rank statistics for clinical trial data where the allocation is done by the minimization method or a stratified permuted block design is introduced. We are planning to use this program in practice to supplement a usual stratified analysis and model-based methods such as the Cox regression. The most serious problem in cancer clinical trials in Japan is how to carry out the quality control or data management in trials that are initiated and conducted by researchers without support from pharmaceutical companies. In the final section of this paper, one international collaborative work for developing international guidelines on data management in clinical trials of bladder cancer is briefly introduced, and the differences between the system adopted in US/European statistical centers and the Japanese system is described. PMID:2269216

  10. On standardization of low symmetry crystal fields

    NASA Astrophysics Data System (ADS)

    Gajek, Zbigniew

    2015-07-01

    Standardization methods of low symmetry - orthorhombic, monoclinic and triclinic - crystal fields are formulated and discussed. Two alternative approaches are presented, the conventional one, based on the second-rank parameters and the standardization based on the fourth-rank parameters. Mainly f-electron systems are considered but some guidelines for d-electron systems and the spin Hamiltonian describing the zero-field splitting are given. The discussion focuses on premises for choosing the most suitable method, in particular on inadequacy of the conventional one. Few examples from the literature illustrate this situation.

  11. "I like talking to people on the computer": Outcomes of a home-based intervention to develop social media skills in youth with disabilities living in rural communities.

    PubMed

    Raghavendra, Parimala; Hutchinson, Claire; Grace, Emma; Wood, Denise; Newman, Lareen

    2018-05-01

    To investigate the effectiveness of a home-based social media use intervention to enhance the social networks of rural youth with disabilities. Participants were nine youth (mean age = 17.0 years) with disabilities from two rural Australian communities. The intervention consisted of providing appropriate assistive technology and social media training on individualised goals. Using mixed methods, quantitative (a single group pre-post) and qualitative (interviews with participants and their carers) measures were used to examine outcomes of training, individual experiences of the intervention, and changes to online social networks. Participants increased their performance and satisfaction with performance on social media problem areas post-intervention; paired t-tests showed statistical significance at p < .001. There was also a significant increase in the number of online communication partners; Wilcoxon Signed Ranks showed statistical significance at p < .05. The interviews highlighted increased social participation, independence and improvements to literacy. Ongoing parental concerns regarding cyber safety and inappropriate online content were noted. The findings suggest that social media training is a feasible method for increasing social networks among rural-based youth with disabilities. To sustain ongoing benefits, parents need knowledge and training in integrating assistive technology and social media. Copyright © 2018 Elsevier Ltd. All rights reserved.

  12. Wikipedia ranking of world universities

    NASA Astrophysics Data System (ADS)

    Lages, José; Patt, Antoine; Shepelyansky, Dima L.

    2016-03-01

    We use the directed networks between articles of 24 Wikipedia language editions for producing the wikipedia ranking of world Universities (WRWU) using PageRank, 2DRank and CheiRank algorithms. This approach allows to incorporate various cultural views on world universities using the mathematical statistical analysis independent of cultural preferences. The Wikipedia ranking of top 100 universities provides about 60% overlap with the Shanghai university ranking demonstrating the reliable features of this approach. At the same time WRWU incorporates all knowledge accumulated at 24 Wikipedia editions giving stronger highlights for historically important universities leading to a different estimation of efficiency of world countries in university education. The historical development of university ranking is analyzed during ten centuries of their history.

  13. Measuring hospital performance in congenital heart surgery: administrative versus clinical registry data.

    PubMed

    Pasquali, Sara K; He, Xia; Jacobs, Jeffrey P; Jacobs, Marshall L; Gaies, Michael G; Shah, Samir S; Hall, Matthew; Gaynor, J William; Peterson, Eric D; Mayer, John E; Hirsch-Romano, Jennifer C

    2015-03-01

    In congenital heart surgery, hospital performance has historically been assessed using widely available administrative data sets. Recent studies have demonstrated inaccuracies in case ascertainment (coding and inclusion of eligible cases) in administrative versus clinical registry data; however, it is unclear whether this impacts assessment of performance on a hospital level. Merged data from The Society of Thoracic Surgeons (STS) database (clinical registry) and the Pediatric Health Information Systems (PHIS) database (administrative data set) for 46,056 children undergoing cardiac operations (2006-2010) were used to evaluate in-hospital mortality for 33 hospitals based on their administrative versus registry data. Standard methods to identify/classify cases were used: Risk Adjustment in Congenital Heart Surgery, version 1 (RACHS-1) in the administrative data and STS-European Association for Cardiothoracic Surgery (STAT) methodology in the registry. Median hospital surgical volume based on the registry data was 269 cases per year; mortality was 2.9%. Hospital volumes and mortality rates based on the administrative data were on average 10.7% and 4.7% lower, respectively, although this varied widely across hospitals. Hospital rankings for mortality based on the administrative versus registry data differed by 5 or more rank positions for 24% of hospitals, with a change in mortality tertile classification (high, middle, or low mortality) for 18% and a change in statistical outlier classification for 12%. Higher volume/complexity hospitals were most impacted. Agency for Healthcare Quality and Research (AHRQ) methods in the administrative data yielded similar results. Inaccuracies in case ascertainment in administrative versus clinical registry data can lead to important differences in assessment of hospital mortality rates for congenital heart surgery. Copyright © 2015 The Society of Thoracic Surgeons. Published by Elsevier Inc. All rights reserved.

  14. Rankings & Estimates: Rankings of the States 2010 and Estimates of School Statistics 2011

    ERIC Educational Resources Information Center

    National Education Association Research Department, 2010

    2010-01-01

    The data presented in this combined report--"Rankings & Estimates"--provide facts about the extent to which local, state, and national governments commit resources to public education. As one might expect in a nation as diverse as the United States--with respect to economics, geography, and politics--the level of commitment to…

  15. Rankings & Estimates: Rankings of the States 2015 and Estimates of School Statistics 2016

    ERIC Educational Resources Information Center

    National Education Association, 2016

    2016-01-01

    The data presented in this combined report--"Rankings & Estimates"--provide facts about the extent to which local, state, and national governments commit resources to public education. As one might expect in a nation as diverse as the United States--with respect to economics, geography, and politics--the level of commitment to…

  16. How Variances in Business School Rankings Affect Enrollment Trends and Practices

    ERIC Educational Resources Information Center

    De Veyga, Guillermo A.

    2016-01-01

    This study examined the effect that variances in the"U.S. News & World Report" rankings have on enrollment trends and practices in both top and non-top 25 business schools. The purpose of this study was to determine whether mobility in the rankings was met with a statistically significant response to the research questions presented.…

  17. Estimation of the chemical rank for the three-way data: a principal norm vector orthogonal projection approach.

    PubMed

    Hong-Ping, Xie; Jian-Hui, Jiang; Guo-Li, Shen; Ru-Qin, Yu

    2002-01-01

    A new approach for estimating the chemical rank of the three-way array called the principal norm vector orthogonal projection method has been proposed. The method is based on the fact that the chemical rank of the three-way data array is equal to one of the column space of the unfolded matrix along the spectral or chromatographic mode. A vector with maximum Frobenius norm is selected among all the column vectors of the unfolded matrix as the principal norm vector (PNV). A transformation is conducted for the column vectors with an orthogonal projection matrix formulated by PNV. The mathematical rank of the column space of the residual matrix thus obtained should decrease by one. Such orthogonal projection is carried out repeatedly till the contribution of chemical species to the signal data is all deleted. At this time the decrease of the mathematical rank would equal that of the chemical rank, and the remaining residual subspace would entirely be due to the noise contribution. The chemical rank can be estimated easily by using an F-test. The method has been used successfully to the simulated HPLC-DAD type three-way data array and two real excitation-emission fluorescence data sets of amino acid mixtures and dye mixtures. The simulation with added relatively high level noise shows that the method is robust in resisting the heteroscedastic noise. The proposed algorithm is simple and easy to program with quite light computational burden.

  18. An empirical comparison of methods for analyzing correlated data from a discrete choice survey to elicit patient preference for colorectal cancer screening

    PubMed Central

    2012-01-01

    Background A discrete choice experiment (DCE) is a preference survey which asks participants to make a choice among product portfolios comparing the key product characteristics by performing several choice tasks. Analyzing DCE data needs to account for within-participant correlation because choices from the same participant are likely to be similar. In this study, we empirically compared some commonly-used statistical methods for analyzing DCE data while accounting for within-participant correlation based on a survey of patient preference for colorectal cancer (CRC) screening tests conducted in Hamilton, Ontario, Canada in 2002. Methods A two-stage DCE design was used to investigate the impact of six attributes on participants' preferences for CRC screening test and willingness to undertake the test. We compared six models for clustered binary outcomes (logistic and probit regressions using cluster-robust standard error (SE), random-effects and generalized estimating equation approaches) and three models for clustered nominal outcomes (multinomial logistic and probit regressions with cluster-robust SE and random-effects multinomial logistic model). We also fitted a bivariate probit model with cluster-robust SE treating the choices from two stages as two correlated binary outcomes. The rank of relative importance between attributes and the estimates of β coefficient within attributes were used to assess the model robustness. Results In total 468 participants with each completing 10 choices were analyzed. Similar results were reported for the rank of relative importance and β coefficients across models for stage-one data on evaluating participants' preferences for the test. The six attributes ranked from high to low as follows: cost, specificity, process, sensitivity, preparation and pain. However, the results differed across models for stage-two data on evaluating participants' willingness to undertake the tests. Little within-patient correlation (ICC ≈ 0) was found in stage-one data, but substantial within-patient correlation existed (ICC = 0.659) in stage-two data. Conclusions When small clustering effect presented in DCE data, results remained robust across statistical models. However, results varied when larger clustering effect presented. Therefore, it is important to assess the robustness of the estimates via sensitivity analysis using different models for analyzing clustered data from DCE studies. PMID:22348526

  19. Statistical methods of fracture characterization using acoustic borehole televiewer log interpretation

    NASA Astrophysics Data System (ADS)

    Massiot, Cécile; Townend, John; Nicol, Andrew; McNamara, David D.

    2017-08-01

    Acoustic borehole televiewer (BHTV) logs provide measurements of fracture attributes (orientations, thickness, and spacing) at depth. Orientation, censoring, and truncation sampling biases similar to those described for one-dimensional outcrop scanlines, and other logging or drilling artifacts specific to BHTV logs, can affect the interpretation of fracture attributes from BHTV logs. K-means, fuzzy K-means, and agglomerative clustering methods provide transparent means of separating fracture groups on the basis of their orientation. Fracture spacing is calculated for each of these fracture sets. Maximum likelihood estimation using truncated distributions permits the fitting of several probability distributions to the fracture attribute data sets within truncation limits, which can then be extrapolated over the entire range where they naturally occur. Akaike Information Criterion (AIC) and Schwartz Bayesian Criterion (SBC) statistical information criteria rank the distributions by how well they fit the data. We demonstrate these attribute analysis methods with a data set derived from three BHTV logs acquired from the high-temperature Rotokawa geothermal field, New Zealand. Varying BHTV log quality reduces the number of input data points, but careful selection of the quality levels where fractures are deemed fully sampled increases the reliability of the analysis. Spacing data analysis comprising up to 300 data points and spanning three orders of magnitude can be approximated similarly well (similar AIC rankings) with several distributions. Several clustering configurations and probability distributions can often characterize the data at similar levels of statistical criteria. Thus, several scenarios should be considered when using BHTV log data to constrain numerical fracture models.

  20. Adaptive low-rank subspace learning with online optimization for robust visual tracking.

    PubMed

    Liu, Risheng; Wang, Di; Han, Yuzhuo; Fan, Xin; Luo, Zhongxuan

    2017-04-01

    In recent years, sparse and low-rank models have been widely used to formulate appearance subspace for visual tracking. However, most existing methods only consider the sparsity or low-rankness of the coefficients, which is not sufficient enough for appearance subspace learning on complex video sequences. Moreover, as both the low-rank and the column sparse measures are tightly related to all the samples in the sequences, it is challenging to incrementally solve optimization problems with both nuclear norm and column sparse norm on sequentially obtained video data. To address above limitations, this paper develops a novel low-rank subspace learning with adaptive penalization (LSAP) framework for subspace based robust visual tracking. Different from previous work, which often simply decomposes observations as low-rank features and sparse errors, LSAP simultaneously learns the subspace basis, low-rank coefficients and column sparse errors to formulate appearance subspace. Within LSAP framework, we introduce a Hadamard production based regularization to incorporate rich generative/discriminative structure constraints to adaptively penalize the coefficients for subspace learning. It is shown that such adaptive penalization can significantly improve the robustness of LSAP on severely corrupted dataset. To utilize LSAP for online visual tracking, we also develop an efficient incremental optimization scheme for nuclear norm and column sparse norm minimizations. Experiments on 50 challenging video sequences demonstrate that our tracker outperforms other state-of-the-art methods. Copyright © 2017 Elsevier Ltd. All rights reserved.

  1. A new fast direct solver for the boundary element method

    NASA Astrophysics Data System (ADS)

    Huang, S.; Liu, Y. J.

    2017-09-01

    A new fast direct linear equation solver for the boundary element method (BEM) is presented in this paper. The idea of the new fast direct solver stems from the concept of the hierarchical off-diagonal low-rank matrix. The hierarchical off-diagonal low-rank matrix can be decomposed into the multiplication of several diagonal block matrices. The inverse of the hierarchical off-diagonal low-rank matrix can be calculated efficiently with the Sherman-Morrison-Woodbury formula. In this paper, a more general and efficient approach to approximate the coefficient matrix of the BEM with the hierarchical off-diagonal low-rank matrix is proposed. Compared to the current fast direct solver based on the hierarchical off-diagonal low-rank matrix, the proposed method is suitable for solving general 3-D boundary element models. Several numerical examples of 3-D potential problems with the total number of unknowns up to above 200,000 are presented. The results show that the new fast direct solver can be applied to solve large 3-D BEM models accurately and with better efficiency compared with the conventional BEM.

  2. A new method of optimal capacitor switching based on minimum spanning tree theory in distribution systems

    NASA Astrophysics Data System (ADS)

    Li, H. W.; Pan, Z. Y.; Ren, Y. B.; Wang, J.; Gan, Y. L.; Zheng, Z. Z.; Wang, W.

    2018-03-01

    According to the radial operation characteristics in distribution systems, this paper proposes a new method based on minimum spanning trees method for optimal capacitor switching. Firstly, taking the minimal active power loss as objective function and not considering the capacity constraints of capacitors and source, this paper uses Prim algorithm among minimum spanning trees algorithms to get the power supply ranges of capacitors and source. Then with the capacity constraints of capacitors considered, capacitors are ranked by the method of breadth-first search. In term of the order from high to low of capacitor ranking, capacitor compensation capacity based on their power supply range is calculated. Finally, IEEE 69 bus system is adopted to test the accuracy and practicality of the proposed algorithm.

  3. Improving the Rank Precision of Population Health Measures for Small Areas with Longitudinal and Joint Outcome Models

    PubMed Central

    Athens, Jessica K.; Remington, Patrick L.; Gangnon, Ronald E.

    2015-01-01

    Objectives The University of Wisconsin Population Health Institute has published the County Health Rankings since 2010. These rankings use population-based data to highlight health outcomes and the multiple determinants of these outcomes and to encourage in-depth health assessment for all United States counties. A significant methodological limitation, however, is the uncertainty of rank estimates, particularly for small counties. To address this challenge, we explore the use of longitudinal and pooled outcome data in hierarchical Bayesian models to generate county ranks with greater precision. Methods In our models we used pooled outcome data for three measure groups: (1) Poor physical and poor mental health days; (2) percent of births with low birth weight and fair or poor health prevalence; and (3) age-specific mortality rates for nine age groups. We used the fixed and random effects components of these models to generate posterior samples of rates for each measure. We also used time-series data in longitudinal random effects models for age-specific mortality. Based on the posterior samples from these models, we estimate ranks and rank quartiles for each measure, as well as the probability of a county ranking in its assigned quartile. Rank quartile probabilities for univariate, joint outcome, and/or longitudinal models were compared to assess improvements in rank precision. Results The joint outcome model for poor physical and poor mental health days resulted in improved rank precision, as did the longitudinal model for age-specific mortality rates. Rank precision for low birth weight births and fair/poor health prevalence based on the univariate and joint outcome models were equivalent. Conclusion Incorporating longitudinal or pooled outcome data may improve rank certainty, depending on characteristics of the measures selected. For measures with different determinants, joint modeling neither improved nor degraded rank precision. This approach suggests a simple way to use existing information to improve the precision of small-area measures of population health. PMID:26098858

  4. Study of Methods for Assessing Research Topic Elicitation and pRioritization (SMARTER): Study Protocol to Compare Qualitative Research Methods and Advance Patient Engagement in Research

    PubMed Central

    Comstock, Bryan

    2017-01-01

    Background Involving patients as partners in research is a defining characteristic of patient-centered outcomes research (PCOR). While patients’ experiential knowledge of a health condition or treatment may yield research priorities not reflected by researchers and policy makers, the methods for identifying and effectively collaborating with patients are still evolving. Patient registries and crowdsourcing may offer ease of access and convenience to both researchers and patients. Surveys and focus groups, including online modalities, have been described for prioritizing research topics. However, little is known about how these different methods compare in producing consistent priorities and similar perceptions of engagement quality among participants. Objective The aims of this study are (1) to compare how different engagement methods used to elicit patient priorities for research perform as measured by rankings for priorities generated and participant satisfaction; and (2) to determine characteristics of individuals choosing to participate in research prioritization activities. Methods Participants in the Back pain Outcomes using Longitudinal Data (BOLD) patient registry, established to evaluate the natural history of back pain among individuals 65 years and older, and participants on the Amazon Mechanical Turk (MTurk) crowdsourcing platform, to provide input on priorities for research via a questionnaire, are invited. For BOLD participants, we subsequently randomize interested respondents to 1 of 3 interactive prioritization activities to further develop priorities: a Delphi panel, an online crowd voting activity, or an in-person facilitated prioritization activity using nominal group technique (NGT). Participants involved in each activity complete a survey to evaluate the quality of the experience and a subset of these participants discuss their experience further in an interview. Descriptive statistics are used to characterize the rankings produced by each method and compare the top 5 rated topics resulting from each prioritization activity. We use rank-ordered logistic regression models to identify associations of the ranked priority topics with baseline patient characteristics. We analyze responses to the evaluation using a mixed-methods approach wherein we tabulate responses to Likert-scale questions and use content analysis to enumerate themes emerging from interviews for the 3 activities. Results In Phase I, we invite approximately 3000 BOLD participants and 500 Amazon MTurk workers to complete a research topic prioritization survey. Based on these results, we include additional topics into a subsequent prioritization survey. In Phase II, we invite BOLD participants to join 1 of 3 activities: 90 participants for Delphi panel, 100 participants for crowd voting, and 60 participants for focus groups. Of the Phase II participants, 30 will be interviewed to evaluate the activities. Conclusions This study informs decisions about how to conduct outreach to patient registry participants for providing input on research priorities, how individuals 65 years and older wish to participate in engagement activities, and how different research prioritization methods compare in terms of rankings generated and participant satisfaction. PMID:28882810

  5. Quantum Entanglement and Reduced Density Matrices

    NASA Astrophysics Data System (ADS)

    Purwanto, Agus; Sukamto, Heru; Yuwana, Lila

    2018-05-01

    We investigate entanglement and separability criteria of multipartite (n-partite) state by examining ranks of its reduced density matrices. Firstly, we construct the general formula to determine the criterion. A rank of origin density matrix always equals one, meanwhile ranks of reduced matrices have various ranks. Next, separability and entanglement criterion of multipartite is determined by calculating ranks of reduced density matrices. In this article we diversify multipartite state criteria into completely entangled state, completely separable state, and compound state, i.e. sub-entangled state and sub-entangledseparable state. Furthermore, we also shorten the calculation proposed by the previous research to determine separability of multipartite state and expand the methods to be able to differ multipartite state based on criteria above.

  6. Effects of a work-based critical reflection program for novice nurses.

    PubMed

    Kim, Yeon Hee; Min, Ja; Kim, Soon Hee; Shin, Sujin

    2018-02-27

    Critical reflection is effective in improving students' communication abilities and confidence. The aim of this study was to evaluate the effectiveness of a work-based critical reflection program to enhance novice nurses' clinical critical-thinking abilities, communication competency, and job performance. The present study used a quasi-experimental design. From October 2014 to August 2015, we collected data from 44 novice nurses working in an advanced general hospital in S city in Korea. Nurses in the experimental group participated in a critical reflection program for six months. Outcome variables were clinical critical-thinking skills, communication abilities, and job performance. A non-parametric Mann-Whitney U-test and a Wilcoxon rank sum test were selected to evaluate differences in mean ranks and to assess the null hypothesis that the medians were equal across the groups. The results showed that the clinical critical-thinking skills of those in the experimental group improved significantly (p = 0.003). The differences in mean ranks of communication ability between two groups was significantly statistically different (p = 0.028). Job performance improved significantly in both the experimental group and the control group, so there was no statistical difference (p = 0.294). We therefore suggest that a critical reflection program be considered an essential tool for improving critical thinking and communication abilities among novice nurses who need to adapt to the clinical environment as quickly as possible. Further, we suggest conducting research into critical reflection programs among larger and more diverse samples.

  7. SIMULTANEOUS MULTISLICE MAGNETIC RESONANCE FINGERPRINTING WITH LOW-RANK AND SUBSPACE MODELING

    PubMed Central

    Zhao, Bo; Bilgic, Berkin; Adalsteinsson, Elfar; Griswold, Mark A.; Wald, Lawrence L.; Setsompop, Kawin

    2018-01-01

    Magnetic resonance fingerprinting (MRF) is a new quantitative imaging paradigm that enables simultaneous acquisition of multiple magnetic resonance tissue parameters (e.g., T1, T2, and spin density). Recently, MRF has been integrated with simultaneous multislice (SMS) acquisitions to enable volumetric imaging with faster scan time. In this paper, we present a new image reconstruction method based on low-rank and subspace modeling for improved SMS-MRF. Here the low-rank model exploits strong spatiotemporal correlation among contrast-weighted images, while the subspace model captures the temporal evolution of magnetization dynamics. With the proposed model, the image reconstruction problem is formulated as a convex optimization problem, for which we develop an algorithm based on variable splitting and the alternating direction method of multipliers. The performance of the proposed method has been evaluated by numerical experiments, and the results demonstrate that the proposed method leads to improved accuracy over the conventional approach. Practically, the proposed method has a potential to allow for a 3x speedup with minimal reconstruction error, resulting in less than 5 sec imaging time per slice. PMID:29060594

  8. The Typicality Ranking Task: A New Method to Derive Typicality Judgments from Children.

    PubMed

    Djalal, Farah Mutiasari; Ameel, Eef; Storms, Gert

    2016-01-01

    An alternative method for deriving typicality judgments, applicable in young children that are not familiar with numerical values yet, is introduced, allowing researchers to study gradedness at younger ages in concept development. Contrary to the long tradition of using rating-based procedures to derive typicality judgments, we propose a method that is based on typicality ranking rather than rating, in which items are gradually sorted according to their typicality, and that requires a minimum of linguistic knowledge. The validity of the method is investigated and the method is compared to the traditional typicality rating measurement in a large empirical study with eight different semantic concepts. The results show that the typicality ranking task can be used to assess children's category knowledge and to evaluate how this knowledge evolves over time. Contrary to earlier held assumptions in studies on typicality in young children, our results also show that preference is not so much a confounding variable to be avoided, but that both variables are often significantly correlated in older children and even in adults.

  9. The Typicality Ranking Task: A New Method to Derive Typicality Judgments from Children

    PubMed Central

    Ameel, Eef; Storms, Gert

    2016-01-01

    An alternative method for deriving typicality judgments, applicable in young children that are not familiar with numerical values yet, is introduced, allowing researchers to study gradedness at younger ages in concept development. Contrary to the long tradition of using rating-based procedures to derive typicality judgments, we propose a method that is based on typicality ranking rather than rating, in which items are gradually sorted according to their typicality, and that requires a minimum of linguistic knowledge. The validity of the method is investigated and the method is compared to the traditional typicality rating measurement in a large empirical study with eight different semantic concepts. The results show that the typicality ranking task can be used to assess children’s category knowledge and to evaluate how this knowledge evolves over time. Contrary to earlier held assumptions in studies on typicality in young children, our results also show that preference is not so much a confounding variable to be avoided, but that both variables are often significantly correlated in older children and even in adults. PMID:27322371

  10. Simultaneous multislice magnetic resonance fingerprinting with low-rank and subspace modeling.

    PubMed

    Bo Zhao; Bilgic, Berkin; Adalsteinsson, Elfar; Griswold, Mark A; Wald, Lawrence L; Setsompop, Kawin

    2017-07-01

    Magnetic resonance fingerprinting (MRF) is a new quantitative imaging paradigm that enables simultaneous acquisition of multiple magnetic resonance tissue parameters (e.g., T 1 , T 2 , and spin density). Recently, MRF has been integrated with simultaneous multislice (SMS) acquisitions to enable volumetric imaging with faster scan time. In this paper, we present a new image reconstruction method based on low-rank and subspace modeling for improved SMS-MRF. Here the low-rank model exploits strong spatiotemporal correlation among contrast-weighted images, while the subspace model captures the temporal evolution of magnetization dynamics. With the proposed model, the image reconstruction problem is formulated as a convex optimization problem, for which we develop an algorithm based on variable splitting and the alternating direction method of multipliers. The performance of the proposed method has been evaluated by numerical experiments, and the results demonstrate that the proposed method leads to improved accuracy over the conventional approach. Practically, the proposed method has a potential to allow for a 3× speedup with minimal reconstruction error, resulting in less than 5 sec imaging time per slice.

  11. Development of Welding Fumes Health Index (WFHI) for Welding Workplace’s Safety and Health Assessment

    PubMed Central

    HARIRI, Azian; PAIMAN, Nuur Azreen; LEMAN, Abdul Mutalib; MD. YUSOF, Mohammad Zainal

    2014-01-01

    Abstract Background This study aimed to develop an index that can rank welding workplace that associate well with possible health risk of welders. Methods Welding Fumes Health Index (WFHI) were developed based on data from case studies conducted in Plant 1 and Plant 2. Personal sampling of welding fumes to assess the concentration of metal constituents along with series of lung function tests was conducted. Fifteen metal constituents were investigated in each case study. Index values were derived from aggregation analysis of metal constituent concentration while significant lung functions were recognized through statistical analysis in each plant. Results The results showed none of the metal constituent concentration was exceeding the permissible exposure limit (PEL) for all plants. However, statistical analysis showed significant mean differences of lung functions between welders and non-welders. The index was then applied to one of the welding industry (Plant 3) for verification purpose. The developed index showed its promising ability to rank welding workplace, according to the multiple constituent concentrations of welding fumes that associates well with lung functions of the investigated welders. Conclusion There was possibility that some of the metal constituents were below the detection limit leading to ‘0’ value of sub index, thus the multiplicative form of aggregation model was not suitable for analysis. On the other hand, maximum or minimum operator forms suffer from compensation issues and were not considered in this study. PMID:25927034

  12. The Evolution of Dental Journals from 2003 to 2012: A Bibliometric Analysis

    PubMed Central

    Jayaratne, Yasas Shri Nalaka; Zwahlen, Roger Arthur

    2015-01-01

    Bibliometrics are a set of methods, which can be used to analyze academic literature quantitatively and its changes over time. The objectives of this study were 1) to evaluate trends related to academic performance of dental journals from 2003 to 2012 using bibliometric indices, and 2) monitor the changes of the five dental journals with the highest and lowest impact factor (IF) published in 2003. Data for the subject category "Dentistry, Oral Surgery & Medicine" was retrieved from the Journal Citation Reports (JCR) published from 2003 to 2012. Linear regressions analysis was used to determine statistical trends over the years with each bibliometric indicator as the dependent variable and the JCR year as the predictor variable. Statistically significant rise in the total number of dental journals, the number of all articles with the steepest rise observed for research articles, the number of citations and the aggregate IF was observed from 2003 to 2012. The analysis of the five top and five bottom-tire dental journals revealed a rise in IF however, with a wide variation in relation to the magnitude of this rise. Although the IF of the top five journals remained relatively constant, the percentile ranks of the four lowest ranking journals in 2003 increased significantly with the sharpest rise being noted for the British Journal of Oral & Maxillofacial Surgery. This study revealed significant growth of dental literature in absolute terms, as well as upward trends for most of the citation-based bibliometric indices from 2003 to 2012. PMID:25781486

  13. The evolution of dental journals from 2003 to 2012: a bibliometric analysis.

    PubMed

    Jayaratne, Yasas Shri Nalaka; Zwahlen, Roger Arthur

    2015-01-01

    Bibliometrics are a set of methods, which can be used to analyze academic literature quantitatively and its changes over time. The objectives of this study were 1) to evaluate trends related to academic performance of dental journals from 2003 to 2012 using bibliometric indices, and 2) monitor the changes of the five dental journals with the highest and lowest impact factor (IF) published in 2003. Data for the subject category "Dentistry, Oral Surgery & Medicine" was retrieved from the Journal Citation Reports (JCR) published from 2003 to 2012. Linear regressions analysis was used to determine statistical trends over the years with each bibliometric indicator as the dependent variable and the JCR year as the predictor variable. Statistically significant rise in the total number of dental journals, the number of all articles with the steepest rise observed for research articles, the number of citations and the aggregate IF was observed from 2003 to 2012. The analysis of the five top and five bottom-tire dental journals revealed a rise in IF however, with a wide variation in relation to the magnitude of this rise. Although the IF of the top five journals remained relatively constant, the percentile ranks of the four lowest ranking journals in 2003 increased significantly with the sharpest rise being noted for the British Journal of Oral & Maxillofacial Surgery. This study revealed significant growth of dental literature in absolute terms, as well as upward trends for most of the citation-based bibliometric indices from 2003 to 2012.

  14. A generalized Kruskal-Wallis test incorporating group uncertainty with application to genetic association studies.

    PubMed

    Acar, Elif F; Sun, Lei

    2013-06-01

    Motivated by genetic association studies of SNPs with genotype uncertainty, we propose a generalization of the Kruskal-Wallis test that incorporates group uncertainty when comparing k samples. The extended test statistic is based on probability-weighted rank-sums and follows an asymptotic chi-square distribution with k - 1 degrees of freedom under the null hypothesis. Simulation studies confirm the validity and robustness of the proposed test in finite samples. Application to a genome-wide association study of type 1 diabetic complications further demonstrates the utilities of this generalized Kruskal-Wallis test for studies with group uncertainty. The method has been implemented as an open-resource R program, GKW. © 2013, The International Biometric Society.

  15. A learning framework for age rank estimation based on face images with scattering transform.

    PubMed

    Chang, Kuang-Yu; Chen, Chu-Song

    2015-03-01

    This paper presents a cost-sensitive ordinal hyperplanes ranking algorithm for human age estimation based on face images. The proposed approach exploits relative-order information among the age labels for rank prediction. In our approach, the age rank is obtained by aggregating a series of binary classification results, where cost sensitivities among the labels are introduced to improve the aggregating performance. In addition, we give a theoretical analysis on designing the cost of individual binary classifier so that the misranking cost can be bounded by the total misclassification costs. An efficient descriptor, scattering transform, which scatters the Gabor coefficients and pooled with Gaussian smoothing in multiple layers, is evaluated for facial feature extraction. We show that this descriptor is a generalization of conventional bioinspired features and is more effective for face-based age inference. Experimental results demonstrate that our method outperforms the state-of-the-art age estimation approaches.

  16. Stochastic constructions of flows of rank 1

    NASA Astrophysics Data System (ADS)

    Prikhod'ko, A. A.

    2001-12-01

    Automorphisms of rank 1 appeared in the well-known papers of Chacon (1965), who constructed an example of a weakly mixing automorphism not having the strong mixing property, and Ornstein (1970), who proved the existence of mixing automorphisms without a square root. Ornstein's construction is essentially stochastic, since its parameters are chosen in a "sufficiently random manner" according to a certain random law.In the present article it is shown that mixing flows of rank 1 exist. The construction given is also stochastic and is based to a large extent on ideas in Ornstein's paper. At the same time it complements Ornstein's paper and makes it more transparent. The construction can be used also to obtain automorphisms with various approximation and statistical properties. It is established that the new examples of dynamical systems are not isomorphic to Ornstein automorphisms, that is, they are qualitatively new.

  17. Stochastic constructions of flows of rank 1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Prikhod'ko, A A

    2001-12-31

    Automorphisms of rank 1 appeared in the well-known papers of Chacon (1965), who constructed an example of a weakly mixing automorphism not having the strong mixing property, and Ornstein (1970), who proved the existence of mixing automorphisms without a square root. Ornstein's construction is essentially stochastic, since its parameters are chosen in a 'sufficiently random manner' according to a certain random law. In the present article it is shown that mixing flows of rank 1 exist. The construction given is also stochastic and is based to a large extent on ideas in Ornstein's paper. At the same time it complementsmore » Ornstein's paper and makes it more transparent. The construction can be used also to obtain automorphisms with various approximation and statistical properties. It is established that the new examples of dynamical systems are not isomorphic to Ornstein automorphisms, that is, they are qualitatively new.« less

  18. Using Bloom's Taxonomy to Evaluate the Cognitive Levels of Master Class Textbook's Questions

    ERIC Educational Resources Information Center

    Assaly, Ibtihal R.; Smadi, Oqlah M.

    2015-01-01

    This study aimed at evaluating the cognitive levels of the questions following the reading texts of Master Class textbook. A checklist based on Bloom's Taxonomy was the instrument used to categorize the cognitive levels of these questions. The researchers used proper statistics to rank the cognitive levels of the comprehension questions. The…

  19. Rasch Based Analysis of Oral Proficiency Test Data.

    ERIC Educational Resources Information Center

    Nakamura, Yuji

    2001-01-01

    This paper examines the rating scale data of oral proficiency tests analyzed by a Rasch Analysis focusing on an item map and factor analysis. In discussing the item map, the difficulty order of six items and students' answering patterns are analyzed using descriptive statistics and measures of central tendency of test scores. The data ranks the…

  20. Tuition at PhD-Granting Institutions: A Supply and Demand Model.

    ERIC Educational Resources Information Center

    Koshal, Rajindar K.; And Others

    1994-01-01

    Builds and estimates a model that explains educational supply and demand behavior at PhD-granting institutions in the United States. The statistical analysis based on 1988-89 data suggests that student quantity, educational costs, average SAT score, class size, percentage of faculty with a PhD, graduation rate, ranking, and existence of a medical…

Top