apply statistical methods: Topics by Science.gov

Sample records for apply statistical methods

Statistical methods used in articles published by the Journal of Periodontal and Implant Science.

PubMed

Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young

2014-12-01

The purposes of this study were to assess the trend of use of statistical methods including parametric and nonparametric methods and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies and thus, concluded that increased use of complex statistical methodology might be preferred by the researchers in the fields of study covered by JPIS.
Evaluation of PDA Technical Report No 33. Statistical Testing Recommendations for a Rapid Microbiological Method Case Study.

PubMed

Murphy, Thomas; Schwedock, Julie; Nguyen, Kham; Mills, Anna; Jones, David

2015-01-01

New recommendations for the validation of rapid microbiological methods have been included in the revised Technical Report 33 release from the PDA. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This case study applies those statistical methods to accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological methods system being evaluated for water bioburden testing. Results presented demonstrate that the statistical methods described in the PDA Technical Report 33 chapter can all be successfully applied to the rapid microbiological method data sets and gave the same interpretation for equivalence to the standard method. The rapid microbiological method was in general able to pass the requirements of PDA Technical Report 33, though the study shows that there can be occasional outlying results and that caution should be used when applying statistical methods to low average colony-forming unit values. Prior to use in a quality-controlled environment, any new method or technology has to be shown to work as designed by the manufacturer for the purpose required. For new rapid microbiological methods that detect and enumerate contaminating microorganisms, additional recommendations have been provided in the revised PDA Technical Report No. 33. The changes include a more comprehensive review of the statistical methods to be used to analyze data obtained during validation. This paper applies those statistical methods to analyze accuracy, precision, ruggedness, and equivalence data obtained using a rapid microbiological method system being validated for water bioburden testing. The case study demonstrates that the statistical methods described in the PDA Technical Report No. 33 chapter can be successfully applied to rapid microbiological method data sets and give the same comparability results for similarity or difference as the standard method. © PDA, Inc. 2015.
Cluster detection methods applied to the Upper Cape Cod cancer data.

PubMed

Ozonoff, Al; Webster, Thomas; Vieira, Veronica; Weinberg, Janice; Ozonoff, David; Aschengrau, Ann

2005-09-15

A variety of statistical methods have been suggested to assess the degree and/or the location of spatial clustering of disease cases. However, there is relatively little in the literature devoted to comparison and critique of different methods. Most of the available comparative studies rely on simulated data rather than real data sets. We have chosen three methods currently used for examining spatial disease patterns: the M-statistic of Bonetti and Pagano; the Generalized Additive Model (GAM) method as applied by Webster; and Kulldorff's spatial scan statistic. We apply these statistics to analyze breast cancer data from the Upper Cape Cancer Incidence Study using three different latency assumptions. The three different latency assumptions produced three different spatial patterns of cases and controls. For 20 year latency, all three methods generally concur. However, for 15 year latency and no latency assumptions, the methods produce different results when testing for global clustering. The comparative analyses of real data sets by different statistical methods provides insight into directions for further research. We suggest a research program designed around examining real data sets to guide focused investigation of relevant features using simulated data, for the purpose of understanding how to interpret statistical methods applied to epidemiological data with a spatial component.
Applied Behavior Analysis and Statistical Process Control?

ERIC Educational Resources Information Center

Hopkins, B. L.

1995-01-01

Incorporating statistical process control (SPC) methods into applied behavior analysis is discussed. It is claimed that SPC methods would likely reduce applied behavior analysts' intimate contacts with problems and would likely yield poor treatment and research decisions. Cases and data presented by Pfadt and Wheeler (1995) are cited as examples.…
A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

PubMed

Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

2013-01-01

As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome is developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar with Kulldorff's methods, we adopt Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. Through a simulation to independent benchmark data, it is indicated that the test statistic based on the Hypergeometric model outweighs Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
Recurrence of attic cholesteatoma: different methods of estimating recurrence rates.

PubMed

Stangerup, S E; Drozdziewicz, D; Tos, M; Hougaard-Jensen, A

2000-09-01

One problem in cholesteatoma surgery is recurrence of cholesteatoma, which is reported to vary from 5% to 71%. This great variability can be explained by issues such as the type of cholesteatoma, surgical technique, follow-up rate, length of the postoperative observation period, and statistical method applied. The aim of this study was to illustrate the impact of applying different statistical methods to the same material. Thirty-three children underwent single-stage surgery for attic cholesteatoma during a 15-year period. Thirty patients (94%) attended a re-evaluation. During the observation period of 15 years, recurrence of cholesteatoma occurred in 10 ears. The cumulative total recurrence rate varied from 30% to 67%, depending on the statistical method applied. In conclusion, the choice of statistical method should depend on the number of patients, follow-up rates, length of the postoperative observation period and presence of censored data.
Introducing 3D U-statistic method for separating anomaly from background in exploration geochemical data with associated software development

NASA Astrophysics Data System (ADS)

Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir

2016-03-01

The U-statistic method is one of the most important structural methods to separate the anomaly from the background. It considers the location of samples and carries out the statistical analysis of the data without judging from a geochemical point of view and tries to separate subpopulations and determine anomalous areas. In the present study, to use U-statistic method in three-dimensional (3D) condition, U-statistic is applied on the grade of two ideal test examples, by considering sample Z values (elevation). So far, this is the first time that this method has been applied on a 3D condition. To evaluate the performance of 3D U-statistic method and in order to compare U-statistic with one non-structural method, the method of threshold assessment based on median and standard deviation (MSD method) is applied on the two example tests. Results show that the samples indicated by U-statistic method as anomalous are more regular and involve less dispersion than those indicated by the MSD method. So that, according to the location of anomalous samples, denser areas of them can be determined as promising zones. Moreover, results show that at a threshold of U = 0, the total error of misclassification for U-statistic method is much smaller than the total error of criteria of bar {x}+n× s. Finally, 3D model of two test examples for separating anomaly from background using 3D U-statistic method is provided. The source code for a software program, which was developed in the MATLAB programming language in order to perform the calculations of the 3D U-spatial statistic method, is additionally provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.
A study of two statistical methods as applied to shuttle solid rocket booster expenditures

NASA Technical Reports Server (NTRS)

Perlmutter, M.; Huang, Y.; Graves, M.

1974-01-01

The state probability technique and the Monte Carlo technique are applied to finding shuttle solid rocket booster expenditure statistics. For a given attrition rate per launch, the probable number of boosters needed for a given mission of 440 launches is calculated. Several cases are considered, including the elimination of the booster after a maximum of 20 consecutive launches. Also considered is the case where the booster is composed of replaceable components with independent attrition rates. A simple cost analysis is carried out to indicate the number of boosters to build initially, depending on booster costs. Two statistical methods were applied in the analysis: (1) state probability method which consists of defining an appropriate state space for the outcome of the random trials, and (2) model simulation method or the Monte Carlo technique. It was found that the model simulation method was easier to formulate while the state probability method required less computing time and was more accurate.
Analysis of Statistical Methods and Errors in the Articles Published in the Korean Journal of Pain

PubMed Central

Yim, Kyoung Hoon; Han, Kyoung Ah; Park, Soo Young

2010-01-01

Background Statistical analysis is essential in regard to obtaining objective reliability for medical research. However, medical researchers do not have enough statistical knowledge to properly analyze their study data. To help understand and potentially alleviate this problem, we have analyzed the statistical methods and errors of articles published in the Korean Journal of Pain (KJP), with the intention to improve the statistical quality of the journal. Methods All the articles, except case reports and editorials, published from 2004 to 2008 in the KJP were reviewed. The types of applied statistical methods and errors in the articles were evaluated. Results One hundred and thirty-nine original articles were reviewed. Inferential statistics and descriptive statistics were used in 119 papers and 20 papers, respectively. Only 20.9% of the papers were free from statistical errors. The most commonly adopted statistical method was the t-test (21.0%) followed by the chi-square test (15.9%). Errors of omission were encountered 101 times in 70 papers. Among the errors of omission, "no statistics used even though statistical methods were required" was the most common (40.6%). The errors of commission were encountered 165 times in 86 papers, among which "parametric inference for nonparametric data" was the most common (33.9%). Conclusions We found various types of statistical errors in the articles published in the KJP. This suggests that meticulous attention should be given not only in the applying statistical procedures but also in the reviewing process to improve the value of the article. PMID:20552071
The estimation of the measurement results with using statistical methods

NASA Astrophysics Data System (ADS)

Velychko, O.; Gordiyenko, T.

2015-02-01

The row of international standards and guides describe various statistical methods that apply for a management, control and improvement of processes with the purpose of realization of analysis of the technical measurement results. The analysis of international standards and guides on statistical methods estimation of the measurement results recommendations for those applications in laboratories is described. For realization of analysis of standards and guides the cause-and-effect Ishikawa diagrams concerting to application of statistical methods for estimation of the measurement results are constructed.
HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

PubMed

Song, Chi; Tseng, George C

2014-01-01

Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values ( r th ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
A critique of the usefulness of inferential statistics in applied behavior analysis

PubMed Central

Hopkins, B. L.; Cole, Brian L.; Mason, Tina L.

1998-01-01

Researchers continue to recommend that applied behavior analysts use inferential statistics in making decisions about effects of independent variables on dependent variables. In many other approaches to behavioral science, inferential statistics are the primary means for deciding the importance of effects. Several possible uses of inferential statistics are considered. Rather than being an objective means for making decisions about effects, as is often claimed, inferential statistics are shown to be subjective. It is argued that the use of inferential statistics adds nothing to the complex and admittedly subjective nonstatistical methods that are often employed in applied behavior analysis. Attacks on inferential statistics that are being made, perhaps with increasing frequency, by those who are not behavior analysts, are discussed. These attackers are calling for banning the use of inferential statistics in research publications and commonly recommend that behavioral scientists should switch to using statistics aimed at interval estimation or the method of confidence intervals. Interval estimation is shown to be contrary to the fundamental assumption of behavior analysis that only individuals behave. It is recommended that authors who wish to publish the results of inferential statistics be asked to justify them as a means for helping us to identify any ways in which they may be useful. PMID:22478304
Compendium of Methods for Applying Measured Data to Vibration and Acoustic Problems

DTIC Science & Technology

1985-10-01

statistical energy analysis , finite element models, transfer function...Procedures for the Modal Analysis Method .............................................. 8-22 8.4 Summary of the Procedures for the Statistical Energy Analysis Method... statistical energy analysis . 8-1 • o + . . i... "_+,A" L + "+..• •+A ’! i, + +.+ +• o.+ -ore -+. • -..- , .%..% ". • 2 -".-2- ;.-.’, . o . It is helpful
Scene-based nonuniformity correction using local constant statistics.

PubMed

Zhang, Chao; Zhao, Wenyi

2008-06-01

In scene-based nonuniformity correction, the statistical approach assumes all possible values of the true-scene pixel are seen at each pixel location. This global-constant-statistics assumption does not distinguish fixed pattern noise from spatial variations in the average image. This often causes the "ghosting" artifacts in the corrected images since the existing spatial variations are treated as noises. We introduce a new statistical method to reduce the ghosting artifacts. Our method proposes a local-constant statistics that assumes that the temporal signal distribution is not constant at each pixel but is locally true. This considers statistically a constant distribution in a local region around each pixel but uneven distribution in a larger scale. Under the assumption that the fixed pattern noise concentrates in a higher spatial-frequency domain than the distribution variation, we apply a wavelet method to the gain and offset image of the noise and separate out the pattern noise from the spatial variations in the temporal distribution of the scene. We compare the results to the global-constant-statistics method using a clean sequence with large artificial pattern noises. We also apply the method to a challenging CCD video sequence and a LWIR sequence to show how effective it is in reducing noise and the ghosting artifacts.
Experimental Mathematics and Computational Statistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bailey, David H.; Borwein, Jonathan M.

2009-04-30

The field of statistics has long been noted for techniques to detect patterns and regularities in numerical data. In this article we explore connections between statistics and the emerging field of 'experimental mathematics'. These includes both applications of experimental mathematics in statistics, as well as statistical methods applied to computational mathematics.
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

PubMed

Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

2016-12-01

In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
[Statistics for statistics?--Thoughts about psychological tools].

PubMed

Berger, Uwe; Stöbel-Richter, Yve

2007-12-01

Statistical methods take a prominent place among psychologists' educational programs. Being known as difficult to understand and heavy to learn, students fear of these contents. Those, who do not aspire after a research carrier at the university, will forget the drilled contents fast. Furthermore, because it does not apply for the work with patients and other target groups at a first glance, the methodological education as a whole was often questioned. For many psychological practitioners the statistical education makes only sense by enforcing respect against other professions, namely physicians. For the own business, statistics is rarely taken seriously as a professional tool. The reason seems to be clear: Statistics treats numbers, while psychotherapy treats subjects. So, does statistics ends in itself? With this article, we try to answer the question, if and how statistical methods were represented within the psychotherapeutical and psychological research. Therefore, we analyzed 46 Originals of a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analyse methods were applied, from which 89 per cent were directly based upon statistics. To be able to write and critically read Originals as a backbone of research, presumes a high degree of statistical education. To ignore statistics means to ignore research and at least to reveal the own professional work to arbitrariness.
Transmission overhaul and replacement predictions using Weibull and renewel theory

NASA Technical Reports Server (NTRS)

Savage, M.; Lewicki, D. G.

1989-01-01

A method to estimate the frequency of transmission overhauls is presented. This method is based on the two-parameter Weibull statistical distribution for component life. A second method is presented to estimate the number of replacement components needed to support the transmission overhaul pattern. The second method is based on renewal theory. Confidence statistics are applied with both methods to improve the statistical estimate of sample behavior. A transmission example is also presented to illustrate the use of the methods. Transmission overhaul frequency and component replacement calculations are included in the example.
AN EMPIRICAL BAYES APPROACH TO COMBINING ESTIMATES OF THE VALUE OF A STATISTICAL LIFE FOR ENVIRONMENTAL POLICY ANALYSIS

EPA Science Inventory

This analysis updates EPA's standard VSL estimate by using a more comprehensive collection of VSL studies that include studies published between 1992 and 2000, as well as applying a more appropriate statistical method. We provide a pooled effect VSL estimate by applying the empi...
Errors in statistical decision making Chapter 2 in Applied Statistics in Agricultural, Biological, and Environmental Sciences

USDA-ARS?s Scientific Manuscript database

Agronomic and Environmental research experiments result in data that are analyzed using statistical methods. These data are unavoidably accompanied by uncertainty. Decisions about hypotheses, based on statistical analyses of these data are therefore subject to error. This error is of three types,...

BAYESIAN ESTIMATION OF THERMONUCLEAR REACTION RATES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Iliadis, C.; Anderson, K. S.; Coc, A.

The problem of estimating non-resonant astrophysical S -factors and thermonuclear reaction rates, based on measured nuclear cross sections, is of major interest for nuclear energy generation, neutrino physics, and element synthesis. Many different methods have been applied to this problem in the past, almost all of them based on traditional statistics. Bayesian methods, on the other hand, are now in widespread use in the physical sciences. In astronomy, for example, Bayesian statistics is applied to the observation of extrasolar planets, gravitational waves, and Type Ia supernovae. However, nuclear physics, in particular, has been slow to adopt Bayesian methods. We presentmore » astrophysical S -factors and reaction rates based on Bayesian statistics. We develop a framework that incorporates robust parameter estimation, systematic effects, and non-Gaussian uncertainties in a consistent manner. The method is applied to the reactions d(p, γ ){sup 3}He, {sup 3}He({sup 3}He,2p){sup 4}He, and {sup 3}He( α , γ ){sup 7}Be, important for deuterium burning, solar neutrinos, and Big Bang nucleosynthesis.« less
Regression modeling of ground-water flow

USGS Publications Warehouse

Cooley, R.L.; Naff, R.L.

1985-01-01

Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
[Diversity and frequency of scientific research design and statistical methods in the "Arquivos Brasileiros de Oftalmologia": a systematic review of the "Arquivos Brasileiros de Oftalmologia"--1993-2002].

PubMed

Crosta, Fernando; Nishiwaki-Dantas, Maria Cristina; Silvino, Wilmar; Dantas, Paulo Elias Correa

2005-01-01

To verify the frequency of study design, applied statistical analysis and approval by institutional review offices (Ethics Committee) of articles published in the "Arquivos Brasileiros de Oftalmologia" during a 10-year interval, with later comparative and critical analysis by some of the main international journals in the field of Ophthalmology. Systematic review without metanalysis was performed. Scientific papers published in the "Arquivos Brasileiros de Oftalmologia" between January 1993 and December 2002 were reviewed by two independent reviewers and classified according to the applied study design, statistical analysis and approval by the institutional review offices. To categorize those variables, a descriptive statistical analysis was used. After applying inclusion and exclusion criteria, 584 articles for evaluation of statistical analysis and, 725 articles for evaluation of study design were reviewed. Contingency table (23.10%) was the most frequently applied statistical method, followed by non-parametric tests (18.19%), Student's t test (12.65%), central tendency measures (10.60%) and analysis of variance (9.81%). Of 584 reviewed articles, 291 (49.82%) presented no statistical analysis. Observational case series (26.48%) was the most frequently used type of study design, followed by interventional case series (18.48%), observational case description (13.37%), non-random clinical study (8.96%) and experimental study (8.55%). We found a higher frequency of observational clinical studies, lack of statistical analysis in almost half of the published papers. Increase in studies with approval by institutional review Ethics Committee was noted since it became mandatory in 1996.
[Evaluation of using statistical methods in selected national medical journals].

PubMed

Sych, Z

1996-01-01

The paper covers the performed evaluation of frequency with which the statistical methods were applied in analyzed works having been published in six selected, national medical journals in the years 1988-1992. For analysis the following journals were chosen, namely: Klinika Oczna, Medycyna Pracy, Pediatria Polska, Polski Tygodnik Lekarski, Roczniki Państwowego Zakładu Higieny, Zdrowie Publiczne. Appropriate number of works up to the average in the remaining medical journals was randomly selected from respective volumes of Pol. Tyg. Lek. The studies did not include works wherein the statistical analysis was not implemented, which referred both to national and international publications. That exemption was also extended to review papers, casuistic ones, reviews of books, handbooks, monographies, reports from scientific congresses, as well as papers on historical topics. The number of works was defined in each volume. Next, analysis was performed to establish the mode of finding out a suitable sample in respective studies, differentiating two categories: random and target selections. Attention was also paid to the presence of control sample in the individual works. In the analysis attention was also focussed on the existence of sample characteristics, setting up three categories: complete, partial and lacking. In evaluating the analyzed works an effort was made to present the results of studies in tables and figures (Tab. 1, 3). Analysis was accomplished with regard to the rate of employing statistical methods in analyzed works in relevant volumes of six selected, national medical journals for the years 1988-1992, simultaneously determining the number of works, in which no statistical methods were used. Concurrently the frequency of applying the individual statistical methods was analyzed in the scrutinized works. Prominence was given to fundamental statistical methods in the field of descriptive statistics (measures of position, measures of dispersion) as well as most important methods of mathematical statistics such as parametric tests of significance, analysis of variance (in single and dual classifications). non-parametric tests of significance, correlation and regression. The works, in which use was made of either multiple correlation or multiple regression or else more complex methods of studying the relationship for two or more numbers of variables, were incorporated into the works whose statistical methods were constituted by correlation and regression as well as other methods, e.g. statistical methods being used in epidemiology (coefficients of incidence and morbidity, standardization of coefficients, survival tables) factor analysis conducted by Jacobi-Hotellng's method, taxonomic methods and others. On the basis of the performed studies it has been established that the frequency of employing statistical methods in the six selected national, medical journals in the years 1988-1992 was 61.1-66.0% of the analyzed works (Tab. 3), and they generally were almost similar to the frequency provided in English language medical journals. On a whole, no significant differences were disclosed in the frequency of applied statistical methods (Tab. 4) as well as in frequency of random tests (Tab. 3) in the analyzed works, appearing in the medical journals in respective years 1988-1992. The most frequently used statistical methods in analyzed works for 1988-1992 were the measures of position 44.2-55.6% and measures of dispersion 32.5-38.5% as well as parametric tests of significance 26.3-33.1% of the works analyzed (Tab. 4). For the purpose of increasing the frequency and reliability of the used statistical methods, the didactics should be widened in the field of biostatistics at medical studies and postgraduation training designed for physicians and scientific-didactic workers.
An Applied Statistics Course for Systematics and Ecology PhD Students

ERIC Educational Resources Information Center

Ojeda, Mario Miguel; Sosa, Victoria

2002-01-01

Statistics education is under review at all educational levels. Statistical concepts, as well as the use of statistical methods and techniques, can be taught in at least two contrasting ways. Specifically, (1) teaching can be theoretically and mathematically oriented, or (2) it can be less mathematically oriented being focused, instead, on…
Investigation of Statistical Inference Methodologies Through Scale Model Propagation Experiments

DTIC Science & Technology

2015-09-30

statistical inference methodologies for ocean- acoustic problems by investigating and applying statistical methods to data collected from scale-model...to begin planning experiments for statistical inference applications. APPROACH In the ocean acoustics community over the past two decades...solutions for waveguide parameters. With the introduction of statistical inference to the field of ocean acoustics came the desire to interpret marginal
Statistics of Smoothed Cosmic Fields in Perturbation Theory. I. Formulation and Useful Formulae in Second-Order Perturbation Theory

NASA Astrophysics Data System (ADS)

Matsubara, Takahiko

2003-02-01

We formulate a general method for perturbative evaluations of statistics of smoothed cosmic fields and provide useful formulae for application of the perturbation theory to various statistics. This formalism is an extensive generalization of the method used by Matsubara, who derived a weakly nonlinear formula of the genus statistic in a three-dimensional density field. After describing the general method, we apply the formalism to a series of statistics, including genus statistics, level-crossing statistics, Minkowski functionals, and a density extrema statistic, regardless of the dimensions in which each statistic is defined. The relation between the Minkowski functionals and other geometrical statistics is clarified. These statistics can be applied to several cosmic fields, including three-dimensional density field, three-dimensional velocity field, two-dimensional projected density field, and so forth. The results are detailed for second-order theory of the formalism. The effect of the bias is discussed. The statistics of smoothed cosmic fields as functions of rescaled threshold by volume fraction are discussed in the framework of second-order perturbation theory. In CDM-like models, their functional deviations from linear predictions plotted against the rescaled threshold are generally much smaller than that plotted against the direct threshold. There is still a slight meatball shift against rescaled threshold, which is characterized by asymmetry in depths of troughs in the genus curve. A theory-motivated asymmetry factor in the genus curve is proposed.
Pathway analysis with next-generation sequencing data.

PubMed

Zhao, Jinying; Zhu, Yun; Boerwinkle, Eric; Xiong, Momiao

2015-04-01

Although pathway analysis methods have been developed and successfully applied to association studies of common variants, the statistical methods for pathway-based association analysis of rare variants have not been well developed. Many investigators observed highly inflated false-positive rates and low power in pathway-based tests of association of rare variants. The inflated false-positive rates and low true-positive rates of the current methods are mainly due to their lack of ability to account for gametic phase disequilibrium. To overcome these serious limitations, we develop a novel statistic that is based on the smoothed functional principal component analysis (SFPCA) for pathway association tests with next-generation sequencing data. The developed statistic has the ability to capture position-level variant information and account for gametic phase disequilibrium. By intensive simulations, we demonstrate that the SFPCA-based statistic for testing pathway association with either rare or common or both rare and common variants has the correct type 1 error rates. Also the power of the SFPCA-based statistic and 22 additional existing statistics are evaluated. We found that the SFPCA-based statistic has a much higher power than other existing statistics in all the scenarios considered. To further evaluate its performance, the SFPCA-based statistic is applied to pathway analysis of exome sequencing data in the early-onset myocardial infarction (EOMI) project. We identify three pathways significantly associated with EOMI after the Bonferroni correction. In addition, our preliminary results show that the SFPCA-based statistic has much smaller P-values to identify pathway association than other existing methods.
Estimation of Lithological Classification in Taipei Basin: A Bayesian Maximum Entropy Method

NASA Astrophysics Data System (ADS)

Wu, Meng-Ting; Lin, Yuan-Chien; Yu, Hwa-Lung

2015-04-01

In environmental or other scientific applications, we must have a certain understanding of geological lithological composition. Because of restrictions of real conditions, only limited amount of data can be acquired. To find out the lithological distribution in the study area, many spatial statistical methods used to estimate the lithological composition on unsampled points or grids. This study applied the Bayesian Maximum Entropy (BME method), which is an emerging method of the geological spatiotemporal statistics field. The BME method can identify the spatiotemporal correlation of the data, and combine not only the hard data but the soft data to improve estimation. The data of lithological classification is discrete categorical data. Therefore, this research applied Categorical BME to establish a complete three-dimensional Lithological estimation model. Apply the limited hard data from the cores and the soft data generated from the geological dating data and the virtual wells to estimate the three-dimensional lithological classification in Taipei Basin. Keywords: Categorical Bayesian Maximum Entropy method, Lithological Classification, Hydrogeological Setting
Applying Bayesian statistics to the study of psychological trauma: A suggestion for future research.

PubMed

Yalch, Matthew M

2016-03-01

Several contemporary researchers have noted the virtues of Bayesian methods of data analysis. Although debates continue about whether conventional or Bayesian statistics is the "better" approach for researchers in general, there are reasons why Bayesian methods may be well suited to the study of psychological trauma in particular. This article describes how Bayesian statistics offers practical solutions to the problems of data non-normality, small sample size, and missing data common in research on psychological trauma. After a discussion of these problems and the effects they have on trauma research, this article explains the basic philosophical and statistical foundations of Bayesian statistics and how it provides solutions to these problems using an applied example. Results of the literature review and the accompanying example indicates the utility of Bayesian statistics in addressing problems common in trauma research. Bayesian statistics provides a set of methodological tools and a broader philosophical framework that is useful for trauma researchers. Methodological resources are also provided so that interested readers can learn more. (c) 2016 APA, all rights reserved).
A method for classification of multisource data using interval-valued probabilities and its application to HIRIS data

NASA Technical Reports Server (NTRS)

Kim, H.; Swain, P. H.

1991-01-01

A method of classifying multisource data in remote sensing is presented. The proposed method considers each data source as an information source providing a body of evidence, represents statistical evidence by interval-valued probabilities, and uses Dempster's rule to integrate information based on multiple data source. The method is applied to the problems of ground-cover classification of multispectral data combined with digital terrain data such as elevation, slope, and aspect. Then this method is applied to simulated 201-band High Resolution Imaging Spectrometer (HIRIS) data by dividing the dimensionally huge data source into smaller and more manageable pieces based on the global statistical correlation information. It produces higher classification accuracy than the Maximum Likelihood (ML) classification method when the Hughes phenomenon is apparent.
Statistical Method to Overcome Overfitting Issue in Rational Function Models

NASA Astrophysics Data System (ADS)

Alizadeh Moghaddam, S. H.; Mokhtarzade, M.; Alizadeh Naeini, A.; Alizadeh Moghaddam, S. A.

2017-09-01

Rational function models (RFMs) are known as one of the most appealing models which are extensively applied in geometric correction of satellite images and map production. Overfitting is a common issue, in the case of terrain dependent RFMs, that degrades the accuracy of RFMs-derived geospatial products. This issue, resulting from the high number of RFMs' parameters, leads to ill-posedness of the RFMs. To tackle this problem, in this study, a fast and robust statistical approach is proposed and compared to Tikhonov regularization (TR) method, as a frequently-used solution to RFMs' overfitting. In the proposed method, a statistical test, namely, significance test is applied to search for the RFMs' parameters that are resistant against overfitting issue. The performance of the proposed method was evaluated for two real data sets of Cartosat-1 satellite images. The obtained results demonstrate the efficiency of the proposed method in term of the achievable level of accuracy. This technique, indeed, shows an improvement of 50-80% over the TR.
A power comparison of generalized additive models and the spatial scan statistic in a case-control setting.

PubMed

Young, Robin L; Weinberg, Janice; Vieira, Verónica; Ozonoff, Al; Webster, Thomas F

2010-07-19

A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic.
A power comparison of generalized additive models and the spatial scan statistic in a case-control setting

PubMed Central

2010-01-01

Background A common, important problem in spatial epidemiology is measuring and identifying variation in disease risk across a study region. In application of statistical methods, the problem has two parts. First, spatial variation in risk must be detected across the study region and, second, areas of increased or decreased risk must be correctly identified. The location of such areas may give clues to environmental sources of exposure and disease etiology. One statistical method applicable in spatial epidemiologic settings is a generalized additive model (GAM) which can be applied with a bivariate LOESS smoother to account for geographic location as a possible predictor of disease status. A natural hypothesis when applying this method is whether residential location of subjects is associated with the outcome, i.e. is the smoothing term necessary? Permutation tests are a reasonable hypothesis testing method and provide adequate power under a simple alternative hypothesis. These tests have yet to be compared to other spatial statistics. Results This research uses simulated point data generated under three alternative hypotheses to evaluate the properties of the permutation methods and compare them to the popular spatial scan statistic in a case-control setting. Case 1 was a single circular cluster centered in a circular study region. The spatial scan statistic had the highest power though the GAM method estimates did not fall far behind. Case 2 was a single point source located at the center of a circular cluster and Case 3 was a line source at the center of the horizontal axis of a square study region. Each had linearly decreasing logodds with distance from the point. The GAM methods outperformed the scan statistic in Cases 2 and 3. Comparing sensitivity, measured as the proportion of the exposure source correctly identified as high or low risk, the GAM methods outperformed the scan statistic in all three Cases. Conclusions The GAM permutation testing methods provide a regression-based alternative to the spatial scan statistic. Across all hypotheses examined in this research, the GAM methods had competing or greater power estimates and sensitivities exceeding that of the spatial scan statistic. PMID:20642827
REANALYSIS OF F-STATISTIC GRAVITATIONAL-WAVE SEARCHES WITH THE HIGHER CRITICISM STATISTIC

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bennett, M. F.; Melatos, A.; Delaigle, A.

2013-04-01

We propose a new method of gravitational-wave detection using a modified form of higher criticism, a statistical technique introduced by Donoho and Jin. Higher criticism is designed to detect a group of sparse, weak sources, none of which are strong enough to be reliably estimated or detected individually. We apply higher criticism as a second-pass method to synthetic F-statistic and C-statistic data for a monochromatic periodic source in a binary system and quantify the improvement relative to the first-pass methods. We find that higher criticism on C-statistic data is more sensitive by {approx}6% than the C-statistic alone under optimal conditionsmore » (i.e., binary orbit known exactly) and the relative advantage increases as the error in the orbital parameters increases. Higher criticism is robust even when the source is not monochromatic (e.g., phase-wandering in an accreting system). Applying higher criticism to a phase-wandering source over multiple time intervals gives a {approx}> 30% increase in detectability with few assumptions about the frequency evolution. By contrast, in all-sky searches for unknown periodic sources, which are dominated by the brightest source, second-pass higher criticism does not provide any benefits over a first-pass search.« less
Cognitive Transfer Outcomes for a Simulation-Based Introductory Statistics Curriculum

ERIC Educational Resources Information Center

Backman, Matthew D.; Delmas, Robert C.; Garfield, Joan

2017-01-01

Cognitive transfer is the ability to apply learned skills and knowledge to new applications and contexts. This investigation evaluates cognitive transfer outcomes for a tertiary-level introductory statistics course using the CATALST curriculum, which exclusively used simulation-based methods to develop foundations of statistical inference. A…
Trends in study design and the statistical methods employed in a leading general medicine journal.

PubMed

Gosho, M; Sato, Y; Nagashima, K; Takahashi, S

2018-02-01

Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Due to the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (eg Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. These methods are not recommended as a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel design, such as adaptive dose selection and sample size re-estimation, was sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in the light of the information found in some publications. Use of adaptive design with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.
Statistical Data Analyses of Trace Chemical, Biochemical, and Physical Analytical Signatures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Udey, Ruth Norma

Analytical and bioanalytical chemistry measurement results are most meaningful when interpreted using rigorous statistical treatments of the data. The same data set may provide many dimensions of information depending on the questions asked through the applied statistical methods. Three principal projects illustrated the wealth of information gained through the application of statistical data analyses to diverse problems.
Teaching Introductory Business Statistics Using the DCOVA Framework

ERIC Educational Resources Information Center

Levine, David M.; Stephan, David F.

2011-01-01

Introductory business statistics students often receive little guidance on how to apply the methods they learn to further business objectives they may one day face. And those students may fail to see the continuity among the topics taught in an introductory course if they learn those methods outside a context that provides a unifying framework.…
Change Detection in Rough Time Series

DTIC Science & Technology

2014-09-01

Business Statistics : An Inferential Approach, Dellen: San Francisco. [18] Winston, W. (1997) Operations Research Applications and Algorithms, Duxbury...distribution that can present significant challenges to conventional statistical tracking techniques. To address this problem the proposed method...applies hybrid fuzzy statistical techniques to series granules instead of to individual measures. Three examples demonstrated the robust nature of the

Statistics and Title VII Proof: Prima Facie Case and Rebuttal.

ERIC Educational Resources Information Center

Whitten, David

1978-01-01

The method and means by which statistics can raise a prima facie case of Title VII violation are analyzed. A standard is identified that can be applied to determine whether a statistical disparity is sufficient to shift the burden to the employer to rebut a prima facie case of discrimination. (LBH)
Interference detection and correction applied to incoherent-scatter radar power spectrum measurement

NASA Technical Reports Server (NTRS)

Ying, W. P.; Mathews, J. D.; Rastogi, P. K.

1986-01-01

A median filter based interference detection and correction technique is evaluated and the method applied to the Arecibo incoherent scatter radar D-region ionospheric power spectrum is discussed. The method can be extended to other kinds of data when the statistics involved in the process are still valid.
Electronic Health Record Systems and Intent to Apply for Meaningful Use Incentives among Office-based Physician ...

MedlinePlus

... on Vital and Health Statistics Annual Reports Health Survey Research Methods Conference Reports from the National Medical Care Utilization and Expenditure Survey Clearinghouse on Health Indexes Statistical Notes for Health ...
Anomaly detection of turbopump vibration in Space Shuttle Main Engine using statistics and neural networks

NASA Technical Reports Server (NTRS)

Lo, C. F.; Wu, K.; Whitehead, B. A.

1993-01-01

The statistical and neural networks methods have been applied to investigate the feasibility in detecting anomalies in turbopump vibration of SSME. The anomalies are detected based on the amplitude of peaks of fundamental and harmonic frequencies in the power spectral density. These data are reduced to the proper format from sensor data measured by strain gauges and accelerometers. Both methods are feasible to detect the vibration anomalies. The statistical method requires sufficient data points to establish a reasonable statistical distribution data bank. This method is applicable for on-line operation. The neural networks method also needs to have enough data basis to train the neural networks. The testing procedure can be utilized at any time so long as the characteristics of components remain unchanged.
Paroxysmal atrial fibrillation prediction method with shorter HRV sequences.

PubMed

Boon, K H; Khalil-Hani, M; Malarvili, M B; Sia, C W

2016-10-01

This paper proposes a method that predicts the onset of paroxysmal atrial fibrillation (PAF), using heart rate variability (HRV) segments that are shorter than those applied in existing methods, while maintaining good prediction accuracy. PAF is a common cardiac arrhythmia that increases the health risk of a patient, and the development of an accurate predictor of the onset of PAF is clinical important because it increases the possibility to stabilize (electrically) and prevent the onset of atrial arrhythmias with different pacing techniques. We investigate the effect of HRV features extracted from different lengths of HRV segments prior to PAF onset with the proposed PAF prediction method. The pre-processing stage of the predictor includes QRS detection, HRV quantification and ectopic beat correction. Time-domain, frequency-domain, non-linear and bispectrum features are then extracted from the quantified HRV. In the feature selection, the HRV feature set and classifier parameters are optimized simultaneously using an optimization procedure based on genetic algorithm (GA). Both full feature set and statistically significant feature subset are optimized by GA respectively. For the statistically significant feature subset, Mann-Whitney U test is used to filter non-statistical significance features that cannot pass the statistical test at 20% significant level. The final stage of our predictor is the classifier that is based on support vector machine (SVM). A 10-fold cross-validation is applied in performance evaluation, and the proposed method achieves 79.3% prediction accuracy using 15-minutes HRV segment. This accuracy is comparable to that achieved by existing methods that use 30-minutes HRV segments, most of which achieves accuracy of around 80%. More importantly, our method significantly outperforms those that applied segments shorter than 30 minutes. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
[Application of Stata software to test heterogeneity in meta-analysis method].

PubMed

Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

2008-07-01

To introduce the application of Stata software to heterogeneity test in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands of the methods in Stata 9 software were applied to test the example. The methods used were Q-test and I2 statistic attached to the fixed effect model forest plot, H statistic and Galbraith plot. The existence of the heterogeneity among studies could be detected by Q-test and H statistic and the degree of the heterogeneity could be detected by I2 statistic. The outliers which were the sources of the heterogeneity could be spotted from the Galbraith plot. Heterogeneity test in meta-analysis can be completed by the four methods in Stata software simply and quickly. H and I2 statistics are more robust, and the outliers of the heterogeneity can be clearly seen in the Galbraith plot among the four methods.
Statistics in three biomedical journals.

PubMed

Pilcík, T

2003-01-01

In this paper we analyze the use of statistics and associated problems, in three Czech biological journals in the year 2000. We investigated 23 articles Folia Biologica, 60 articles in Folia Microbiologica, and 88 articles in Physiological Research. The highest frequency of publications with statistical content have used descriptive statistics and t-test. The most usual mistake concerns the absence of reference about the used statistical software and insufficient description of the data. We have compared our results with the results of similar studies in some other medical journals. The use of important statistical methods is comparable with those used in most medical journals, the proportion of articles, in which the applied method is described insufficiently is moderately low.
Comparative analysis of ferroelectric domain statistics via nonlinear diffraction in random nonlinear materials.

PubMed

Wang, B; Switowski, K; Cojocaru, C; Roppo, V; Sheng, Y; Scalora, M; Kisielewski, J; Pawlak, D; Vilaseca, R; Akhouayri, H; Krolikowski, W; Trull, J

2018-01-22

We present an indirect, non-destructive optical method for domain statistic characterization in disordered nonlinear crystals having homogeneous refractive index and spatially random distribution of ferroelectric domains. This method relies on the analysis of the wave-dependent spatial distribution of the second harmonic, in the plane perpendicular to the optical axis in combination with numerical simulations. We apply this technique to the characterization of two different media, Calcium Barium Niobate and Strontium Barium Niobate, with drastically different statistical distributions of ferroelectric domains.
Archaeology Through Computational Linguistics: Inscription Statistics Predict Excavation Sites of Indus Valley Artifacts.

PubMed

Recchia, Gabriel L; Louwerse, Max M

2016-11-01

Computational techniques comparing co-occurrences of city names in texts allow the relative longitudes and latitudes of cities to be estimated algorithmically. However, these techniques have not been applied to estimate the provenance of artifacts with unknown origins. Here, we estimate the geographic origin of artifacts from the Indus Valley Civilization, applying methods commonly used in cognitive science to the Indus script. We show that these methods can accurately predict the relative locations of archeological sites on the basis of artifacts of known provenance, and we further apply these techniques to determine the most probable excavation sites of four sealings of unknown provenance. These findings suggest that inscription statistics reflect historical interactions among locations in the Indus Valley region, and they illustrate how computational methods can help localize inscribed archeological artifacts of unknown origin. The success of this method offers opportunities for the cognitive sciences in general and for computational anthropology specifically. Copyright © 2015 Cognitive Science Society, Inc.
SSD for R: A Comprehensive Statistical Package to Analyze Single-System Data

ERIC Educational Resources Information Center

Auerbach, Charles; Schudrich, Wendy Zeitlin

2013-01-01

The need for statistical analysis in single-subject designs presents a challenge, as analytical methods that are applied to group comparison studies are often not appropriate in single-subject research. "SSD for R" is a robust set of statistical functions with wide applicability to single-subject research. It is a comprehensive package…
Fear and loathing: undergraduate nursing students' experiences of a mandatory course in applied statistics.

PubMed

Hagen, Brad; Awosoga, Oluwagbohunmi A; Kellett, Peter; Damgaard, Marie

2013-04-23

This article describes the results of a qualitative research study evaluating nursing students' experiences of a mandatory course in applied statistics, and the perceived effectiveness of teaching methods implemented during the course. Fifteen nursing students in the third year of a four-year baccalaureate program in nursing participated in focus groups before and after taking the mandatory course in statistics. The interviews were transcribed and analyzed using content analysis to reveal four major themes: (i) "one of those courses you throw out?," (ii) "numbers and terrifying equations," (iii) "first aid for statistics casualties," and (iv) "re-thinking curriculum." Overall, the data revealed that although nursing students initially enter statistics courses with considerable skepticism, fear, and anxiety, there are a number of concrete actions statistics instructors can take to reduce student fear and increase the perceived relevance of courses in statistics.
Does bad inference drive out good?

PubMed

Marozzi, Marco

2015-07-01

The (mis)use of statistics in practice is widely debated, and a field where the debate is particularly active is medicine. Many scholars emphasize that a large proportion of published medical research contains statistical errors. It has been noted that top class journals like Nature Medicine and The New England Journal of Medicine publish a considerable proportion of papers that contain statistical errors and poorly document the application of statistical methods. This paper joins the debate on the (mis)use of statistics in the medical literature. Even though the validation process of a statistical result may be quite elusive, a careful assessment of underlying assumptions is central in medicine as well as in other fields where a statistical method is applied. Unfortunately, a careful assessment of underlying assumptions is missing in many papers, including those published in top class journals. In this paper, it is shown that nonparametric methods are good alternatives to parametric methods when the assumptions for the latter ones are not satisfied. A key point to solve the problem of the misuse of statistics in the medical literature is that all journals have their own statisticians to review the statistical method/analysis section in each submitted paper. © 2015 Wiley Publishing Asia Pty Ltd.
A UNIFYING CONCEPT FOR ASSESSING TOXICOLOGICAL INTERACTIONS: CHANGES IN SLOPE

EPA Science Inventory

Robust statistical methods are important to the evaluation of interactions among chemicals in a mixture. However, different concepts of interaction as applied to the statistical analysis of chemical mixture toxicology data or as used in environmental risk assessment often can ap...
Applied statistics in agricultural, biological, and environmental sciences.

USDA-ARS?s Scientific Manuscript database

Agronomic research often involves measurement and collection of multiple response variables in an effort to understand the more complex nature of the system being studied. Multivariate statistical methods encompass the simultaneous analysis of all random variables measured on each experimental or s...
Text mining by Tsallis entropy

NASA Astrophysics Data System (ADS)

Jamaati, Maryam; Mehri, Ali

2018-01-01

Long-range correlations between the elements of natural languages enable them to convey very complex information. Complex structure of human language, as a manifestation of natural languages, motivates us to apply nonextensive statistical mechanics in text mining. Tsallis entropy appropriately ranks the terms' relevance to document subject, taking advantage of their spatial correlation length. We apply this statistical concept as a new powerful word ranking metric in order to extract keywords of a single document. We carry out an experimental evaluation, which shows capability of the presented method in keyword extraction. We find that, Tsallis entropy has reliable word ranking performance, at the same level of the best previous ranking methods.
Statistical iterative reconstruction for streak artefact reduction when using multidetector CT to image the dento-alveolar structures.

PubMed

Dong, J; Hayakawa, Y; Kober, C

2014-01-01

When metallic prosthetic appliances and dental fillings exist in the oral cavity, the appearance of metal-induced streak artefacts is not avoidable in CT images. The aim of this study was to develop a method for artefact reduction using the statistical reconstruction on multidetector row CT images. Adjacent CT images often depict similar anatomical structures. Therefore, reconstructed images with weak artefacts were attempted using projection data of an artefact-free image in a neighbouring thin slice. Images with moderate and strong artefacts were continuously processed in sequence by successive iterative restoration where the projection data was generated from the adjacent reconstructed slice. First, the basic maximum likelihood-expectation maximization algorithm was applied. Next, the ordered subset-expectation maximization algorithm was examined. Alternatively, a small region of interest setting was designated. Finally, the general purpose graphic processing unit machine was applied in both situations. The algorithms reduced the metal-induced streak artefacts on multidetector row CT images when the sequential processing method was applied. The ordered subset-expectation maximization and small region of interest reduced the processing duration without apparent detriments. A general-purpose graphic processing unit realized the high performance. A statistical reconstruction method was applied for the streak artefact reduction. The alternative algorithms applied were effective. Both software and hardware tools, such as ordered subset-expectation maximization, small region of interest and general-purpose graphic processing unit achieved fast artefact correction.
Gene flow analysis method, the D-statistic, is robust in a wide parameter space.

PubMed

Zheng, Yichen; Janke, Axel

2018-01-08

We evaluated the sensitivity of the D-statistic, a parsimony-like method widely used to detect gene flow between closely related species. This method has been applied to a variety of taxa with a wide range of divergence times. However, its parameter space and thus its applicability to a wide taxonomic range has not been systematically studied. Divergence time, population size, time of gene flow, distance of outgroup and number of loci were examined in a sensitivity analysis. The sensitivity study shows that the primary determinant of the D-statistic is the relative population size, i.e. the population size scaled by the number of generations since divergence. This is consistent with the fact that the main confounding factor in gene flow detection is incomplete lineage sorting by diluting the signal. The sensitivity of the D-statistic is also affected by the direction of gene flow, size and number of loci. In addition, we examined the ability of the f-statistics, [Formula: see text] and [Formula: see text], to estimate the fraction of a genome affected by gene flow; while these statistics are difficult to implement to practical questions in biology due to lack of knowledge of when the gene flow happened, they can be used to compare datasets with identical or similar demographic background. The D-statistic, as a method to detect gene flow, is robust against a wide range of genetic distances (divergence times) but it is sensitive to population size. The D-statistic should only be applied with critical reservation to taxa where population sizes are large relative to branch lengths in generations.
A glossary for big data in population and public health: discussion and commentary on terminology and research methods.

PubMed

Fuller, Daniel; Buote, Richard; Stanley, Kevin

2017-11-01

The volume and velocity of data are growing rapidly and big data analytics are being applied to these data in many fields. Population and public health researchers may be unfamiliar with the terminology and statistical methods used in big data. This creates a barrier to the application of big data analytics. The purpose of this glossary is to define terms used in big data and big data analytics and to contextualise these terms. We define the five Vs of big data and provide definitions and distinctions for data mining, machine learning and deep learning, among other terms. We provide key distinctions between big data and statistical analysis methods applied to big data. We contextualise the glossary by providing examples where big data analysis methods have been applied to population and public health research problems and provide brief guidance on how to learn big data analysis methods. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2017. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Statistical Prediction in Proprietary Rehabilitation.

ERIC Educational Resources Information Center

Johnson, Kurt L.; And Others

1987-01-01

Applied statistical methods to predict case expenditures for low back pain rehabilitation cases in proprietary rehabilitation. Extracted predictor variables from case records of 175 workers compensation claimants with some degree of permanent disability due to back injury. Performed several multiple regression analyses resulting in a formula that…
Bayesian Statistics for Biological Data: Pedigree Analysis

ERIC Educational Resources Information Center

Stanfield, William D.; Carlton, Matthew A.

2004-01-01

The use of Bayes' formula is applied to the biological problem of pedigree analysis to show that the Bayes' formula and non-Bayesian or "classical" methods of probability calculation give different answers. First year college students of biology can be introduced to the Bayesian statistics.

Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods

DTIC Science & Technology

2010-06-01

Naren; Vasquez-Robinet, Cecilia; Watkinson, Jonathan: "A General Probabilistic Model of the PCR Process," Applied Mathematics and Computation 182(1...September 2006. Seminar, Measuring the effect of Length biased sampling, Mathematical Sciences Section, National Security Agency, 19 September 2006...Committee on National Statistics, 9 February 2007. Invited seminar, Statistical Tests for Bullet Lead Comparisons, Department of Mathematics , Butler
The Effect on the 8th Grade Students' Attitude towards Statistics of Project Based Learning

ERIC Educational Resources Information Center

Koparan, Timur; Güven, Bülent

2014-01-01

This study investigates the effect of the project based learning approach on 8th grade students' attitude towards statistics. With this aim, an attitude scale towards statistics was developed. Quasi-experimental research model was used in this study. Following this model in the control group the traditional method was applied to teach statistics…
Genomic similarity and kernel methods I: advancements by building on mathematical and statistical foundations.

PubMed

Schaid, Daniel J

2010-01-01

Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
Speckle reduction in optical coherence tomography by adaptive total variation method

NASA Astrophysics Data System (ADS)

Wu, Tong; Shi, Yaoyao; Liu, Youwen; He, Chongjun

2015-12-01

An adaptive total variation method based on the combination of speckle statistics and total variation restoration is proposed and developed for reducing speckle noise in optical coherence tomography (OCT) images. The statistical distribution of the speckle noise in OCT image is investigated and measured. With the measured parameters such as the mean value and variance of the speckle noise, the OCT image is restored by the adaptive total variation restoration method. The adaptive total variation restoration algorithm was applied to the OCT images of a volunteer's hand skin, which showed effective speckle noise reduction and image quality improvement. For image quality comparison, the commonly used median filtering method was also applied to the same images to reduce the speckle noise. The measured results demonstrate the superior performance of the adaptive total variation restoration method in terms of image signal-to-noise ratio, equivalent number of looks, contrast-to-noise ratio, and mean square error.
Analysis of Statistical Methods Currently used in Toxicology Journals

PubMed Central

Na, Jihye; Yang, Hyeri

2014-01-01

Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health. PMID:25343012
Analysis of Statistical Methods Currently used in Toxicology Journals.

PubMed

Na, Jihye; Yang, Hyeri; Bae, SeungJin; Lim, Kyung-Min

2014-09-01

Statistical methods are frequently used in toxicology, yet it is not clear whether the methods employed by the studies are used consistently and conducted based on sound statistical grounds. The purpose of this paper is to describe statistical methods used in top toxicology journals. More specifically, we sampled 30 papers published in 2014 from Toxicology and Applied Pharmacology, Archives of Toxicology, and Toxicological Science and described methodologies used to provide descriptive and inferential statistics. One hundred thirteen endpoints were observed in those 30 papers, and most studies had sample size less than 10, with the median and the mode being 6 and 3 & 6, respectively. Mean (105/113, 93%) was dominantly used to measure central tendency, and standard error of the mean (64/113, 57%) and standard deviation (39/113, 34%) were used to measure dispersion, while few studies provide justifications regarding why the methods being selected. Inferential statistics were frequently conducted (93/113, 82%), with one-way ANOVA being most popular (52/93, 56%), yet few studies conducted either normality or equal variance test. These results suggest that more consistent and appropriate use of statistical method is necessary which may enhance the role of toxicology in public health.
The potential for increased power from combining P-values testing the same hypothesis.

PubMed

Ganju, Jitendra; Julie Ma, Guoguang

2017-02-01

The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect but we do not know which one is the most powerful. Rather than relying on a single p-value, combining p-values from prespecified multiple test statistics can be used for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations when the model includes treatment by covariate interaction.
Statistical Model Selection for TID Hardness Assurance

NASA Technical Reports Server (NTRS)

Ladbury, R.; Gorelick, J. L.; McClure, S.

2010-01-01

Radiation Hardness Assurance (RHA) methodologies against Total Ionizing Dose (TID) degradation impose rigorous statistical treatments for data from a part's Radiation Lot Acceptance Test (RLAT) and/or its historical performance. However, no similar methods exist for using "similarity" data - that is, data for similar parts fabricated in the same process as the part under qualification. This is despite the greater difficulty and potential risk in interpreting of similarity data. In this work, we develop methods to disentangle part-to-part, lot-to-lot and part-type-to-part-type variation. The methods we develop apply not just for qualification decisions, but also for quality control and detection of process changes and other "out-of-family" behavior. We begin by discussing the data used in ·the study and the challenges of developing a statistic providing a meaningful measure of degradation across multiple part types, each with its own performance specifications. We then develop analysis techniques and apply them to the different data sets.
Chi-squared and C statistic minimization for low count per bin data

NASA Astrophysics Data System (ADS)

Nousek, John A.; Shue, David R.

1989-07-01

Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
Chi-squared and C statistic minimization for low count per bin data. [sampling in X ray astronomy

NASA Technical Reports Server (NTRS)

Nousek, John A.; Shue, David R.

1989-01-01

Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
Neural network approaches versus statistical methods in classification of multisource remote sensing data

NASA Technical Reports Server (NTRS)

Benediktsson, Jon A.; Swain, Philip H.; Ersoy, Okan K.

1990-01-01

Neural network learning procedures and statistical classificaiton methods are applied and compared empirically in classification of multisource remote sensing and geographic data. Statistical multisource classification by means of a method based on Bayesian classification theory is also investigated and modified. The modifications permit control of the influence of the data sources involved in the classification process. Reliability measures are introduced to rank the quality of the data sources. The data sources are then weighted according to these rankings in the statistical multisource classification. Four data sources are used in experiments: Landsat MSS data and three forms of topographic data (elevation, slope, and aspect). Experimental results show that two different approaches have unique advantages and disadvantages in this classification application.
SHARE: system design and case studies for statistical health information release

PubMed Central

Gardner, James; Xiong, Li; Xiao, Yonghui; Gao, Jingjing; Post, Andrew R; Jiang, Xiaoqian; Ohno-Machado, Lucila

2013-01-01

Objectives We present SHARE, a new system for statistical health information release with differential privacy. We present two case studies that evaluate the software on real medical datasets and demonstrate the feasibility and utility of applying the differential privacy framework on biomedical data. Materials and Methods SHARE releases statistical information in electronic health records with differential privacy, a strong privacy framework for statistical data release. It includes a number of state-of-the-art methods for releasing multidimensional histograms and longitudinal patterns. We performed a variety of experiments on two real datasets, the surveillance, epidemiology and end results (SEER) breast cancer dataset and the Emory electronic medical record (EeMR) dataset, to demonstrate the feasibility and utility of SHARE. Results Experimental results indicate that SHARE can deal with heterogeneous data present in medical data, and that the released statistics are useful. The Kullback–Leibler divergence between the released multidimensional histograms and the original data distribution is below 0.5 and 0.01 for seven-dimensional and three-dimensional data cubes generated from the SEER dataset, respectively. The relative error for longitudinal pattern queries on the EeMR dataset varies between 0 and 0.3. While the results are promising, they also suggest that challenges remain in applying statistical data release using the differential privacy framework for higher dimensional data. Conclusions SHARE is one of the first systems to provide a mechanism for custodians to release differentially private aggregate statistics for a variety of use cases in the medical domain. This proof-of-concept system is intended to be applied to large-scale medical data warehouses. PMID:23059729
Two Paradoxes in Linear Regression Analysis.

PubMed

Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

2016-12-25

Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
Markov Logic Networks in the Analysis of Genetic Data

PubMed Central

Sakhanenko, Nikita A.

2010-01-01

Abstract Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Statistical analysis of flight times for space shuttle ferry flights

NASA Technical Reports Server (NTRS)

Graves, M. E.; Perlmutter, M.

1974-01-01

Markov chain and Monte Carlo analysis techniques are applied to the simulated Space Shuttle Orbiter Ferry flights to obtain statistical distributions of flight time duration between Edwards Air Force Base and Kennedy Space Center. The two methods are compared, and are found to be in excellent agreement. The flights are subjected to certain operational and meteorological requirements, or constraints, which cause eastbound and westbound trips to yield different results. Persistence of events theory is applied to the occurrence of inclement conditions to find their effect upon the statistical flight time distribution. In a sensitivity test, some of the constraints are varied to observe the corresponding changes in the results.
Individualized statistical learning from medical image databases: application to identification of brain lesions.

PubMed

Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos

2014-04-01

This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Individualized Statistical Learning from Medical Image Databases: Application to Identification of Brain Lesions

PubMed Central

Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos

2014-01-01

This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
A Selective Overview of Variable Selection in High Dimensional Feature Space

PubMed Central

Fan, Jianqing

2010-01-01

High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
Performance Characterization of an Instrument.

ERIC Educational Resources Information Center

Salin, Eric D.

1984-01-01

Describes an experiment designed to teach students to apply the same statistical awareness to instrumentation they commonly apply to classical techniques. Uses propagation of error techniques to pinpoint instrumental limitations and breakdowns and to demonstrate capabilities and limitations of volumetric and gravimetric methods. Provides lists of…
Scientific computations section monthly report, November 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Buckner, M.R.

1993-12-30

This progress report from the Savannah River Technology Center contains abstracts from papers from the computational modeling, applied statistics, applied physics, experimental thermal hydraulics, and packaging and transportation groups. Specific topics covered include: engineering modeling and process simulation, criticality methods and analysis, plutonium disposition.

Comparisons of non-Gaussian statistical models in DNA methylation analysis.

PubMed

Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-06-16

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

PubMed Central

Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-01-01

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis

PubMed Central

Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

2015-01-01

Background Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. Objectives This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. Methods We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. Results There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. Conclusion The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent. PMID:26053876
Drug Adverse Event Detection in Health Plan Data Using the Gamma Poisson Shrinker and Comparison to the Tree-based Scan Statistic

PubMed Central

Brown, Jeffrey S.; Petronis, Kenneth R.; Bate, Andrew; Zhang, Fang; Dashevsky, Inna; Kulldorff, Martin; Avery, Taliser R.; Davis, Robert L.; Chan, K. Arnold; Andrade, Susan E.; Boudreau, Denise; Gunter, Margaret J.; Herrinton, Lisa; Pawloski, Pamala A.; Raebel, Marsha A.; Roblin, Douglas; Smith, David; Reynolds, Robert

2013-01-01

Background: Drug adverse event (AE) signal detection using the Gamma Poisson Shrinker (GPS) is commonly applied in spontaneous reporting. AE signal detection using large observational health plan databases can expand medication safety surveillance. Methods: Using data from nine health plans, we conducted a pilot study to evaluate the implementation and findings of the GPS approach for two antifungal drugs, terbinafine and itraconazole, and two diabetes drugs, pioglitazone and rosiglitazone. We evaluated 1676 diagnosis codes grouped into 183 different clinical concepts and four levels of granularity. Several signaling thresholds were assessed. GPS results were compared to findings from a companion study using the identical analytic dataset but an alternative statistical method—the tree-based scan statistic (TreeScan). Results: We identified 71 statistical signals across two signaling thresholds and two methods, including closely-related signals of overlapping diagnosis definitions. Initial review found that most signals represented known adverse drug reactions or confounding. About 31% of signals met the highest signaling threshold. Conclusions: The GPS method was successfully applied to observational health plan data in a distributed data environment as a drug safety data mining method. There was substantial concordance between the GPS and TreeScan approaches. Key method implementation decisions relate to defining exposures and outcomes and informed choice of signaling thresholds. PMID:24300404
Transforming Graph Data for Statistical Relational Learning

DTIC Science & Technology

2012-10-01

Jordan, 2003), PLSA (Hofmann, 1999), ? Classification via RMN (Taskar et al., 2003) or SVM (Hasan, Chaoji, Salem , & Zaki, 2006) ? Hierarchical...dimensionality reduction methods such as Principal 407 Rossi, McDowell, Aha, & Neville Component Analysis (PCA), Principal Factor Analysis ( PFA ), and...clustering algorithm. Journal of the Royal Statistical Society. Series C, Applied statistics, 28, 100–108. Hasan, M. A., Chaoji, V., Salem , S., & Zaki, M
Nowcasting Cloud Fields for U.S. Air Force Special Operations

DTIC Science & Technology

2017-03-01

application of Bayes’ Rule offers many advantages over Kernel Density Estimation (KDE) and other commonly used statistical post-processing methods...reflectance and probability of cloud. A statistical post-processing technique is applied using Bayesian estimation to train the system from a set of past...nowcasting, low cloud forecasting, cloud reflectance, ISR, Bayesian estimation, statistical post-processing, machine learning 15. NUMBER OF PAGES
Two Paradoxes in Linear Regression Analysis

PubMed Central

FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

2016-01-01

Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Genetic structure of populations and differentiation in forest trees

Treesearch

Raymond P. Guries; F. Thomas Ledig

1981-01-01

Electrophoretic techniques permit population biologists to analyze genetic structure of natural populations by using large numbers of allozyme loci. Several methods of analysis have been applied to allozyme data, including chi-square contingency tests, F-statistics, and genetic distance. This paper compares such statistics for pitch pine (Pinus rigida...
Survival analysis, or what to do with upper limits in astronomical surveys

NASA Technical Reports Server (NTRS)

Isobe, Takashi; Feigelson, Eric D.

1986-01-01

A field of applied statistics called survival analysis has been developed over several decades to deal with censored data, which occur in astronomical surveys when objects are too faint to be detected. How these methods can assist in the statistical interpretation of astronomical data are reviewed.
Cause-specific mortality time series analysis: a general method to detect and correct for abrupt data production changes

PubMed Central

2011-01-01

Background Monitoring the time course of mortality by cause is a key public health issue. However, several mortality data production changes may affect cause-specific time trends, thus altering the interpretation. This paper proposes a statistical method that detects abrupt changes ("jumps") and estimates correction factors that may be used for further analysis. Methods The method was applied to a subset of the AMIEHS (Avoidable Mortality in the European Union, toward better Indicators for the Effectiveness of Health Systems) project mortality database and considered for six European countries and 13 selected causes of deaths. For each country and cause of death, an automated jump detection method called Polydect was applied to the log mortality rate time series. The plausibility of a data production change associated with each detected jump was evaluated through literature search or feedback obtained from the national data producers. For each plausible jump position, the statistical significance of the between-age and between-gender jump amplitude heterogeneity was evaluated by means of a generalized additive regression model, and correction factors were deduced from the results. Results Forty-nine jumps were detected by the Polydect method from 1970 to 2005. Most of the detected jumps were found to be plausible. The age- and gender-specific amplitudes of the jumps were estimated when they were statistically heterogeneous, and they showed greater by-age heterogeneity than by-gender heterogeneity. Conclusion The method presented in this paper was successfully applied to a large set of causes of death and countries. The method appears to be an alternative to bridge coding methods when the latter are not systematically implemented because they are time- and resource-consuming. PMID:21929756
Statistical study of generalized nonlinear phase step estimation methods in phase-shifting interferometry.

PubMed

Langoju, Rajesh; Patil, Abhijit; Rastogi, Pramod

2007-11-20

Signal processing methods based on maximum-likelihood theory, discrete chirp Fourier transform, and spectral estimation methods have enabled accurate measurement of phase in phase-shifting interferometry in the presence of nonlinear response of the piezoelectric transducer to the applied voltage. We present the statistical study of these generalized nonlinear phase step estimation methods to identify the best method by deriving the Cramér-Rao bound. We also address important aspects of these methods for implementation in practical applications and compare the performance of the best-identified method with other bench marking algorithms in the presence of harmonics and noise.
Expectation maximization for hard X-ray count modulation profiles

NASA Astrophysics Data System (ADS)

Benvenuto, F.; Schwartz, R.; Piana, M.; Massone, A. M.

2013-07-01

Context. This paper is concerned with the image reconstruction problem when the measured data are solar hard X-ray modulation profiles obtained from the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) instrument. Aims: Our goal is to demonstrate that a statistical iterative method classically applied to the image deconvolution problem is very effective when utilized to analyze count modulation profiles in solar hard X-ray imaging based on rotating modulation collimators. Methods: The algorithm described in this paper solves the maximum likelihood problem iteratively and encodes a positivity constraint into the iterative optimization scheme. The result is therefore a classical expectation maximization method this time applied not to an image deconvolution problem but to image reconstruction from count modulation profiles. The technical reason that makes our implementation particularly effective in this application is the use of a very reliable stopping rule which is able to regularize the solution providing, at the same time, a very satisfactory Cash-statistic (C-statistic). Results: The method is applied to both reproduce synthetic flaring configurations and reconstruct images from experimental data corresponding to three real events. In this second case, the performance of expectation maximization, when compared to Pixon image reconstruction, shows a comparable accuracy and a notably reduced computational burden; when compared to CLEAN, shows a better fidelity with respect to the measurements with a comparable computational effectiveness. Conclusions: If optimally stopped, expectation maximization represents a very reliable method for image reconstruction in the RHESSI context when count modulation profiles are used as input data.
Methods of editing cloud and atmospheric layer affected pixels from satellite data

NASA Technical Reports Server (NTRS)

Nixon, P. R. (Principal Investigator); Wiegand, C. L.; Richardson, A. J.; Johnson, M. P.

1982-01-01

Practical methods of computer screening cloud-contaminated pixels from data of various satellite systems are proposed. Examples are given of the location of clouds and representative landscape features in HCMM spectral space of reflectance (VIS) vs emission (IR). Methods of screening out cloud affected HCMM are discussed. The character of subvisible absorbing-emitting atmospheric layers (subvisible cirrus or SCi) in HCMM data is considered and radiosonde soundings are examined in relation to the presence of SCi. The statistical characteristics of multispectral meteorological satellite data in clear and SCi affected areas are discussed. Examples in TIROS-N and NOAA-7 data from several states and Mexico are presented. The VIS-IR cluster screening method for removing clouds is applied to a 262, 144 pixel HCMM scene from south Texas and northeast Mexico. The SCi that remain after cluster screening are sited out by applying a statistically determined IR limit.
Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics.

PubMed

Cao, Hui; Markatou, Marianthi; Melton, Genevieve B; Chiang, Michael F; Hripcsak, George

2005-01-01

This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, chi2 statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by chi2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-chi2, KB-PCI). An extrinsic evaluation showed that both KB-chi2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system.
A Lattice Boltzmann Method for Turbomachinery Simulations

NASA Technical Reports Server (NTRS)

Hsu, A. T.; Lopez, I.

2003-01-01

Lattice Boltzmann (LB) Method is a relatively new method for flow simulations. The start point of LB method is statistic mechanics and Boltzmann equation. The LB method tries to set up its model at molecular scale and simulate the flow at macroscopic scale. LBM has been applied to mostly incompressible flows and simple geometry.
A Statistical Approach for the Concurrent Coupling of Molecular Dynamics and Finite Element Methods

NASA Technical Reports Server (NTRS)

Saether, E.; Yamakov, V.; Glaessgen, E.

2007-01-01

Molecular dynamics (MD) methods are opening new opportunities for simulating the fundamental processes of material behavior at the atomistic level. However, increasing the size of the MD domain quickly presents intractable computational demands. A robust approach to surmount this computational limitation has been to unite continuum modeling procedures such as the finite element method (FEM) with MD analyses thereby reducing the region of atomic scale refinement. The challenging problem is to seamlessly connect the two inherently different simulation techniques at their interface. In the present work, a new approach to MD-FEM coupling is developed based on a restatement of the typical boundary value problem used to define a coupled domain. The method uses statistical averaging of the atomistic MD domain to provide displacement interface boundary conditions to the surrounding continuum FEM region, which, in return, generates interface reaction forces applied as piecewise constant traction boundary conditions to the MD domain. The two systems are computationally disconnected and communicate only through a continuous update of their boundary conditions. With the use of statistical averages of the atomistic quantities to couple the two computational schemes, the developed approach is referred to as an embedded statistical coupling method (ESCM) as opposed to a direct coupling method where interface atoms and FEM nodes are individually related. The methodology is inherently applicable to three-dimensional domains, avoids discretization of the continuum model down to atomic scales, and permits arbitrary temperatures to be applied.
A Statistical Method for Reducing Sidelobe Clutter for the Ku-Band Precipitation Radar on Board the GPM Core Observatory

NASA Technical Reports Server (NTRS)

Kubota, Takuji; Iguchi, Toshio; Kojima, Masahiro; Liao, Liang; Masaki, Takeshi; Hanado, Hiroshi; Meneghini, Robert; Oki, Riko

2016-01-01

A statistical method to reduce the sidelobe clutter of the Ku-band precipitation radar (KuPR) of the Dual-Frequency Precipitation Radar (DPR) on board the Global Precipitation Measurement (GPM) Core Observatory is described and evaluated using DPR observations. The KuPR sidelobe clutter was much more severe than that of the Precipitation Radar on board the Tropical Rainfall Measuring Mission (TRMM), and it has caused the misidentification of precipitation. The statistical method to reduce sidelobe clutter was constructed by subtracting the estimated sidelobe power, based upon a multiple regression model with explanatory variables of the normalized radar cross section (NRCS) of surface, from the received power of the echo. The saturation of the NRCS at near-nadir angles, resulting from strong surface scattering, was considered in the calculation of the regression coefficients.The method was implemented in the KuPR algorithm and applied to KuPR-observed data. It was found that the received power from sidelobe clutter over the ocean was largely reduced by using the developed method, although some of the received power from the sidelobe clutter still remained. From the statistical results of the evaluations, it was shown that the number of KuPR precipitation events in the clutter region, after the method was applied, was comparable to that in the clutter-free region. This confirms the reasonable performance of the method in removing sidelobe clutter. For further improving the effectiveness of the method, it is necessary to improve the consideration of the NRCS saturation, which will be explored in future work.
[Statistical analysis of articles in "Chinese journal of applied physiology" from 1999 to 2008].

PubMed

Du, Fei; Fang, Tao; Ge, Xue-ming; Jin, Peng; Zhang, Xiao-hong; Sun, Jin-li

2010-05-01

To evaluate the academic level and influence of "Chinese Journal of Applied Physiology" through statistical analysis for the fund sponsored articles published in the recent ten years. The articles of "Chinese Journal of Applied Physiology" from 1999 to 2008 were investigated. The number and the percentage of the fund sponsored articles, the fund organization and the author region were quantitatively analyzed by using the literature metrology method. The number of the fund sponsored articles increased unceasingly. The ratio of the fund from local government significantly enhanced in the latter five years. Most of the articles were from institutes located at Beijing, Zhejiang and Tianjin. "Chinese Journal of Applied Physiology" has a fine academic level and social influence.
Evaluating the statistical performance of less applied algorithms in classification of worldview-3 imagery data in an urbanized landscape

NASA Astrophysics Data System (ADS)

Ranaie, Mehrdad; Soffianian, Alireza; Pourmanafi, Saeid; Mirghaffari, Noorollah; Tarkesh, Mostafa

2018-03-01

In recent decade, analyzing the remotely sensed imagery is considered as one of the most common and widely used procedures in the environmental studies. In this case, supervised image classification techniques play a central role. Hence, taking a high resolution Worldview-3 over a mixed urbanized landscape in Iran, three less applied image classification methods including Bagged CART, Stochastic gradient boosting model and Neural network with feature extraction were tested and compared with two prevalent methods: random forest and support vector machine with linear kernel. To do so, each method was run ten time and three validation techniques was used to estimate the accuracy statistics consist of cross validation, independent validation and validation with total of train data. Moreover, using ANOVA and Tukey test, statistical difference significance between the classification methods was significantly surveyed. In general, the results showed that random forest with marginal difference compared to Bagged CART and stochastic gradient boosting model is the best performing method whilst based on independent validation there was no significant difference between the performances of classification methods. It should be finally noted that neural network with feature extraction and linear support vector machine had better processing speed than other.
Overview of the SAMSI year-long program on Statistical, Mathematical and Computational Methods for Astronomy

NASA Astrophysics Data System (ADS)

Jogesh Babu, G.

2017-01-01

A year-long research (Aug 2016- May 2017) program on `Statistical, Mathematical and Computational Methods for Astronomy (ASTRO)’ is well under way at Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation research institute in Research Triangle Park, NC. This program has brought together astronomers, computer scientists, applied mathematicians and statisticians. The main aims of this program are: to foster cross-disciplinary activities; to accelerate the adoption of modern statistical and mathematical tools into modern astronomy; and to develop new tools needed for important astronomical research problems. The program provides multiple avenues for cross-disciplinary interactions, including several workshops, long-term visitors, and regular teleconferences, so participants can continue collaborations, even if they can only spend limited time in residence at SAMSI. The main program is organized around five working groups:i) Uncertainty Quantification and Astrophysical Emulationii) Synoptic Time Domain Surveysiii) Multivariate and Irregularly Sampled Time Seriesiv) Astrophysical Populationsv) Statistics, computation, and modeling in cosmology.A brief description of each of the work under way by these groups will be given. Overlaps among various working groups will also be highlighted. How the wider astronomy community can both participate and benefit from the activities, will be briefly mentioned.

Examination of two methods for statistical analysis of data with magnitude and direction emphasizing vestibular research applications

NASA Technical Reports Server (NTRS)

Calkins, D. S.

1998-01-01

When the dependent (or response) variable response variable in an experiment has direction and magnitude, one approach that has been used for statistical analysis involves splitting magnitude and direction and applying univariate statistical techniques to the components. However, such treatment of quantities with direction and magnitude is not justifiable mathematically and can lead to incorrect conclusions about relationships among variables and, as a result, to flawed interpretations. This note discusses a problem with that practice and recommends mathematically correct procedures to be used with dependent variables that have direction and magnitude for 1) computation of mean values, 2) statistical contrasts of and confidence intervals for means, and 3) correlation methods.
A spatial scan statistic for survival data based on Weibull distribution.

PubMed

Bhatt, Vijaya; Tiwari, Neeraj

2014-05-20

The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.
Some Epistemological Considerations Concerning Quantitative Analysis

ERIC Educational Resources Information Center

Dobrescu, Emilian

2008-01-01

This article presents the author's address at the 2007 "Journal of Applied Quantitative Methods" ("JAQM") prize awarding festivity. The festivity was included in the opening of the 4th International Conference on Applied Statistics, November 22, 2008, Bucharest, Romania. In the address, the author reflects on three theses that…
Method and system for efficient video compression with low-complexity encoder

NASA Technical Reports Server (NTRS)

Chen, Jun (Inventor); He, Dake (Inventor); Sheinin, Vadim (Inventor); Jagmohan, Ashish (Inventor); Lu, Ligang (Inventor)

2012-01-01

Disclosed are a method and system for video compression, wherein the video encoder has low computational complexity and high compression efficiency. The disclosed system comprises a video encoder and a video decoder, wherein the method for encoding includes the steps of converting a source frame into a space-frequency representation; estimating conditional statistics of at least one vector of space-frequency coefficients; estimating encoding rates based on the said conditional statistics; and applying Slepian-Wolf codes with the said computed encoding rates. The preferred method for decoding includes the steps of; generating a side-information vector of frequency coefficients based on previously decoded source data, encoder statistics, and previous reconstructions of the source frequency vector; and performing Slepian-Wolf decoding of at least one source frequency vector based on the generated side-information, the Slepian-Wolf code bits and the encoder statistics.
Comparative study on the selectivity of various spectrophotometric techniques for the determination of binary mixture of fenbendazole and rafoxanide.

PubMed

Saad, Ahmed S; Attia, Ali K; Alaraki, Manal S; Elzanfaly, Eman S

2015-11-05

Five different spectrophotometric methods were applied for simultaneous determination of fenbendazole and rafoxanide in their binary mixture; namely first derivative, derivative ratio, ratio difference, dual wavelength and H-point standard addition spectrophotometric methods. Different factors affecting each of the applied spectrophotometric methods were studied and the selectivity of the applied methods was compared. The applied methods were validated as per the ICH guidelines and good accuracy; specificity and precision were proven within the concentration range of 5-50 μg/mL for both drugs. Statistical analysis using one-way ANOVA proved no significant differences among the proposed methods for the determination of the two drugs. The proposed methods successfully determined both drugs in laboratory prepared and commercially available binary mixtures, and were found applicable for the routine analysis in quality control laboratories. Copyright © 2015 Elsevier B.V. All rights reserved.
Cognition of and Demand for Education and Teaching in Medical Statistics in China: A Systematic Review and Meta-Analysis.

PubMed

Wu, Yazhou; Zhou, Liang; Li, Gaoming; Yi, Dali; Wu, Xiaojiao; Liu, Xiaoyu; Zhang, Yanqi; Liu, Ling; Yi, Dong

2015-01-01

Although a substantial number of studies focus on the teaching and application of medical statistics in China, few studies comprehensively evaluate the recognition of and demand for medical statistics. In addition, the results of these various studies differ and are insufficiently comprehensive and systematic. This investigation aimed to evaluate the general cognition of and demand for medical statistics by undergraduates, graduates, and medical staff in China. We performed a comprehensive database search related to the cognition of and demand for medical statistics from January 2007 to July 2014 and conducted a meta-analysis of non-controlled studies with sub-group analysis for undergraduates, graduates, and medical staff. There are substantial differences with respect to the cognition of theory in medical statistics among undergraduates (73.5%), graduates (60.7%), and medical staff (39.6%). The demand for theory in medical statistics is high among graduates (94.6%), undergraduates (86.1%), and medical staff (88.3%). Regarding specific statistical methods, the cognition of basic statistical methods is higher than of advanced statistical methods. The demand for certain advanced statistical methods, including (but not limited to) multiple analysis of variance (ANOVA), multiple linear regression, and logistic regression, is higher than that for basic statistical methods. The use rates of the Statistical Package for the Social Sciences (SPSS) software and statistical analysis software (SAS) are only 55% and 15%, respectively. The overall statistical competence of undergraduates, graduates, and medical staff is insufficient, and their ability to practically apply their statistical knowledge is limited, which constitutes an unsatisfactory state of affairs for medical statistics education. Because the demand for skills in this area is increasing, the need to reform medical statistics education in China has become urgent.
Estimating the proportion of true null hypotheses when the statistics are discrete.

PubMed

Dialsingh, Isaac; Austin, Stefanie R; Altman, Naomi S

2015-07-15

In high-dimensional testing problems π0, the proportion of null hypotheses that are true is an important parameter. For discrete test statistics, the P values come from a discrete distribution with finite support and the null distribution may depend on an ancillary statistic such as a table margin that varies among the test statistics. Methods for estimating π0 developed for continuous test statistics, which depend on a uniform or identical null distribution of P values, may not perform well when applied to discrete testing problems. This article introduces a number of π0 estimators, the regression and 'T' methods that perform well with discrete test statistics and also assesses how well methods developed for or adapted from continuous tests perform with discrete tests. We demonstrate the usefulness of these estimators in the analysis of high-throughput biological RNA-seq and single-nucleotide polymorphism data. implemented in R. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Statistical Inference and Patterns of Inequality in the Global North

ERIC Educational Resources Information Center

Moran, Timothy Patrick

2006-01-01

Cross-national inequality trends have historically been a crucial field of inquiry across the social sciences, and new methodological techniques of statistical inference have recently improved the ability to analyze these trends over time. This paper applies Monte Carlo, bootstrap inference methods to the income surveys of the Luxembourg Income…
Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

NASA Technical Reports Server (NTRS)

Tripp, John S.; Tcheng, Ping

1999-01-01

Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
Rank-based testing of equal survivorship based on cross-sectional survival data with or without prospective follow-up.

PubMed

Chan, Kwun Chuen Gary; Qin, Jing

2015-10-01

Existing linear rank statistics cannot be applied to cross-sectional survival data without follow-up since all subjects are essentially censored. However, partial survival information are available from backward recurrence times and are frequently collected from health surveys without prospective follow-up. Under length-biased sampling, a class of linear rank statistics is proposed based only on backward recurrence times without any prospective follow-up. When follow-up data are available, the proposed rank statistic and a conventional rank statistic that utilizes follow-up information from the same sample are shown to be asymptotically independent. We discuss four ways to combine these two statistics when follow-up is present. Simulations show that all combined statistics have substantially improved power compared with conventional rank statistics, and a Mantel-Haenszel test performed the best among the proposal statistics. The method is applied to a cross-sectional health survey without follow-up and a study of Alzheimer's disease with prospective follow-up. © The Author 2015. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Statistical methods to estimate treatment effects from multichannel electroencephalography (EEG) data in clinical trials.

PubMed

Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir

2010-07-15

With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
Measurement of turbulent spatial structure and kinetic energy spectrum by exact temporal-to-spatial mapping

NASA Astrophysics Data System (ADS)

Buchhave, Preben; Velte, Clara M.

2017-08-01

We present a method for converting a time record of turbulent velocity measured at a point in a flow to a spatial velocity record consisting of consecutive convection elements. The spatial record allows computation of dynamic statistical moments such as turbulent kinetic wavenumber spectra and spatial structure functions in a way that completely bypasses the need for Taylor's hypothesis. The spatial statistics agree with the classical counterparts, such as the total kinetic energy spectrum, at least for spatial extents up to the Taylor microscale. The requirements for applying the method are access to the instantaneous velocity magnitude, in addition to the desired flow quantity, and a high temporal resolution in comparison to the relevant time scales of the flow. We map, without distortion and bias, notoriously difficult developing turbulent high intensity flows using three main aspects that distinguish these measurements from previous work in the field: (1) The measurements are conducted using laser Doppler anemometry and are therefore not contaminated by directional ambiguity (in contrast to, e.g., frequently employed hot-wire anemometers); (2) the measurement data are extracted using a correctly and transparently functioning processor and are analysed using methods derived from first principles to provide unbiased estimates of the velocity statistics; (3) the exact mapping proposed herein has been applied to the high turbulence intensity flows investigated to avoid the significant distortions caused by Taylor's hypothesis. The method is first confirmed to produce the correct statistics using computer simulations and later applied to measurements in some of the most difficult regions of a round turbulent jet—the non-equilibrium developing region and the outermost parts of the developed jet. The proposed mapping is successfully validated using corresponding directly measured spatial statistics in the fully developed jet, even in the difficult outer regions of the jet where the average convection velocity is negligible and turbulence intensities increase dramatically. The measurements in the developing region reveal interesting features of an incomplete Richardson-Kolmogorov cascade under development.
Putting engineering back into protein engineering: bioinformatic approaches to catalyst design.

PubMed

Gustafsson, Claes; Govindarajan, Sridhar; Minshull, Jeremy

2003-08-01

Complex multivariate engineering problems are commonplace and not unique to protein engineering. Mathematical and data-mining tools developed in other fields of engineering have now been applied to analyze sequence-activity relationships of peptides and proteins and to assist in the design of proteins and peptides with specified properties. Decreasing costs of DNA sequencing in conjunction with methods to quickly synthesize statistically representative sets of proteins allow modern heuristic statistics to be applied to protein engineering. This provides an alternative approach to expensive assays or unreliable high-throughput surrogate screens.
Data exploration systems for databases

NASA Technical Reports Server (NTRS)

Greene, Richard J.; Hield, Christopher

1992-01-01

Data exploration systems apply machine learning techniques, multivariate statistical methods, information theory, and database theory to databases to identify significant relationships among the data and summarize information. The result of applying data exploration systems should be a better understanding of the structure of the data and a perspective of the data enabling an analyst to form hypotheses for interpreting the data. This paper argues that data exploration systems need a minimum amount of domain knowledge to guide both the statistical strategy and the interpretation of the resulting patterns discovered by these systems.
Variable Selection in the Presence of Missing Data: Imputation-based Methods.

PubMed

Zhao, Yize; Long, Qi

2017-01-01

Variable selection plays an essential role in regression analysis as it identifies important variables that associated with outcomes and is known to improve predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid used under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combine variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third variable selection strategy combines resampling techniques such as bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
Statistical analysis of activation and reaction energies with quasi-variational coupled-cluster theory

NASA Astrophysics Data System (ADS)

Black, Joshua A.; Knowles, Peter J.

2018-06-01

The performance of quasi-variational coupled-cluster (QV) theory applied to the calculation of activation and reaction energies has been investigated. A statistical analysis of results obtained for six different sets of reactions has been carried out, and the results have been compared to those from standard single-reference methods. In general, the QV methods lead to increased activation energies and larger absolute reaction energies compared to those obtained with traditional coupled-cluster theory.
Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

NASA Astrophysics Data System (ADS)

Yakunin, A. G.; Hussein, H. M.

2018-01-01

The article shows how the known statistical methods, which are widely used in solving financial problems and a number of other fields of science and technology, can be effectively applied after minor modification for solving such problems in climate and environment monitoring systems, as the detection of anomalies in the form of abrupt changes in signal levels, the occurrence of positive and negative outliers and the violation of the cycle form in periodic processes.
Non-Destructive Measurement Methods (Neutron-, X-ray Radiography, Vibration Diagnostics and Ultrasound) in the Inspection of Helicopter Rotor Blades

DTIC Science & Technology

2005-04-01

the radiography gauging. In addition to the Statistical Energy Analysis (SEA) measurement a small exciter table (BK4810) and impedance head (BK 8000... Statistical Energy Analysis ; 7th Conf. on Vehicle System Dynamics, Identification and Anomalies (VSDIA2000), 6-8 Nov. 2000 Budapest, Proc. pp. 491-493... Energy Analysis (SEA) and Ultrasound Test. (UT) were concurrently applied. These methods collect accessory information on the objects under inspection
Rtop - an R package for interpolation along the stream network

NASA Astrophysics Data System (ADS)

Skøien, J. O.

2009-04-01

Rtop - an R package for interpolation along the stream network Geostatistical methods have been used to a limited extent for estimation along stream networks, with a few exceptions(Gottschalk, 1993; Gottschalk, et al., 2006; Sauquet, et al., 2000; Skøien, et al., 2006). Interpolation of runoff characteristics are more complicated than the traditional random variables estimated by geostatistical methods, as the measurements have a more complicated support, and many catchments are nested. Skøien et al. (2006) presented the model Top-kriging which takes these effects into account for interpolation of stream flow characteristics (exemplified by the 100 year flood). The method has here been implemented as a package in the statistical environment R (R Development Core Team, 2004). Taking advantage of the existing methods in R for working with spatial objects, and the extensive possibilities for visualizing the result, this makes it considerably easier to apply the method on new data sets, in comparison to earlier implementation of the method. Gottschalk, L. 1993. Interpolation of runoff applying objective methods. Stochastic Hydrology and Hydraulics, 7, 269-281. Gottschalk, L., I. Krasovskaia, E. Leblois, and E. Sauquet. 2006. Mapping mean and variance of runoff in a river basin. Hydrology and Earth System Sciences, 10, 469-484. R Development Core Team. 2004. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Sauquet, E., L. Gottschalk, and E. Leblois. 2000. Mapping average annual runoff: a hierarchical approach applying a stochastic interpolation scheme. Hydrological Sciences Journal, 45 (6), 799-815. Skøien, J. O., R. Merz, and G. Blöschl. 2006. Top-kriging - geostatistics on stream networks. Hydrology and Earth System Sciences, 10, 277-287.
Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data.

PubMed

Rhodes, Kirsty M; Turner, Rebecca M; White, Ian R; Jackson, Dan; Spiegelhalter, David J; Higgins, Julian P T

2016-12-20

Many meta-analyses combine results from only a small number of studies, a situation in which the between-study variance is imprecisely estimated when standard methods are applied. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, providing the potential for more robust inference on the effect size of interest. We present a method for performing Bayesian meta-analysis using data augmentation, in which we represent an informative conjugate prior for between-study variance by pseudo data and use meta-regression for estimation. To assist in this, we derive predictive inverse-gamma distributions for the between-study variance expected in future meta-analyses. These may serve as priors for heterogeneity in new meta-analyses. In a simulation study, we compare approximate Bayesian methods using meta-regression and pseudo data against fully Bayesian approaches based on importance sampling techniques and Markov chain Monte Carlo (MCMC). We compare the frequentist properties of these Bayesian methods with those of the commonly used frequentist DerSimonian and Laird procedure. The method is implemented in standard statistical software and provides a less complex alternative to standard MCMC approaches. An importance sampling approach produces almost identical results to standard MCMC approaches, and results obtained through meta-regression and pseudo data are very similar. On average, data augmentation provides closer results to MCMC, if implemented using restricted maximum likelihood estimation rather than DerSimonian and Laird or maximum likelihood estimation. The methods are applied to real datasets, and an extension to network meta-analysis is described. The proposed method facilitates Bayesian meta-analysis in a way that is accessible to applied researchers. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

14 CFR 417.203 - Compliance.

Code of Federal Regulations, 2012 CFR

2012-01-01

... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...
14 CFR 417.203 - Compliance.

Code of Federal Regulations, 2014 CFR

2014-01-01

... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...
14 CFR 417.203 - Compliance.

Code of Federal Regulations, 2013 CFR

2013-01-01

... analysis method is based on accurate data and scientific principles and is statistically valid. The FAA... safety analysis must also meet the requirements for methods of analysis contained in appendices A and B... from an identical or similar launch if the analysis still applies to the later launch. (b) Method of...
Bootstrapping Methods Applied for Simulating Laboratory Works

ERIC Educational Resources Information Center

Prodan, Augustin; Campean, Remus

2005-01-01

Purpose: The aim of this work is to implement bootstrapping methods into software tools, based on Java. Design/methodology/approach: This paper presents a category of software e-tools aimed at simulating laboratory works and experiments. Findings: Both students and teaching staff use traditional statistical methods to infer the truth from sample…
Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics

PubMed Central

Cao, Hui; Markatou, Marianthi; Melton, Genevieve B.; Chiang, Michael F.; Hripcsak, George

2005-01-01

This paper applies co-occurrence statistics to discover disease-finding associations in a clinical data warehouse. We used two methods, χ2 statistics and the proportion confidence interval (PCI) method, to measure the dependence of pairs of diseases and findings, and then used heuristic cutoff values for association selection. An intrinsic evaluation showed that 94 percent of disease-finding associations obtained by χ2 statistics and 76.8 percent obtained by the PCI method were true associations. The selected associations were used to construct knowledge bases of disease-finding relations (KB-χ2, KB-PCI). An extrinsic evaluation showed that both KB-χ2 and KB-PCI could assist in eliminating clinically non-informative and redundant findings from problem lists generated by our automated problem list summarization system. PMID:16779011
Quantification of histone modification ChIP-seq enrichment for data mining and machine learning applications

PubMed Central

2011-01-01

Background The advent of ChIP-seq technology has made the investigation of epigenetic regulatory networks a computationally tractable problem. Several groups have applied statistical computing methods to ChIP-seq datasets to gain insight into the epigenetic regulation of transcription. However, methods for estimating enrichment levels in ChIP-seq data for these computational studies are understudied and variable. Since the conclusions drawn from these data mining and machine learning applications strongly depend on the enrichment level inputs, a comparison of estimation methods with respect to the performance of statistical models should be made. Results Various methods were used to estimate the gene-wise ChIP-seq enrichment levels for 20 histone methylations and the histone variant H2A.Z. The Multivariate Adaptive Regression Splines (MARS) algorithm was applied for each estimation method using the estimation of enrichment levels as predictors and gene expression levels as responses. The methods used to estimate enrichment levels included tag counting and model-based methods that were applied to whole genes and specific gene regions. These methods were also applied to various sizes of estimation windows. The MARS model performance was assessed with the Generalized Cross-Validation Score (GCV). We determined that model-based methods of enrichment estimation that spatially weight enrichment based on average patterns provided an improvement over tag counting methods. Also, methods that included information across the entire gene body provided improvement over methods that focus on a specific sub-region of the gene (e.g., the 5' or 3' region). Conclusion The performance of data mining and machine learning methods when applied to histone modification ChIP-seq data can be improved by using data across the entire gene body, and incorporating the spatial distribution of enrichment. Refinement of enrichment estimation ultimately improved accuracy of model predictions. PMID:21834981
A Study on Predictive Analytics Application to Ship Machinery Maintenance

DTIC Science & Technology

2013-09-01

Looking at the nature of the time series forecasting method , it would be better applied to offline analysis . The application for real- time online...other system attributes in future. Two techniques of statistical analysis , mainly time series models and cumulative sum control charts, are discussed in...statistical tool employed for the two techniques of statistical analysis . Both time series forecasting as well as CUSUM control charts are shown to be
A method of classification for multisource data in remote sensing based on interval-valued probabilities

NASA Technical Reports Server (NTRS)

Kim, Hakil; Swain, Philip H.

1990-01-01

An axiomatic approach to intervalued (IV) probabilities is presented, where the IV probability is defined by a pair of set-theoretic functions which satisfy some pre-specified axioms. On the basis of this approach representation of statistical evidence and combination of multiple bodies of evidence are emphasized. Although IV probabilities provide an innovative means for the representation and combination of evidential information, they make the decision process rather complicated. It entails more intelligent strategies for making decisions. The development of decision rules over IV probabilities is discussed from the viewpoint of statistical pattern recognition. The proposed method, so called evidential reasoning method, is applied to the ground-cover classification of a multisource data set consisting of Multispectral Scanner (MSS) data, Synthetic Aperture Radar (SAR) data, and digital terrain data such as elevation, slope, and aspect. By treating the data sources separately, the method is able to capture both parametric and nonparametric information and to combine them. Then the method is applied to two separate cases of classifying multiband data obtained by a single sensor. In each case a set of multiple sources is obtained by dividing the dimensionally huge data into smaller and more manageable pieces based on the global statistical correlation information. By a divide-and-combine process, the method is able to utilize more features than the conventional maximum likelihood method.
Statistical bias correction method applied on CMIP5 datasets over the Indian region during the summer monsoon season for climate change applications

NASA Astrophysics Data System (ADS)

Prasanna, V.

2018-01-01

This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulation results with respect to observation is discussed in detail. The non-linear statistical bias correction is a suitable bias correction method for climate change data because it is simple and does not add up artificial uncertainties to the impact assessment of climate change scenarios for climate change application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile mapping method gives results similar to the CDF (Weibull)-based quantile mapping method, and both the methods are comparable. The bias correction is applied on temperature and precipitation variables for present climate and future projected data to make use of it in a simple statistical model to understand the future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project the changes in the agricultural yield over India from RCP4.5 and RCP8.5 scenarios. The results revealed a better convergence of model projections in the bias corrected data compared to the uncorrected data. The study can be extended to localized regional domains aimed at understanding the changes in the agricultural productivity in the future with an agro-economy or a simple statistical model. The statistical model indicated that the total food grain yield is going to increase over the Indian region in the future, the increase in the total food grain yield is approximately 50 kg/ ha for the RCP4.5 scenario from 2001 until the end of 2100, and the increase in the total food grain yield is approximately 90 kg/ha for the RCP8.5 scenario from 2001 until the end of 2100. There are many studies using bias correction techniques, but this study applies the bias correction technique to future climate scenario data from CMIP5 models and applied it to crop statistics to find future crop yield changes over the Indian region.
Automated grain mapping using wide angle convergent beam electron diffraction in transmission electron microscope for nanomaterials.

PubMed

Kumar, Vineet

2011-12-01

The grain size statistics, commonly derived from the grain map of a material sample, are important microstructure characteristics that greatly influence its properties. The grain map for nanomaterials is usually obtained manually by visual inspection of the transmission electron microscope (TEM) micrographs because automated methods do not perform satisfactorily. While the visual inspection method provides reliable results, it is a labor intensive process and is often prone to human errors. In this article, an automated grain mapping method is developed using TEM diffraction patterns. The presented method uses wide angle convergent beam diffraction in the TEM. The automated technique was applied on a platinum thin film sample to obtain the grain map and subsequently derive grain size statistics from it. The grain size statistics obtained with the automated method were found in good agreement with the visual inspection method.
Statistical methods for conducting agreement (comparison of clinical tests) and precision (repeatability or reproducibility) studies in optometry and ophthalmology.

PubMed

McAlinden, Colm; Khadka, Jyoti; Pesudovs, Konrad

2011-07-01

The ever-expanding choice of ocular metrology and imaging equipment has driven research into the validity of their measurements. Consequently, studies of the agreement between two instruments or clinical tests have proliferated in the ophthalmic literature. It is important that researchers apply the appropriate statistical tests in agreement studies. Correlation coefficients are hazardous and should be avoided. The 'limits of agreement' method originally proposed by Altman and Bland in 1983 is the statistical procedure of choice. Its step-by-step use and practical considerations in relation to optometry and ophthalmology are detailed in addition to sample size considerations and statistical approaches to precision (repeatability or reproducibility) estimates. Ophthalmic & Physiological Optics © 2011 The College of Optometrists.
Covariance approximation for fast and accurate computation of channelized Hotelling observer statistics

NASA Astrophysics Data System (ADS)

Bonetto, P.; Qi, Jinyi; Leahy, R. M.

2000-08-01

Describes a method for computing linear observer statistics for maximum a posteriori (MAP) reconstructions of PET images. The method is based on a theoretical approximation for the mean and covariance of MAP reconstructions. In particular, the authors derive here a closed form for the channelized Hotelling observer (CHO) statistic applied to 2D MAP images. The theoretical analysis models both the Poission statistics of PET data and the inhomogeneity of tracer uptake. The authors show reasonably good correspondence between these theoretical results and Monte Carlo studies. The accuracy and low computational cost of the approximation allow the authors to analyze the observer performance over a wide range of operating conditions and parameter settings for the MAP reconstruction algorithm.
How Does One Assess the Accuracy of Academic Success Predictors? ROC Analysis Applied to University Entrance Factors

ERIC Educational Resources Information Center

Vivo, Juana-Maria; Franco, Manuel

2008-01-01

This article attempts to present a novel application of a method of measuring accuracy for academic success predictors that could be used as a standard. This procedure is known as the receiver operating characteristic (ROC) curve, which comes from statistical decision techniques. The statistical prediction techniques provide predictor models and…
Application of binomial and multinomial probability statistics to the sampling design process of a global grain tracing and recall system

USDA-ARS?s Scientific Manuscript database

Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
An Empirical Investigation of Methods for Assessing Item Fit for Mixed Format Tests

ERIC Educational Resources Information Center

Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N.

2013-01-01

Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
Parameter estimation techniques based on optimizing goodness-of-fit statistics for structural reliability

NASA Technical Reports Server (NTRS)

Starlinger, Alois; Duffy, Stephen F.; Palko, Joseph L.

1993-01-01

New methods are presented that utilize the optimization of goodness-of-fit statistics in order to estimate Weibull parameters from failure data. It is assumed that the underlying population is characterized by a three-parameter Weibull distribution. Goodness-of-fit tests are based on the empirical distribution function (EDF). The EDF is a step function, calculated using failure data, and represents an approximation of the cumulative distribution function for the underlying population. Statistics (such as the Kolmogorov-Smirnov statistic and the Anderson-Darling statistic) measure the discrepancy between the EDF and the cumulative distribution function (CDF). These statistics are minimized with respect to the three Weibull parameters. Due to nonlinearities encountered in the minimization process, Powell's numerical optimization procedure is applied to obtain the optimum value of the EDF. Numerical examples show the applicability of these new estimation methods. The results are compared to the estimates obtained with Cooper's nonlinear regression algorithm.
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

PubMed Central

van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B; Neyer, Franz J; van Aken, Marcel AG

2014-01-01

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided. PMID:24116396
Extractive-spectrophotometric determination of disopyramide and irbesartan in their pharmaceutical formulation

NASA Astrophysics Data System (ADS)

Abdellatef, Hisham E.

2007-04-01

Picric acid, bromocresol green, bromothymol blue, cobalt thiocyanate and molybdenum(V) thiocyanate have been tested as spectrophotometric reagents for the determination of disopyramide and irbesartan. Reaction conditions have been optimized to obtain coloured comoplexes of higher sensitivity and longer stability. The absorbance of ion-pair complexes formed were found to increases linearity with increases in concentrations of disopyramide and irbesartan which were corroborated by correction coefficient values. The developed methods have been successfully applied for the determination of disopyramide and irbesartan in bulk drugs and pharmaceutical formulations. The common excipients and additives did not interfere in their determination. The results obtained by the proposed methods have been statistically compared by means of student t-test and by the variance ratio F-test. The validity was assessed by applying the standard addition technique. The results were compared statistically with the official or reference methods showing a good agreement with high precision and accuracy.
STATISTICAL ANALYSIS OF SNAP 10A THERMOELECTRIC CONVERTER ELEMENT PROCESS DEVELOPMENT VARIABLES

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fitch, S.H.; Morris, J.W.

1962-12-15

Statistical analysis, primarily analysis of variance, was applied to evaluate several factors involved in the development of suitable fabrication and processing techniques for the production of lead telluride thermoelectric elements for the SNAP 10A energy conversion system. The analysis methods are described as to their application for determining the effects of various processing steps, estabIishing the value of individual operations, and evaluating the significance of test results. The elimination of unnecessary or detrimental processing steps was accomplished and the number of required tests was substantially reduced by application of these statistical methods to the SNAP 10A production development effort. (auth)
Calculation of precise firing statistics in a neural network model

NASA Astrophysics Data System (ADS)

Cho, Myoung Won

2017-08-01

A precise prediction of neural firing dynamics is requisite to understand the function of and the learning process in a biological neural network which works depending on exact spike timings. Basically, the prediction of firing statistics is a delicate manybody problem because the firing probability of a neuron at a time is determined by the summation over all effects from past firing states. A neural network model with the Feynman path integral formulation is recently introduced. In this paper, we present several methods to calculate firing statistics in the model. We apply the methods to some cases and compare the theoretical predictions with simulation results.

A general solution strategy of modified power method for higher mode solutions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Peng; Lee, Hyunsuk; Lee, Deokjung, E-mail: deokjung@unist.ac.kr

2016-01-15

A general solution strategy of the modified power iteration method for calculating higher eigenmodes has been developed and applied in continuous energy Monte Carlo simulation. The new approach adopts four features: 1) the eigen decomposition of transfer matrix, 2) weight cancellation for higher modes, 3) population control with higher mode weights, and 4) stabilization technique of statistical fluctuations using multi-cycle accumulations. The numerical tests of neutron transport eigenvalue problems successfully demonstrate that the new strategy can significantly accelerate the fission source convergence with stable convergence behavior while obtaining multiple higher eigenmodes at the same time. The advantages of the newmore » strategy can be summarized as 1) the replacement of the cumbersome solution step of high order polynomial equations required by Booth's original method with the simple matrix eigen decomposition, 2) faster fission source convergence in inactive cycles, 3) more stable behaviors in both inactive and active cycles, and 4) smaller variances in active cycles. Advantages 3 and 4 can be attributed to the lower sensitivity of the new strategy to statistical fluctuations due to the multi-cycle accumulations. The application of the modified power method to continuous energy Monte Carlo simulation and the higher eigenmodes up to 4th order are reported for the first time in this paper. -- Graphical abstract: -- Highlights: •Modified power method is applied to continuous energy Monte Carlo simulation. •Transfer matrix is introduced to generalize the modified power method. •All mode based population control is applied to get the higher eigenmodes. •Statistic fluctuation can be greatly reduced using accumulated tally results. •Fission source convergence is accelerated with higher mode solutions.« less
Sound source measurement by using a passive sound insulation and a statistical approach

NASA Astrophysics Data System (ADS)

Dragonetti, Raffaele; Di Filippo, Sabato; Mercogliano, Francesco; Romano, Rosario A.

2015-10-01

This paper describes a measurement technique developed by the authors that allows carrying out acoustic measurements inside noisy environments reducing background noise effects. The proposed method is based on the integration of a traditional passive noise insulation system with a statistical approach. The latter is applied to signals picked up by usual sensors (microphones and accelerometers) equipping the passive sound insulation system. The statistical approach allows improving of the sound insulation given only by the passive sound insulation system at low frequency. The developed measurement technique has been validated by means of numerical simulations and measurements carried out inside a real noisy environment. For the case-studies here reported, an average improvement of about 10 dB has been obtained in a frequency range up to about 250 Hz. Considerations on the lower sound pressure level that can be measured by applying the proposed method and the measurement error related to its application are reported as well.
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.

DTIC Science & Technology

1983-09-01

books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40
Applying Item Response Theory Methods to Examine the Impact of Different Response Formats

ERIC Educational Resources Information Center

Hohensinn, Christine; Kubinger, Klaus D.

2011-01-01

In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…
Enhance Video Film using Retnix method

NASA Astrophysics Data System (ADS)

Awad, Rasha; Al-Zuky, Ali A.; Al-Saleh, Anwar H.; Mohamad, Haidar J.

2018-05-01

An enhancement technique used to improve the studied video quality. Algorithms like mean and standard deviation are used as a criterion within this paper, and it applied for each video clip that divided into 80 images. The studied filming environment has different light intensity (315, 566, and 644Lux). This different environment gives similar reality to the outdoor filming. The outputs of the suggested algorithm are compared with the results before applying it. This method is applied into two ways: first, it is applied for the full video clip to get the enhanced film; second, it is applied for every individual image to get the enhanced image then compiler them to get the enhanced film. This paper shows that the enhancement technique gives good quality video film depending on a statistical method, and it is recommended to use it in different application.
Fulfilling the law of a single independent variable and improving the result of mathematical educational research

NASA Astrophysics Data System (ADS)

Pardimin, H.; Arcana, N.

2018-01-01

Many types of research in the field of mathematics education apply the Quasi-Experimental method and statistical analysis use t-test. Quasi-experiment has a weakness that is difficult to fulfil “the law of a single independent variable”. T-test also has a weakness that is a generalization of the conclusions obtained is less powerful. This research aimed to find ways to reduce the weaknesses of the Quasi-experimental method and improved the generalization of the research results. The method applied in the research was a non-interactive qualitative method, and the type was concept analysis. Concepts analysed are the concept of statistics, research methods of education, and research reports. The result represented a way to overcome the weaknesses of quasi-Experiments and T-test. In addition, the way was to apply a combination of Factorial Design and Balanced Design, which the authors refer to as Factorial-Balanced Design. The advantages of this design are: (1) almost fulfilling “the low of single independent variable” so no need to test the similarity of the academic ability, (2) the sample size of the experimental group and the control group became larger and equal; so it becomes robust to deal with violations of the assumptions of the ANOVA test.
UVPROM dosimetry, microdosimetry and applications to SEU and extreme value theory

NASA Astrophysics Data System (ADS)

Scheick, Leif Zebediah

A new method is described for characterizing a device in terms of the statistical distribution of first failures. The method is based on the erasure of a commercial Ultra- Violet erasable Programmable Read Only Memory (UVPROM). The method of readout would be used on a spacecraft or in other restrictive radiation environments. The measurement of the charge remaining on the floating gate is used to determine absorbed dose. The method of determining dose does not require the detector to be destroyed or erased nor does it effect the ability for taking further measurements. This is compared to extreme value theory applied to the statistical distributions that apply to this device. This technique predicts the threshold of Single Event Effects (SEE), like anomalous changes in erasure time in programmable devices due to high microdose energy-deposition events. This technique also allows for advanced non-destructive, screening of a single microelectronic devices for predictable response in a stressful, i.e. radiation, environments.
A comparison between EEG source localization and fMRI during the processing of emotional visual stimuli

NASA Astrophysics Data System (ADS)

Hu, Jin; Tian, Jie; Pan, Xiaohong; Liu, Jiangang

2007-03-01

The purpose of this paper is to compare between EEG source localization and fMRI during emotional processing. 108 pictures for EEG (categorized as positive, negative and neutral) and 72 pictures for fMRI were presented to 24 healthy, right-handed subjects. The fMRI data were analyzed using statistical parametric mapping with SPM2. LORETA was applied to grand averaged ERP data to localize intracranial sources. Statistical analysis was implemented to compare spatiotemporal activation of fMRI and EEG. The fMRI results are in accordance with EEG source localization to some extent, while part of mismatch in localization between the two methods was also observed. In the future we should apply the method for simultaneous recording of EEG and fMRI to our study.
Detecting subtle hydrochemical anomalies with multivariate statistics: an example from homogeneous groundwaters in the Great Artesian Basin, Australia

NASA Astrophysics Data System (ADS)

O'Shea, Bethany; Jankowski, Jerzy

2006-12-01

The major ion composition of Great Artesian Basin groundwater in the lower Namoi River valley is relatively homogeneous in chemical composition. Traditional graphical techniques have been combined with multivariate statistical methods to determine whether subtle differences in the chemical composition of these waters can be delineated. Hierarchical cluster analysis and principal components analysis were successful in delineating minor variations within the groundwaters of the study area that were not visually identified in the graphical techniques applied. Hydrochemical interpretation allowed geochemical processes to be identified in each statistically defined water type and illustrated how these groundwaters differ from one another. Three main geochemical processes were identified in the groundwaters: ion exchange, precipitation, and mixing between waters from different sources. Both statistical methods delineated an anomalous sample suspected of being influenced by magmatic CO2 input. The use of statistical methods to complement traditional graphical techniques for waters appearing homogeneous is emphasized for all investigations of this type. Copyright
Realistic finite temperature simulations of magnetic systems using quantum statistics

NASA Astrophysics Data System (ADS)

Bergqvist, Lars; Bergman, Anders

2018-01-01

We have performed realistic atomistic simulations at finite temperatures using Monte Carlo and atomistic spin dynamics simulations incorporating quantum (Bose-Einstein) statistics. The description is much improved at low temperatures compared to classical (Boltzmann) statistics normally used in these kind of simulations, while at higher temperatures the classical statistics are recovered. This corrected low-temperature description is reflected in both magnetization and the magnetic specific heat, the latter allowing for improved modeling of the magnetic contribution to free energies. A central property in the method is the magnon density of states at finite temperatures, and we have compared several different implementations for obtaining it. The method has no restrictions regarding chemical and magnetic order of the considered materials. This is demonstrated by applying the method to elemental ferromagnetic systems, including Fe and Ni, as well as Fe-Co random alloys and the ferrimagnetic system GdFe3.
Statistical Methods Applied to Gamma-ray Spectroscopy Algorithms in Nuclear Security Missions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fagan, Deborah K.; Robinson, Sean M.; Runkle, Robert C.

2012-10-01

In a wide range of nuclear security missions, gamma-ray spectroscopy is a critical research and development priority. One particularly relevant challenge is the interdiction of special nuclear material for which gamma-ray spectroscopy supports the goals of detecting and identifying gamma-ray sources. This manuscript examines the existing set of spectroscopy methods, attempts to categorize them by the statistical methods on which they rely, and identifies methods that have yet to be considered. Our examination shows that current methods effectively estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty—ones that are significantly moremore » complex. We thus explore the premise that significantly improving algorithm performance requires greater coupling between the problem physics that drives data acquisition and statistical methods that analyze such data. Untapped statistical methods, such as Bayes Modeling Averaging and hierarchical and empirical Bayes methods have the potential to reduce decision uncertainty by more rigorously and comprehensively incorporating all sources of uncertainty. We expect that application of such methods will demonstrate progress in meeting the needs of nuclear security missions by improving on the existing numerical infrastructure for which these analyses have not been conducted.« less
Evaluation of methods for managing censored results when calculating the geometric mean.

PubMed

Mikkonen, Hannah G; Clarke, Bradley O; Dasika, Raghava; Wallis, Christian J; Reichman, Suzie M

2018-01-01

Currently, there are conflicting views on the best statistical methods for managing censored environmental data. The method commonly applied by environmental science researchers and professionals is to substitute half the limit of reporting for derivation of summary statistics. This approach has been criticised by some researchers, raising questions around the interpretation of historical scientific data. This study evaluated four complete soil datasets, at three levels of simulated censorship, to test the accuracy of a range of censored data management methods for calculation of the geometric mean. The methods assessed included removal of censored results, substitution of a fixed value (near zero, half the limit of reporting and the limit of reporting), substitution by nearest neighbour imputation, maximum likelihood estimation, regression on order substitution and Kaplan-Meier/survival analysis. This is the first time such a comprehensive range of censored data management methods have been applied to assess the accuracy of calculation of the geometric mean. The results of this study show that, for describing the geometric mean, the simple method of substitution of half the limit of reporting is comparable or more accurate than alternative censored data management methods, including nearest neighbour imputation methods. Copyright © 2017 Elsevier Ltd. All rights reserved.
IMPLEMENTATION AND VALIDATION OF STATISTICAL TESTS IN RESEARCH'S SOFTWARE HELPING DATA COLLECTION AND PROTOCOLS ANALYSIS IN SURGERY.

PubMed

Kuretzki, Carlos Henrique; Campos, Antônio Carlos Ligocki; Malafaia, Osvaldo; Soares, Sandramara Scandelari Kusano de Paula; Tenório, Sérgio Bernardo; Timi, Jorge Rufino Ribas

2016-03-01

The use of information technology is often applied in healthcare. With regard to scientific research, the SINPE(c) - Integrated Electronic Protocols was created as a tool to support researchers, offering clinical data standardization. By the time, SINPE(c) lacked statistical tests obtained by automatic analysis. Add to SINPE(c) features for automatic realization of the main statistical methods used in medicine . The study was divided into four topics: check the interest of users towards the implementation of the tests; search the frequency of their use in health care; carry out the implementation; and validate the results with researchers and their protocols. It was applied in a group of users of this software in their thesis in the strict sensu master and doctorate degrees in one postgraduate program in surgery. To assess the reliability of the statistics was compared the data obtained both automatically by SINPE(c) as manually held by a professional in statistics with experience with this type of study. There was concern for the use of automatic statistical tests, with good acceptance. The chi-square, Mann-Whitney, Fisher and t-Student were considered as tests frequently used by participants in medical studies. These methods have been implemented and thereafter approved as expected. The incorporation of the automatic SINPE (c) Statistical Analysis was shown to be reliable and equal to the manually done, validating its use as a research tool for medical research.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics

NASA Astrophysics Data System (ADS)

Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.

2003-03-01

Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
Time Series Expression Analyses Using RNA-seq: A Statistical Approach

PubMed Central

Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P.

2013-01-01

RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis. PMID:23586021
Time series expression analyses using RNA-seq: a statistical approach.

PubMed

Oh, Sunghee; Song, Seongho; Grabowski, Gregory; Zhao, Hongyu; Noonan, James P

2013-01-01

RNA-seq is becoming the de facto standard approach for transcriptome analysis with ever-reducing cost. It has considerable advantages over conventional technologies (microarrays) because it allows for direct identification and quantification of transcripts. Many time series RNA-seq datasets have been collected to study the dynamic regulations of transcripts. However, statistically rigorous and computationally efficient methods are needed to explore the time-dependent changes of gene expression in biological systems. These methods should explicitly account for the dependencies of expression patterns across time points. Here, we discuss several methods that can be applied to model timecourse RNA-seq data, including statistical evolutionary trajectory index (SETI), autoregressive time-lagged regression (AR(1)), and hidden Markov model (HMM) approaches. We use three real datasets and simulation studies to demonstrate the utility of these dynamic methods in temporal analysis.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment

PubMed Central

Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P.; Patterson, Nick; Price, Alkes L.

2014-01-01

Motivation: Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. Results: In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1–5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case–control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of χ2 association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Availability and implementation: Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. Contact: bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary information: Supplementary materials are available at Bioinformatics online. PMID:24990607
Citation of previous meta-analyses on the same topic: a clue to perpetuation of incorrect methods?

PubMed

Li, Tianjing; Dickersin, Kay

2013-06-01

Systematic reviews and meta-analyses serve as a basis for decision-making and clinical practice guidelines and should be carried out using appropriate methodology to avoid incorrect inferences. We describe the characteristics, statistical methods used for meta-analyses, and citation patterns of all 21 glaucoma systematic reviews we identified pertaining to the effectiveness of prostaglandin analog eye drops in treating primary open-angle glaucoma, published between December 2000 and February 2012. We abstracted data, assessed whether appropriate statistical methods were applied in meta-analyses, and examined citation patterns of included reviews. We identified two forms of problematic statistical analyses in 9 of the 21 systematic reviews examined. Except in 1 case, none of the 9 reviews that used incorrect statistical methods cited a previously published review that used appropriate methods. Reviews that used incorrect methods were cited 2.6 times more often than reviews that used appropriate statistical methods. We speculate that by emulating the statistical methodology of previous systematic reviews, systematic review authors may have perpetuated incorrect approaches to meta-analysis. The use of incorrect statistical methods, perhaps through emulating methods described in previous research, calls conclusions of systematic reviews into question and may lead to inappropriate patient care. We urge systematic review authors and journal editors to seek the advice of experienced statisticians before undertaking or accepting for publication a systematic review and meta-analysis. The author(s) have no proprietary or commercial interest in any materials discussed in this article. Copyright © 2013 American Academy of Ophthalmology. Published by Elsevier Inc. All rights reserved.
Bayesian models based on test statistics for multiple hypothesis testing problems.

PubMed

Ji, Yuan; Lu, Yiling; Mills, Gordon B

2008-04-01

We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
[Application of the life table method to the estimation of late complications of normal tissues after radiotherapy].

PubMed

Morita, K; Uchiyama, Y; Tominaga, S

1987-06-01

In order to evaluate the treatment results of radiotherapy it is important to estimate the degree of complications of the surrounding normal tissues as well as the frequency of tumor control. In this report, the cumulative incidence rate of the late radiation injuries of the normal tissues was calculated using the modified actuarial method (Cutler-Ederer's method) or Kaplan-Meier's method, which is usually applied to the calculation of the survival rate. By the use of this method of calculation, an accurate cumulative incidence rate over time can be easily obtained and applied to the statistical evaluation of the late radiation injuries.

CORSSA: The Community Online Resource for Statistical Seismicity Analysis

USGS Publications Warehouse

Michael, Andrew J.; Wiemer, Stefan

2010-01-01

Statistical seismology is the application of rigorous statistical methods to earthquake science with the goal of improving our knowledge of how the earth works. Within statistical seismology there is a strong emphasis on the analysis of seismicity data in order to improve our scientific understanding of earthquakes and to improve the evaluation and testing of earthquake forecasts, earthquake early warning, and seismic hazards assessments. Given the societal importance of these applications, statistical seismology must be done well. Unfortunately, a lack of educational resources and available software tools make it difficult for students and new practitioners to learn about this discipline. The goal of the Community Online Resource for Statistical Seismicity Analysis (CORSSA) is to promote excellence in statistical seismology by providing the knowledge and resources necessary to understand and implement the best practices, so that the reader can apply these methods to their own research. This introduction describes the motivation for and vision of CORRSA. It also describes its structure and contents.
Quantifying the impact of between-study heterogeneity in multivariate meta-analyses

PubMed Central

Jackson, Dan; White, Ian R; Riley, Richard D

2012-01-01

Measures that quantify the impact of heterogeneity in univariate meta-analysis, including the very popular I2 statistic, are now well established. Multivariate meta-analysis, where studies provide multiple outcomes that are pooled in a single analysis, is also becoming more commonly used. The question of how to quantify heterogeneity in the multivariate setting is therefore raised. It is the univariate R2 statistic, the ratio of the variance of the estimated treatment effect under the random and fixed effects models, that generalises most naturally, so this statistic provides our basis. This statistic is then used to derive a multivariate analogue of I2, which we call . We also provide a multivariate H2 statistic, the ratio of a generalisation of Cochran's heterogeneity statistic and its associated degrees of freedom, with an accompanying generalisation of the usual I2 statistic, . Our proposed heterogeneity statistics can be used alongside all the usual estimates and inferential procedures used in multivariate meta-analysis. We apply our methods to some real datasets and show how our statistics are equally appropriate in the context of multivariate meta-regression, where study level covariate effects are included in the model. Our heterogeneity statistics may be used when applying any procedure for fitting the multivariate random effects model. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22763950
On the statistical properties and tail risk of violent conflicts

NASA Astrophysics Data System (ADS)

Cirillo, Pasquale; Taleb, Nassim Nicholas

2016-06-01

We examine statistical pictures of violent conflicts over the last 2000 years, providing techniques for dealing with the unreliability of historical data. We make use of a novel approach to deal with fat-tailed random variables with a remote but nonetheless finite upper bound, by defining a corresponding unbounded dual distribution (given that potential war casualties are bounded by the world population). This approach can also be applied to other fields of science where power laws play a role in modeling, like geology, hydrology, statistical physics and finance. We apply methods from extreme value theory on the dual distribution and derive its tail properties. The dual method allows us to calculate the real tail mean of war casualties, which proves to be considerably larger than the corresponding sample mean for large thresholds, meaning severe underestimation of the tail risks of conflicts from naive observation. We analyze the robustness of our results to errors in historical reports. We study inter-arrival times between tail events and find that no particular trend can be asserted. All the statistical pictures obtained are at variance with the prevailing claims about ;long peace;, namely that violence has been declining over time.
Inference on network statistics by restricting to the network space: applications to sexual history data.

PubMed

Goyal, Ravi; De Gruttola, Victor

2018-01-30

Analysis of sexual history data intended to describe sexual networks presents many challenges arising from the fact that most surveys collect information on only a very small fraction of the population of interest. In addition, partners are rarely identified and responses are subject to reporting biases. Typically, each network statistic of interest, such as mean number of sexual partners for men or women, is estimated independently of other network statistics. There is, however, a complex relationship among networks statistics; and knowledge of these relationships can aid in addressing concerns mentioned earlier. We develop a novel method that constrains a posterior predictive distribution of a collection of network statistics in order to leverage the relationships among network statistics in making inference about network properties of interest. The method ensures that inference on network properties is compatible with an actual network. Through extensive simulation studies, we also demonstrate that use of this method can improve estimates in settings where there is uncertainty that arises both from sampling and from systematic reporting bias compared with currently available approaches to estimation. To illustrate the method, we apply it to estimate network statistics using data from the Chicago Health and Social Life Survey. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Estimation of signal coherence threshold and concealed spectral lines applied to detection of turbofan engine combustion noise.

PubMed

Miles, Jeffrey Hilton

2011-05-01

Combustion noise from turbofan engines has become important, as the noise from sources like the fan and jet are reduced. An aligned and un-aligned coherence technique has been developed to determine a threshold level for the coherence and thereby help to separate the coherent combustion noise source from other noise sources measured with far-field microphones. This method is compared with a statistics based coherence threshold estimation method. In addition, the un-aligned coherence procedure at the same time also reveals periodicities, spectral lines, and undamped sinusoids hidden by broadband turbofan engine noise. In calculating the coherence threshold using a statistical method, one may use either the number of independent records or a larger number corresponding to the number of overlapped records used to create the average. Using data from a turbofan engine and a simulation this paper shows that applying the Fisher z-transform to the un-aligned coherence can aid in making the proper selection of samples and produce a reasonable statistics based coherence threshold. Examples are presented showing that the underlying tonal and coherent broad band structure which is buried under random broadband noise and jet noise can be determined. The method also shows the possible presence of indirect combustion noise.
Scan statistics with local vote for target detection in distributed system

NASA Astrophysics Data System (ADS)

Luo, Junhai; Wu, Qi

2017-12-01

Target detection has occupied a pivotal position in distributed system. Scan statistics, as one of the most efficient detection methods, has been applied to a variety of anomaly detection problems and significantly improves the probability of detection. However, scan statistics cannot achieve the expected performance when the noise intensity is strong, or the signal emitted by the target is weak. The local vote algorithm can also achieve higher target detection rate. After the local vote, the counting rule is always adopted for decision fusion. The counting rule does not use the information about the contiguity of sensors but takes all sensors' data into consideration, which makes the result undesirable. In this paper, we propose a scan statistics with local vote (SSLV) method. This method combines scan statistics with local vote decision. Before scan statistics, each sensor executes local vote decision according to the data of its neighbors and its own. By combining the advantages of both, our method can obtain higher detection rate in low signal-to-noise ratio environment than the scan statistics. After the local vote decision, the distribution of sensors which have detected the target becomes more intensive. To make full use of local vote decision, we introduce a variable-step-parameter for the SSLV. It significantly shortens the scan period especially when the target is absent. Analysis and simulations are presented to demonstrate the performance of our method.
Statistical deprojection of galaxy pairs

NASA Astrophysics Data System (ADS)

Nottale, Laurent; Chamaraux, Pierre

2018-06-01

Aims: The purpose of the present paper is to provide methods of statistical analysis of the physical properties of galaxy pairs. We perform this study to apply it later to catalogs of isolated pairs of galaxies, especially two new catalogs we recently constructed that contain ≈1000 and ≈13 000 pairs, respectively. We are particularly interested by the dynamics of those pairs, including the determination of their masses. Methods: We could not compute the dynamical parameters directly since the necessary data are incomplete. Indeed, we only have at our disposal one component of the intervelocity between the members, namely along the line of sight, and two components of their interdistance, i.e., the projection on the sky-plane. Moreover, we know only one point of each galaxy orbit. Hence we need statistical methods to find the probability distribution of 3D interdistances and 3D intervelocities from their projections; we designed those methods under the term deprojection. Results: We proceed in two steps to determine and use the deprojection methods. First we derive the probability distributions expected for the various relevant projected quantities, namely intervelocity vz, interdistance rp, their ratio, and the product rp v_z^2, which is involved in mass determination. In a second step, we propose various methods of deprojection of those parameters based on the previous analysis. We start from a histogram of the projected data and we apply inversion formulae to obtain the deprojected distributions; lastly, we test the methods by numerical simulations, which also allow us to determine the uncertainties involved.
Statistical dependency in visual scanning

NASA Technical Reports Server (NTRS)

Ellis, Stephen R.; Stark, Lawrence

1986-01-01

A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.
A Survey of Probabilistic Methods for Dynamical Systems with Uncertain Parameters.

DTIC Science & Technology

1986-05-01

J., "An Approach to the Theoretical Background of Statistical Energy Analysis Applied to Structural Vibration," Journ. Acoust. Soc. Amer., Vol. 69...1973, Sect. 8.3. 80. Lyon, R.H., " Statistical Energy Analysis of Dynamical Systems," M.I.T. Press, 1975. e) Late References added in Proofreading !! 81...Dowell, E.H., and Kubota, Y., "Asymptotic Modal Analysis and ’~ y C-" -165- Statistical Energy Analysis of Dynamical Systems," Journ. Appi. - Mech
Quasi-Static Probabilistic Structural Analyses Process and Criteria

NASA Technical Reports Server (NTRS)

Goldberg, B.; Verderaime, V.

1999-01-01

Current deterministic structural methods are easily applied to substructures and components, and analysts have built great design insights and confidence in them over the years. However, deterministic methods cannot support systems risk analyses, and it was recently reported that deterministic treatment of statistical data is inconsistent with error propagation laws that can result in unevenly conservative structural predictions. Assuming non-nal distributions and using statistical data formats throughout prevailing stress deterministic processes lead to a safety factor in statistical format, which integrated into the safety index, provides a safety factor and first order reliability relationship. The embedded safety factor in the safety index expression allows a historically based risk to be determined and verified over a variety of quasi-static metallic substructures consistent with the traditional safety factor methods and NASA Std. 5001 criteria.
Point process statistics in atom probe tomography.

PubMed

Philippe, T; Duguay, S; Grancher, G; Blavette, D

2013-09-01

We present a review of spatial point processes as statistical models that we have designed for the analysis and treatment of atom probe tomography (APT) data. As a major advantage, these methods do not require sampling. The mean distance to nearest neighbour is an attractive approach to exhibit a non-random atomic distribution. A χ(2) test based on distance distributions to nearest neighbour has been developed to detect deviation from randomness. Best-fit methods based on first nearest neighbour distance (1 NN method) and pair correlation function are presented and compared to assess the chemical composition of tiny clusters. Delaunay tessellation for cluster selection has been also illustrated. These statistical tools have been applied to APT experiments on microelectronics materials. Copyright © 2012 Elsevier B.V. All rights reserved.
Potential errors and misuse of statistics in studies on leakage in endodontics.

PubMed

Lucena, C; Lopez, J M; Pulgar, R; Abalos, C; Valderrama, M J

2013-04-01

To assess the quality of the statistical methodology used in studies of leakage in Endodontics, and to compare the results found using appropriate versus inappropriate inferential statistical methods. The search strategy used the descriptors 'root filling' 'microleakage', 'dye penetration', 'dye leakage', 'polymicrobial leakage' and 'fluid filtration' for the time interval 2001-2010 in journals within the categories 'Dentistry, Oral Surgery and Medicine' and 'Materials Science, Biomaterials' of the Journal Citation Report. All retrieved articles were reviewed to find potential pitfalls in statistical methodology that may be encountered during study design, data management or data analysis. The database included 209 papers. In all the studies reviewed, the statistical methods used were appropriate for the category attributed to the outcome variable, but in 41% of the cases, the chi-square test or parametric methods were inappropriately selected subsequently. In 2% of the papers, no statistical test was used. In 99% of cases, a statistically 'significant' or 'not significant' effect was reported as a main finding, whilst only 1% also presented an estimation of the magnitude of the effect. When the appropriate statistical methods were applied in the studies with originally inappropriate data analysis, the conclusions changed in 19% of the cases. Statistical deficiencies in leakage studies may affect their results and interpretation and might be one of the reasons for the poor agreement amongst the reported findings. Therefore, more effort should be made to standardize statistical methodology. © 2012 International Endodontic Journal.
Statistical inference for tumor growth inhibition T/C ratio.

PubMed

Wu, Jianrong

2010-09-01

The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
Application of closed-form solutions to a mesh point field in silicon solar cells

NASA Technical Reports Server (NTRS)

Lamorte, M. F.

1985-01-01

A computer simulation method is discussed that provides for equivalent simulation accuracy, but that exhibits significantly lower CPU running time per bias point compared to other techniques. This new method is applied to a mesh point field as is customary in numerical integration (NI) techniques. The assumption of a linear approximation for the dependent variable, which is typically used in the finite difference and finite element NI methods, is not required. Instead, the set of device transport equations is applied to, and the closed-form solutions obtained for, each mesh point. The mesh point field is generated so that the coefficients in the set of transport equations exhibit small changes between adjacent mesh points. Application of this method to high-efficiency silicon solar cells is described; and the method by which Auger recombination, ambipolar considerations, built-in and induced electric fields, bandgap narrowing, carrier confinement, and carrier diffusivities are treated. Bandgap narrowing has been investigated using Fermi-Dirac statistics, and these results show that bandgap narrowing is more pronounced and that it is temperature-dependent in contrast to the results based on Boltzmann statistics.
[Triple-type theory of statistics and its application in the scientific research of biomedicine].

PubMed

Hu, Liang-ping; Liu, Hui-gang

2005-07-20

To point out the crux of why so many people failed to grasp statistics and to bring forth a "triple-type theory of statistics" to solve the problem in a creative way. Based on the experience in long-time teaching and research in statistics, the "three-type theory" was raised and clarified. Examples were provided to demonstrate that the 3 types, i.e., expressive type, prototype and the standardized type are the essentials for people to apply statistics rationally both in theory and practice, and moreover, it is demonstrated by some instances that the "three types" are correlated with each other. It can help people to see the essence by interpreting and analyzing the problems of experimental designs and statistical analyses in medical research work. Investigations reveal that for some questions, the three types are mutually identical; for some questions, the prototype is their standardized type; however, for some others, the three types are distinct from each other. It has been shown that in some multifactor experimental researches, it leads to the nonexistence of the standardized type corresponding to the prototype at all, because some researchers have committed the mistake of "incomplete control" in setting experimental groups. This is a problem which should be solved by the concept and method of "division". Once the "triple-type" for each question is clarified, a proper experimental design and statistical method can be carried out easily. "Triple-type theory of statistics" can help people to avoid committing statistical mistakes or at least to decrease the misuse rate dramatically and improve the quality, level and speed of biomedical research during the process of applying statistics. It can also help people to improve the quality of statistical textbooks and the teaching effect of statistics and it has demonstrated how to advance biomedical statistics.
The Use of Invariance and Bootstrap Procedures as a Method to Establish the Reliability of Research Results.

ERIC Educational Resources Information Center

Sandler, Andrew B.

Statistical significance is misused in educational and psychological research when it is applied as a method to establish the reliability of research results. Other techniques have been developed which can be correctly utilized to establish the generalizability of findings. Methods that do provide such estimates are known as invariance or…
Fourier Descriptor Analysis and Unification of Voice Range Profile Contours: Method and Applications

ERIC Educational Resources Information Center

Pabon, Peter; Ternstrom, Sten; Lamarche, Anick

2011-01-01

Purpose: To describe a method for unified description, statistical modeling, and comparison of voice range profile (VRP) contours, even from diverse sources. Method: A morphologic modeling technique, which is based on Fourier descriptors (FDs), is applied to the VRP contour. The technique, which essentially involves resampling of the curve of the…
ADHD and Method Variance: A Latent Variable Approach Applied to a Nationally Representative Sample of College Freshmen

ERIC Educational Resources Information Center

Konold, Timothy R.; Glutting, Joseph J.

2008-01-01

This study employed a correlated trait-correlated method application of confirmatory factor analysis to disentangle trait and method variance from measures of attention-deficit/hyperactivity disorder obtained at the college level. The two trait factors were "Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition" ("DSM-IV")…
Summary goodness-of-fit statistics for binary generalized linear models with noncanonical link functions.

PubMed

Canary, Jana D; Blizzard, Leigh; Barry, Ronald P; Hosmer, David W; Quinn, Stephen J

2016-05-01

Generalized linear models (GLM) with a canonical logit link function are the primary modeling technique used to relate a binary outcome to predictor variables. However, noncanonical links can offer more flexibility, producing convenient analytical quantities (e.g., probit GLMs in toxicology) and desired measures of effect (e.g., relative risk from log GLMs). Many summary goodness-of-fit (GOF) statistics exist for logistic GLM. Their properties make the development of GOF statistics relatively straightforward, but it can be more difficult under noncanonical links. Although GOF tests for logistic GLM with continuous covariates (GLMCC) have been applied to GLMCCs with log links, we know of no GOF tests in the literature specifically developed for GLMCCs that can be applied regardless of link function chosen. We generalize the Tsiatis GOF statistic originally developed for logistic GLMCCs, (TG), so that it can be applied under any link function. Further, we show that the algebraically related Hosmer-Lemeshow (HL) and Pigeon-Heyse (J(2) ) statistics can be applied directly. In a simulation study, TG, HL, and J(2) were used to evaluate the fit of probit, log-log, complementary log-log, and log models, all calculated with a common grouping method. The TG statistic consistently maintained Type I error rates, while those of HL and J(2) were often lower than expected if terms with little influence were included. Generally, the statistics had similar power to detect an incorrect model. An exception occurred when a log GLMCC was incorrectly fit to data generated from a logistic GLMCC. In this case, TG had more power than HL or J(2) . © 2015 John Wiley & Sons Ltd/London School of Economics.
Statistical-Dynamical Seasonal Forecasts of Central-Southwest Asian Winter Precipitation.

NASA Astrophysics Data System (ADS)

Tippett, Michael K.; Goddard, Lisa; Barnston, Anthony G.

2005-06-01

Interannual precipitation variability in central-southwest (CSW) Asia has been associated with East Asian jet stream variability and western Pacific tropical convection. However, atmospheric general circulation models (AGCMs) forced by observed sea surface temperature (SST) poorly simulate the region's interannual precipitation variability. The statistical-dynamical approach uses statistical methods to correct systematic deficiencies in the response of AGCMs to SST forcing. Statistical correction methods linking model-simulated Indo-west Pacific precipitation and observed CSW Asia precipitation result in modest, but statistically significant, cross-validated simulation skill in the northeast part of the domain for the period from 1951 to 1998. The statistical-dynamical method is also applied to recent (winter 1998/99 to 2002/03) multimodel, two-tier December-March precipitation forecasts initiated in October. This period includes 4 yr (winter of 1998/99 to 2001/02) of severe drought. Tercile probability forecasts are produced using ensemble-mean forecasts and forecast error estimates. The statistical-dynamical forecasts show enhanced probability of below-normal precipitation for the four drought years and capture the return to normal conditions in part of the region during the winter of 2002/03.May Kabul be without gold, but not without snow.—Traditional Afghan proverb

Regional projection of climate impact indices over the Mediterranean region

NASA Astrophysics Data System (ADS)

Casanueva, Ana; Frías, M.; Dolores; Herrera, Sixto; Bedia, Joaquín; San Martín, Daniel; Gutiérrez, José Manuel; Zaninovic, Ksenija

2014-05-01

Climate Impact Indices (CIIs) are being increasingly used in different socioeconomic sectors to transfer information about climate change impacts and risks to stakeholders. CIIs are typically based on different weather variables such as temperature, wind speed, precipitation or humidity and comprise, in a single index, the relevant meteorological information for the particular impact sector (in this study wildfires and tourism). This dependence on several climate variables poses important limitations to the application of statistical downscaling techniques, since physical consistency among variables is required in most cases to obtain reliable local projections. The present study assesses the suitability of the "direct" downscaling approach, in which the downscaling method is directly applied to the CII. In particular, for illustrative purposes, we consider two popular indices used in the wildfire and tourism sectors, the Fire Weather Index (FWI) and the Physiological Equivalent Temperature (PET), respectively. As an example, two case studies are analysed over two representative Mediterranean regions of interest for the EU CLIM-RUN project: continental Spain for the FWI and Croatia for the PET. Results obtained with this "direct" downscaling approach are similar to those found from the application of the statistical downscaling to the individual meteorological drivers prior to the index calculation ("component" downscaling) thus, a wider range of statistical downscaling methods could be used. As an illustration, future changes in both indices are projected by applying two direct statistical downscaling methods, analogs and linear regression, to the ECHAM5 model. Larger differences were found between the two direct statistical downscaling approaches than between the direct and the component approaches with a single downscaling method. While these examples focus on particular indices and Mediterranean regions of interest for CLIM-RUN stakeholders, the same study could be extended to other indices and regions.
High-resolution image reconstruction technique applied to the optical testing of ground-based astronomical telescopes

NASA Astrophysics Data System (ADS)

Jin, Zhenyu; Lin, Jing; Liu, Zhong

2008-07-01

By study of the classical testing techniques (such as Shack-Hartmann Wave-front Sensor) adopted in testing the aberration of ground-based astronomical optical telescopes, we bring forward two testing methods on the foundation of high-resolution image reconstruction technology. One is based on the averaged short-exposure OTF and the other is based on the Speckle Interferometric OTF by Antoine Labeyrie. Researches made by J.Ohtsubo, F. Roddier, Richard Barakat and J.-Y. ZHANG indicated that the SITF statistical results would be affected by the telescope optical aberrations, which means the SITF statistical results is a function of optical system aberration and the atmospheric Fried parameter (seeing). Telescope diffraction-limited information can be got through two statistics methods of abundant speckle images: by the first method, we can extract the low frequency information such as the full width at half maximum (FWHM) of the telescope PSF to estimate the optical quality; by the second method, we can get a more precise description of the telescope PSF with high frequency information. We will apply the two testing methods to the 2.4m optical telescope of the GMG Observatory, in china to validate their repeatability and correctness and compare the testing results with that of the Shack-Hartmann Wave-Front Sensor got. This part will be described in detail in our paper.
Line identification studies using traditional techniques and wavelength coincidence statistics

NASA Technical Reports Server (NTRS)

Cowley, Charles R.; Adelman, Saul J.

1990-01-01

Traditional line identification techniques result in the assignment of individual lines to an atomic or ionic species. These methods may be supplemented by wavelength coincidence statistics (WCS). The strength and weakness of these methods are discussed using spectra of a number of normal and peculiar B and A stars that have been studied independently by both methods. The present results support the overall findings of some earlier studies. WCS would be most useful in a first survey, before traditional methods have been applied. WCS can quickly make a global search for all species and in this way may enable identifications of an unexpected spectrum that could easily be omitted entirely from a traditional study. This is illustrated by O I. WCS is a subject to well known weakness of any statistical technique, for example, a predictable number of spurious results are to be expected. The danger of small number statistics are illustrated. WCS is at its best relative to traditional methods in finding a line-rich atomic species that is only weakly present in a complicated stellar spectrum.
Bayesian approach for counting experiment statistics applied to a neutrino point source analysis

NASA Astrophysics Data System (ADS)

Bose, D.; Brayeur, L.; Casier, M.; de Vries, K. D.; Golup, G.; van Eijndhoven, N.

2013-12-01

In this paper we present a model independent analysis method following Bayesian statistics to analyse data from a generic counting experiment and apply it to the search for neutrinos from point sources. We discuss a test statistic defined following a Bayesian framework that will be used in the search for a signal. In case no signal is found, we derive an upper limit without the introduction of approximations. The Bayesian approach allows us to obtain the full probability density function for both the background and the signal rate. As such, we have direct access to any signal upper limit. The upper limit derivation directly compares with a frequentist approach and is robust in the case of low-counting observations. Furthermore, it allows also to account for previous upper limits obtained by other analyses via the concept of prior information without the need of the ad hoc application of trial factors. To investigate the validity of the presented Bayesian approach, we have applied this method to the public IceCube 40-string configuration data for 10 nearby blazars and we have obtained a flux upper limit, which is in agreement with the upper limits determined via a frequentist approach. Furthermore, the upper limit obtained compares well with the previously published result of IceCube, using the same data set.
Statistical analysis of solid waste composition data: Arithmetic mean, standard deviation and correlation coefficients.

PubMed

Edjabou, Maklawe Essonanawe; Martín-Fernández, Josep Antoni; Scheutz, Charlotte; Astrup, Thomas Fruergaard

2017-11-01

Data for fractional solid waste composition provide relative magnitudes of individual waste fractions, the percentages of which always sum to 100, thereby connecting them intrinsically. Due to this sum constraint, waste composition data represent closed data, and their interpretation and analysis require statistical methods, other than classical statistics that are suitable only for non-constrained data such as absolute values. However, the closed characteristics of waste composition data are often ignored when analysed. The results of this study showed, for example, that unavoidable animal-derived food waste amounted to 2.21±3.12% with a confidence interval of (-4.03; 8.45), which highlights the problem of the biased negative proportions. A Pearson's correlation test, applied to waste fraction generation (kg mass), indicated a positive correlation between avoidable vegetable food waste and plastic packaging. However, correlation tests applied to waste fraction compositions (percentage values) showed a negative association in this regard, thus demonstrating that statistical analyses applied to compositional waste fraction data, without addressing the closed characteristics of these data, have the potential to generate spurious or misleading results. Therefore, ¨compositional data should be transformed adequately prior to any statistical analysis, such as computing mean, standard deviation and correlation coefficients. Copyright © 2017 Elsevier Ltd. All rights reserved.
Classification of bladder cancer cell lines using Raman spectroscopy: a comparison of excitation wavelength, sample substrate and statistical algorithms

NASA Astrophysics Data System (ADS)

Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.

2014-05-01

Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.
Statistical analysis of experimental multifragmentation events in 64Zn+112Sn at 40 MeV/nucleon

NASA Astrophysics Data System (ADS)

Lin, W.; Zheng, H.; Ren, P.; Liu, X.; Huang, M.; Wada, R.; Chen, Z.; Wang, J.; Xiao, G. Q.; Qu, G.

2018-04-01

A statistical multifragmentation model (SMM) is applied to the experimentally observed multifragmentation events in an intermediate heavy-ion reaction. Using the temperature and symmetry energy extracted from the isobaric yield ratio (IYR) method based on the modified Fisher model (MFM), SMM is applied to the reaction 64Zn+112Sn at 40 MeV/nucleon. The experimental isotope distribution and mass distribution of the primary reconstructed fragments are compared without afterburner and they are well reproduced. The extracted temperature T and symmetry energy coefficient asym from SMM simulated events, using the IYR method, are also consistent with those from the experiment. These results strongly suggest that in the multifragmentation process there is a freezeout volume, in which the thermal and chemical equilibrium is established before or at the time of the intermediate-mass fragments emission.
[Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

PubMed

Golder, W

1999-09-01

To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors make increasingly use of statistical experts' opinion and programs. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.
Randomly iterated search and statistical competency as powerful inversion tools for deformation source modeling: Application to volcano interferometric synthetic aperture radar data

NASA Astrophysics Data System (ADS)

Shirzaei, M.; Walter, T. R.

2009-10-01

Modern geodetic techniques provide valuable and near real-time observations of volcanic activity. Characterizing the source of deformation based on these observations has become of major importance in related monitoring efforts. We investigate two random search approaches, simulated annealing (SA) and genetic algorithm (GA), and utilize them in an iterated manner. The iterated approach helps to prevent GA in general and SA in particular from getting trapped in local minima, and it also increases redundancy for exploring the search space. We apply a statistical competency test for estimating the confidence interval of the inversion source parameters, considering their internal interaction through the model, the effect of the model deficiency, and the observational error. Here, we present and test this new randomly iterated search and statistical competency (RISC) optimization method together with GA and SA for the modeling of data associated with volcanic deformations. Following synthetic and sensitivity tests, we apply the improved inversion techniques to two episodes of activity in the Campi Flegrei volcanic region in Italy, observed by the interferometric synthetic aperture radar technique. Inversion of these data allows derivation of deformation source parameters and their associated quality so that we can compare the two inversion methods. The RISC approach was found to be an efficient method in terms of computation time and search results and may be applied to other optimization problems in volcanic and tectonic environments.
Fully Bayesian tests of neutrality using genealogical summary statistics.

PubMed

Drummond, Alexei J; Suchard, Marc A

2008-10-31

Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequentially, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies and we demonstrate the utility of our method on four real data sets, identifying significant departures of neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously ignored for limited availability of theory and methods.
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics

PubMed Central

Chen, Wenan; Larrabee, Beth R.; Ovsyannikova, Inna G.; Kennedy, Richard B.; Haralambieva, Iana H.; Poland, Gregory A.; Schaid, Daniel J.

2015-01-01

Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. PMID:25948564
Computational methods to extract meaning from text and advance theories of human cognition.

PubMed

McNamara, Danielle S

2011-01-01

Over the past two decades, researchers have made great advances in the area of computational methods for extracting meaning from text. This research has to a large extent been spurred by the development of latent semantic analysis (LSA), a method for extracting and representing the meaning of words using statistical computations applied to large corpora of text. Since the advent of LSA, researchers have developed and tested alternative statistical methods designed to detect and analyze meaning in text corpora. This research exemplifies how statistical models of semantics play an important role in our understanding of cognition and contribute to the field of cognitive science. Importantly, these models afford large-scale representations of human knowledge and allow researchers to explore various questions regarding knowledge, discourse processing, text comprehension, and language. This topic includes the latest progress by the leading researchers in the endeavor to go beyond LSA. Copyright © 2010 Cognitive Science Society, Inc.
Inferring Demographic History Using Two-Locus Statistics.

PubMed

Ragsdale, Aaron P; Gutenkunst, Ryan N

2017-06-01

Population demographic history may be learned from contemporary genetic variation data. Methods based on aggregating the statistics of many single loci into an allele frequency spectrum (AFS) have proven powerful, but such methods ignore potentially informative patterns of linkage disequilibrium (LD) between neighboring loci. To leverage such patterns, we developed a composite-likelihood framework for inferring demographic history from aggregated statistics of pairs of loci. Using this framework, we show that two-locus statistics are more sensitive to demographic history than single-locus statistics such as the AFS. In particular, two-locus statistics escape the notorious confounding of depth and duration of a bottleneck, and they provide a means to estimate effective population size based on the recombination rather than mutation rate. We applied our approach to a Zambian population of Drosophila melanogaster Notably, using both single- and two-locus statistics, we inferred a substantially lower ancestral effective population size than previous works and did not infer a bottleneck history. Together, our results demonstrate the broad potential for two-locus statistics to enable powerful population genetic inference. Copyright © 2017 by the Genetics Society of America.
Data processing of qualitative results from an interlaboratory comparison for the detection of “Flavescence dorée” phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology

PubMed Central

Renaudin, Isabelle; Poliakoff, Françoise

2017-01-01

A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of “Flavescence dorée” (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes’ theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods. PMID:28384335
Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

PubMed

Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise

2017-01-01

A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study where each laboratory had to analyze the same panel of samples consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. Firstly, there was the standard statistical approach consisting in analyzing samples which are known to be positive and samples which are known to be negative and reporting the proportion of false-positive and false-negative results to respectively calculate diagnostic specificity and sensitivity. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other new approaches were also implemented, based, on the one hand, on the probability of detection model, and, on the other hand, on Bayes' theorem. These various statistical approaches are complementary and give consistent results. Their combination, and in particular, the introduction of new statistical approaches give overall information on the performance and limitations of the different methods, and are particularly useful for selecting the most appropriate detection scheme with regards to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6 respectively developed by Hren (2007), Pelletier (2009) and under patent oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their combination can be applied to many other studies concerning plant pathogens and other disciplines that use qualitative detection methods.
A methodology for treating missing data applied to daily rainfall data in the Candelaro River Basin (Italy).

PubMed

Lo Presti, Rossella; Barca, Emanuele; Passarella, Giuseppe

2010-01-01

Environmental time series are often affected by the "presence" of missing data, but when dealing statistically with data, the need to fill in the gaps estimating the missing values must be considered. At present, a large number of statistical techniques are available to achieve this objective; they range from very simple methods, such as using the sample mean, to very sophisticated ones, such as multiple imputation. A brand new methodology for missing data estimation is proposed, which tries to merge the obvious advantages of the simplest techniques (e.g. their vocation to be easily implemented) with the strength of the newest techniques. The proposed method consists in the application of two consecutive stages: once it has been ascertained that a specific monitoring station is affected by missing data, the "most similar" monitoring stations are identified among neighbouring stations on the basis of a suitable similarity coefficient; in the second stage, a regressive method is applied in order to estimate the missing data. In this paper, four different regressive methods are applied and compared, in order to determine which is the most reliable for filling in the gaps, using rainfall data series measured in the Candelaro River Basin located in South Italy.
Statistical analysis to assess automated level of suspicion scoring methods in breast ultrasound

NASA Astrophysics Data System (ADS)

Galperin, Michael

2003-05-01

A well-defined rule-based system has been developed for scoring 0-5 the Level of Suspicion (LOS) based on qualitative lexicon describing the ultrasound appearance of breast lesion. The purposes of the research are to asses and select one of the automated LOS scoring quantitative methods developed during preliminary studies in benign biopsies reduction. The study has used Computer Aided Imaging System (CAIS) to improve the uniformity and accuracy of applying the LOS scheme by automatically detecting, analyzing and comparing breast masses. The overall goal is to reduce biopsies on the masses with lower levels of suspicion, rather that increasing the accuracy of diagnosis of cancers (will require biopsy anyway). On complex cysts and fibroadenoma cases experienced radiologists were up to 50% less certain in true negatives than CAIS. Full correlation analysis was applied to determine which of the proposed LOS quantification methods serves CAIS accuracy the best. This paper presents current results of applying statistical analysis for automated LOS scoring quantification for breast masses with known biopsy results. It was found that First Order Ranking method yielded most the accurate results. The CAIS system (Image Companion, Data Companion software) is developed by Almen Laboratories and was used to achieve the results.
A comparison of high-frequency cross-correlation measures

NASA Astrophysics Data System (ADS)

Precup, Ovidiu V.; Iori, Giulia

2004-12-01

On a high-frequency scale the time series are not homogeneous, therefore standard correlation measures cannot be directly applied to the raw data. There are two ways to deal with this problem. The time series can be homogenised through an interpolation method (An Introduction to High-Frequency Finance, Academic Press, NY, 2001) (linear or previous tick) and then the Pearson correlation statistic computed. Recently, methods that can handle raw non-synchronous time series have been developed (Int. J. Theor. Appl. Finance 6(1) (2003) 87; J. Empirical Finance 4 (1997) 259). This paper compares two traditional methods that use interpolation with an alternative method applied directly to the actual time series.
A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

PubMed

Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

2011-05-26

Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.
Statistical testing of association between menstruation and migraine.

PubMed

Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G

2015-02-01

To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.

*K-means and cluster models for cancer signatures.

PubMed

Kakushadze, Zura; Yu, Willie

2017-09-01

We present *K-means clustering algorithm and source code by expanding statistical clustering methods applied in https://ssrn.com/abstract=2802753 to quantitative finance. *K-means is statistically deterministic without specifying initial centers, etc. We apply *K-means to extracting cancer signatures from genome data without using nonnegative matrix factorization (NMF). *K-means' computational cost is a fraction of NMF's. Using 1389 published samples for 14 cancer types, we find that 3 cancers (liver cancer, lung cancer and renal cell carcinoma) stand out and do not have cluster-like structures. Two clusters have especially high within-cluster correlations with 11 other cancers indicating common underlying structures. Our approach opens a novel avenue for studying such structures. *K-means is universal and can be applied in other fields. We discuss some potential applications in quantitative finance.
Application of radioisotope XRF and thermoluminescence (TL) dating in investigation of pottery from Tell AL-Kasra archaeological site, Syria.

PubMed

Abboud, R; Issa, H; Abed-Allah, Y D; Bakraji, E H

2015-11-01

Statistical analysis based on chemical composition, using radioisotope X-ray fluorescence, have been applied on 39 ancient pottery fragments coming from the excavation at Tell Al-Kasra archaeological site, Syria. Three groups were defined by applying Cluster and Factor analysis statistical methods. Thermoluminescence (TL) dating was investigated on three sherds taken from the bathroom (hammam) on the site. Multiple aliquot additive dose (MAAD) was used to estimate the paleodose value, and the gamma spectrometry was used to estimate the dose rate. The average age was found to be 715±36 year. Copyright © 2015 Elsevier Ltd. All rights reserved.
Mixed-Methods Research in Nutrition and Dietetics.

PubMed

Zoellner, Jamie; Harris, Jeffrey E

2017-05-01

This work focuses on mixed-methods research (MMR) and is the 11th in a series exploring the importance of research design, statistical analysis, and epidemiologic methods as applied to nutrition and dietetics research. MMR research is an investigative technique that applies both quantitative and qualitative data. The purpose of this article is to define MMR; describe its history and nature; provide reasons for its use; describe and explain the six different MMR designs; describe sample selection; and provide guidance in data collection, analysis, and inference. MMR concepts are applied and integrated with nutrition-related scenarios in real-world research contexts and summary recommendations are provided. Copyright © 2017 Academy of Nutrition and Dietetics. Published by Elsevier Inc. All rights reserved.
Random Numbers and Monte Carlo Methods

NASA Astrophysics Data System (ADS)

Scherer, Philipp O. J.

Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. For the calculation of thermodynamic averages Monte Carlo methods are very useful which sample the integration volume at randomly chosen points. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with given probability distribution which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by sampling preferentially the important configurations. Finally the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.
Nonparametric rank regression for analyzing water quality concentration data with multiple detection limits.

PubMed

Fu, Liya; Wang, You-Gan

2011-02-15

Environmental data usually include measurements, such as water quality data, which fall below detection limits, because of limitations of the instruments or of certain analytical methods used. The fact that some responses are not detected needs to be properly taken into account in statistical analysis of such data. However, it is well-known that it is challenging to analyze a data set with detection limits, and we often have to rely on the traditional parametric methods or simple imputation methods. Distributional assumptions can lead to biased inference and justification of distributions is often not possible when the data are correlated and there is a large proportion of data below detection limits. The extent of bias is usually unknown. To draw valid conclusions and hence provide useful advice for environmental management authorities, it is essential to develop and apply an appropriate statistical methodology. This paper proposes rank-based procedures for analyzing non-normally distributed data collected at different sites over a period of time in the presence of multiple detection limits. To take account of temporal correlations within each site, we propose an optimal linear combination of estimating functions and apply the induced smoothing method to reduce the computational burden. Finally, we apply the proposed method to the water quality data collected at Susquehanna River Basin in United States of America, which clearly demonstrates the advantages of the rank regression models.
Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses

PubMed Central

Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah

2015-01-01

Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481
Experimental design and statistical methods for improved hit detection in high-throughput screening.

PubMed

Malo, Nathalie; Hanley, James A; Carlile, Graeme; Liu, Jing; Pelletier, Jerry; Thomas, David; Nadon, Robert

2010-09-01

Identification of active compounds in high-throughput screening (HTS) contexts can be substantially improved by applying classical experimental design and statistical inference principles to all phases of HTS studies. The authors present both experimental and simulated data to illustrate how true-positive rates can be maximized without increasing false-positive rates by the following analytical process. First, the use of robust data preprocessing methods reduces unwanted variation by removing row, column, and plate biases. Second, replicate measurements allow estimation of the magnitude of the remaining random error and the use of formal statistical models to benchmark putative hits relative to what is expected by chance. Receiver Operating Characteristic (ROC) analyses revealed superior power for data preprocessed by a trimmed-mean polish method combined with the RVM t-test, particularly for small- to moderate-sized biological hits.
A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

ERIC Educational Resources Information Center

van de Schoot, Rens; Kaplan, David; Denissen, Jaap; Asendorpf, Jens B.; Neyer, Franz J.; van Aken, Marcel A. G.

2014-01-01

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret properly the results. First, the ingredients underlying Bayesian methods are…
A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

ERIC Educational Resources Information Center

Baker, Bruce D.; Richards, Craig E.

1999-01-01

Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…
Power Analysis for Complex Mediational Designs Using Monte Carlo Methods

ERIC Educational Resources Information Center

Thoemmes, Felix; MacKinnon, David P.; Reiser, Mark R.

2010-01-01

Applied researchers often include mediation effects in applications of advanced methods such as latent variable models and linear growth curve models. Guidance on how to estimate statistical power to detect mediation for these models has not yet been addressed in the literature. We describe a general framework for power analyses for complex…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

ERIC Educational Resources Information Center

Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Su¨lzle, Detlev

2018-01-01

The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Post-processing method for wind speed ensemble forecast using wind speed and direction

NASA Astrophysics Data System (ADS)

Sofie Eide, Siri; Bjørnar Bremnes, John; Steinsland, Ingelin

2017-04-01

Statistical methods are widely applied to enhance the quality of both deterministic and ensemble NWP forecasts. In many situations, like wind speed forecasting, most of the predictive information is contained in one variable in the NWP models. However, in statistical calibration of deterministic forecasts it is often seen that including more variables can further improve forecast skill. For ensembles this is rarely taken advantage of, mainly due to that it is generally not straightforward how to include multiple variables. In this study, it is demonstrated how multiple variables can be included in Bayesian model averaging (BMA) by using a flexible regression method for estimating the conditional means. The method is applied to wind speed forecasting at 204 Norwegian stations based on wind speed and direction forecasts from the ECMWF ensemble system. At about 85 % of the sites the ensemble forecasts were improved in terms of CRPS by adding wind direction as predictor compared to only using wind speed. On average the improvements were about 5 %, but mainly for moderate to strong wind situations. For weak wind speeds adding wind direction had more or less neutral impact.
Statistically qualified neuro-analytic failure detection method and system

DOEpatents

Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

2002-03-02

An apparatus and method for monitoring a process involve development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaption and stochastic model modification of the deterministic model adaptation. Deterministic model adaption involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.
Statistical methods for investigating quiescence and other temporal seismicity patterns

USGS Publications Warehouse

Matthews, M.V.; Reasenberg, P.A.

1988-01-01

We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns:microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. ?? 1988 Birkha??user Verlag.
An entropy-based statistic for genomewide association studies.

PubMed

Zhao, Jinying; Boerwinkle, Eric; Xiong, Momiao

2005-07-01

Efficient genotyping methods and the availability of a large collection of single-nucleotide polymorphisms provide valuable tools for genetic studies of human disease. The standard chi2 statistic for case-control studies, which uses a linear function of allele frequencies, has limited power when the number of marker loci is large. We introduce a novel test statistic for genetic association studies that uses Shannon entropy and a nonlinear function of allele frequencies to amplify the differences in allele and haplotype frequencies to maintain statistical power with large numbers of marker loci. We investigate the relationship between the entropy-based test statistic and the standard chi2 statistic and show that, in most cases, the power of the entropy-based statistic is greater than that of the standard chi2 statistic. The distribution of the entropy-based statistic and the type I error rates are validated using simulation studies. Finally, we apply the new entropy-based test statistic to two real data sets, one for the COMT gene and schizophrenia and one for the MMP-2 gene and esophageal carcinoma, to evaluate the performance of the new method for genetic association studies. The results show that the entropy-based statistic obtained smaller P values than did the standard chi2 statistic.
Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review.

PubMed

Weir, Christopher J; Butcher, Isabella; Assi, Valentina; Lewis, Stephanie C; Murray, Gordon D; Langhorne, Peter; Brady, Marian C

2018-03-07

Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual participant data. For continuous outcomes, especially those with naturally skewed distributions, summary information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal, we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis. We undertook two systematic literature reviews to identify methodological approaches used to deal with missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited reference searching and emailed topic experts to identify recent methodological developments. Details recorded included the description of the method, the information required to implement the method, any underlying assumptions and whether the method could be readily applied in standard statistical software. We provided a summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios. For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when replacing a missing SD the approximation using the range minimised loss of precision and generally performed better than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials gave superior results. Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median) reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or variability summary statistics within meta-analyses.
Practical statistics in pain research.

PubMed

Kim, Tae Kyun

2017-10-01

Pain is subjective, while statistics related to pain research are objective. This review was written to help researchers involved in pain research make statistical decisions. The main issues are related with the level of scales that are often used in pain research, the choice of statistical methods between parametric or nonparametric statistics, and problems which arise from repeated measurements. In the field of pain research, parametric statistics used to be applied in an erroneous way. This is closely related with the scales of data and repeated measurements. The level of scales includes nominal, ordinal, interval, and ratio scales. The level of scales affects the choice of statistics between parametric or non-parametric methods. In the field of pain research, the most frequently used pain assessment scale is the ordinal scale, which would include the visual analogue scale (VAS). There used to be another view, however, which considered the VAS to be an interval or ratio scale, so that the usage of parametric statistics would be accepted practically in some cases. Repeated measurements of the same subjects always complicates statistics. It means that measurements inevitably have correlations between each other, and would preclude the application of one-way ANOVA in which independence between the measurements is necessary. Repeated measures of ANOVA (RMANOVA), however, would permit the comparison between the correlated measurements as long as the condition of sphericity assumption is satisfied. Conclusively, parametric statistical methods should be used only when the assumptions of parametric statistics, such as normality and sphericity, are established.
Adaptive Design of Confirmatory Trials: Advances and Challenges

PubMed Central

Lai, Tze Leung; Lavori, Philip W.; Tsang, Ka Wai

2015-01-01

The past decade witnessed major developments in innovative designs of confirmatory clinical trials, and adaptive designs represent the most active area of these developments. We give an overview of the developments and associated statistical methods in several classes of adaptive designs of confirmatory trials. We also discuss their statistical difficulties and implementation challenges, and show how these problems are connected to other branches of mainstream Statistics, which we then apply to resolve the difficulties and bypass the bottlenecks in the development of adaptive designs for the next decade. PMID:26079372
Dynamic evolution of nearby galaxy clusters

NASA Astrophysics Data System (ADS)

Biernacka, M.; Flin, P.

2011-06-01

A study of the evolution of 377 rich ACO clusters with redshift z<0.2 is presented. The data concerning galaxies in the investigated clusters were obtained using FOCAS packages applied to Digital Sky Survey I. The 377 galaxy clusters constitute a statistically uniform sample to which visual galaxy/star reclassifications were applied. Cluster shape within 2.0 h-1 Mpc from the adopted cluster centre (the mean and the median of all galaxy coordinates, the position of the brightest and of the third brightest galaxy in the cluster) was determined through its ellipticity calculated using two methods: the covariance ellipse method (hereafter CEM) and the method based on Minkowski functionals (hereafter MFM). We investigated ellipticity dependence on the radius of circular annuli, in which ellipticity was calculated. This was realized by varying the radius from 0.5 to 2 Mpc in steps of 0.25 Mpc. By performing Monte Carlo simulations, we generated clusters to which the two ellipticity methods were applied. We found that the covariance ellipse method works better than the method based on Minkowski functionals. We also found that ellipticity distributions are different for different methods used. Using the ellipticity-redshift relation, we investigated the possibility of cluster evolution in the low-redshift Universe. The correlation of cluster ellipticities with redshifts is undoubtly an indicator of structural evolution. Using the t-Student statistics, we found a statistically significant correlation between ellipticity and redshift at the significance level of α = 0.95. In one of the two shape determination methods we found that ellipticity grew with redshift, while the other method gave opposite results. Monte Carlo simulations showed that only ellipticities calculated at the distance of 1.5 Mpc from cluster centre in the Minkowski functional method are robust enough to be taken into account, but for that radius we did not find any relation between e and z. Since CEM pointed towards the existence of the e(z) relation, we conclude that such an effect is real though rather weak. A detailed study of the e(z) relation showed that the observed relation is nonlinear, and the number of elongated structures grows rapidly for z>0.14.
Alternative Derivations of the Statistical Mechanical Distribution Laws

PubMed Central

Wall, Frederick T.

1971-01-01

A new approach is presented for the derivation of statistical mechanical distribution laws. The derivations are accomplished by minimizing the Helmholtz free energy under constant temperature and volume, instead of maximizing the entropy under constant energy and volume. An alternative method involves stipulating equality of chemical potential, or equality of activity, for particles in different energy levels. This approach leads to a general statement of distribution laws applicable to all systems for which thermodynamic probabilities can be written. The methods also avoid use of the calculus of variations, Lagrangian multipliers, and Stirling's approximation for the factorial. The results are applied specifically to Boltzmann, Fermi-Dirac, and Bose-Einstein statistics. The special significance of chemical potential and activity is discussed for microscopic systems. PMID:16578712

Alternative derivations of the statistical mechanical distribution laws.

PubMed

Wall, F T

1971-08-01

A new approach is presented for the derivation of statistical mechanical distribution laws. The derivations are accomplished by minimizing the Helmholtz free energy under constant temperature and volume, instead of maximizing the entropy under constant energy and volume. An alternative method involves stipulating equality of chemical potential, or equality of activity, for particles in different energy levels. This approach leads to a general statement of distribution laws applicable to all systems for which thermodynamic probabilities can be written. The methods also avoid use of the calculus of variations, Lagrangian multipliers, and Stirling's approximation for the factorial. The results are applied specifically to Boltzmann, Fermi-Dirac, and Bose-Einstein statistics. The special significance of chemical potential and activity is discussed for microscopic systems.
Statistical inference for template aging

NASA Astrophysics Data System (ADS)

Schuckers, Michael E.

2006-04-01

A change in classification error rates for a biometric device is often referred to as template aging. Here we offer two methods for determining whether the effect of time is statistically significant. The first of these is the use of a generalized linear model to determine if these error rates change linearly over time. This approach generalizes previous work assessing the impact of covariates using generalized linear models. The second approach uses of likelihood ratio tests methodology. The focus here is on statistical methods for estimation not the underlying cause of the change in error rates over time. These methodologies are applied to data from the National Institutes of Standards and Technology Biometric Score Set Release 1. The results of these applications are discussed.
Independent component analysis for automatic note extraction from musical trills

NASA Astrophysics Data System (ADS)

Brown, Judith C.; Smaragdis, Paris

2004-05-01

The method of principal component analysis, which is based on second-order statistics (or linear independence), has long been used for redundancy reduction of audio data. The more recent technique of independent component analysis, enforcing much stricter statistical criteria based on higher-order statistical independence, is introduced and shown to be far superior in separating independent musical sources. This theory has been applied to piano trills and a database of trill rates was assembled from experiments with a computer-driven piano, recordings of a professional pianist, and commercially available compact disks. The method of independent component analysis has thus been shown to be an outstanding, effective means of automatically extracting interesting musical information from a sea of redundant data.
General Framework for Meta-analysis of Rare Variants in Sequencing Association Studies

PubMed Central

Lee, Seunggeun; Teslovich, Tanya M.; Boehnke, Michael; Lin, Xihong

2013-01-01

We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels. PMID:23768515
A weighted U-statistic for genetic association analyses of sequencing data.

PubMed

Wei, Changshuai; Li, Ming; He, Zihuai; Vsevolozhskaya, Olga; Schaid, Daniel J; Lu, Qing

2014-12-01

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL 4 and very low density lipoprotein cholesterol. © 2014 WILEY PERIODICALS, INC.
Improvement of Microtremor Data Filtering and Processing Methods Used in Determining the Fundamental Frequency of Urban Areas

NASA Astrophysics Data System (ADS)

Mousavi Anzehaee, Mohammad; Adib, Ahmad; Heydarzadeh, Kobra

2015-10-01

The manner of microtremor data collection and filtering operation and also the method used for processing have a considerable effect on the accuracy of estimation of dynamic soil parameters. In this paper, running variance method was used to improve the automatic detection of data sections infected by local perturbations. In this method, the microtremor data running variance is computed using a sliding window. Then the obtained signal is used to remove the ranges of data affected by perturbations from the original data. Additionally, to determinate the fundamental frequency of a site, this study has proposed a statistical characteristics-based method. Actually, statistical characteristics, such as the probability density graph and the average and the standard deviation of all the frequencies corresponding to the maximum peaks in the H/ V spectra of all data windows, are used to differentiate the real peaks from the false peaks resulting from perturbations. The methods have been applied to the data recorded for the City of Meybod in central Iran. Experimental results show that the applied methods are able to successfully reduce the effects of extensive local perturbations on microtremor data and eventually to estimate the fundamental frequency more accurately compared to other common methods.
Wavelet analysis in ecology and epidemiology: impact of statistical tests

PubMed Central

Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

2014-01-01

Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
Wavelet analysis in ecology and epidemiology: impact of statistical tests.

PubMed

Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario

2014-02-06

Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the 'beta-surrogate' method.
Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.

PubMed

Pasaniuc, Bogdan; Zaitlen, Noah; Shi, Huwenbo; Bhatia, Gaurav; Gusev, Alexander; Pickrell, Joseph; Hirschhorn, Joel; Strachan, David P; Patterson, Nick; Price, Alkes L

2014-10-15

Imputation using external reference panels (e.g. 1000 Genomes) is a widely used approach for increasing power in genome-wide association studies and meta-analysis. Existing hidden Markov models (HMM)-based imputation approaches require individual-level genotypes. Here, we develop a new method for Gaussian imputation from summary association statistics, a type of data that is becoming widely available. In simulations using 1000 Genomes (1000G) data, this method recovers 84% (54%) of the effective sample size for common (>5%) and low-frequency (1-5%) variants [increasing to 87% (60%) when summary linkage disequilibrium information is available from target samples] versus the gold standard of 89% (67%) for HMM-based imputation, which cannot be applied to summary statistics. Our approach accounts for the limited sample size of the reference panel, a crucial step to eliminate false-positive associations, and it is computationally very fast. As an empirical demonstration, we apply our method to seven case-control phenotypes from the Wellcome Trust Case Control Consortium (WTCCC) data and a study of height in the British 1958 birth cohort (1958BC). Gaussian imputation from summary statistics recovers 95% (105%) of the effective sample size (as quantified by the ratio of [Formula: see text] association statistics) compared with HMM-based imputation from individual-level genotypes at the 227 (176) published single nucleotide polymorphisms (SNPs) in the WTCCC (1958BC height) data. In addition, for publicly available summary statistics from large meta-analyses of four lipid traits, we publicly release imputed summary statistics at 1000G SNPs, which could not have been obtained using previously published methods, and demonstrate their accuracy by masking subsets of the data. We show that 1000G imputation using our approach increases the magnitude and statistical evidence of enrichment at genic versus non-genic loci for these traits, as compared with an analysis without 1000G imputation. Thus, imputation of summary statistics will be a valuable tool in future functional enrichment analyses. Publicly available software package available at http://bogdan.bioinformatics.ucla.edu/software/. bpasaniuc@mednet.ucla.edu or aprice@hsph.harvard.edu Supplementary materials are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Statistical Methods for Proteomic Biomarker Discovery based on Feature Extraction or Functional Modeling Approaches.

PubMed

Morris, Jeffrey S

2012-01-01

In recent years, developments in molecular biotechnology have led to the increased promise of detecting and validating biomarkers, or molecular markers that relate to various biological or medical outcomes. Proteomics, the direct study of proteins in biological samples, plays an important role in the biomarker discovery process. These technologies produce complex, high dimensional functional and image data that present many analytical challenges that must be addressed properly for effective comparative proteomics studies that can yield potential biomarkers. Specific challenges include experimental design, preprocessing, feature extraction, and statistical analysis accounting for the inherent multiple testing issues. This paper reviews various computational aspects of comparative proteomic studies, and summarizes contributions I along with numerous collaborators have made. First, there is an overview of comparative proteomics technologies, followed by a discussion of important experimental design and preprocessing issues that must be considered before statistical analysis can be done. Next, the two key approaches to analyzing proteomics data, feature extraction and functional modeling, are described. Feature extraction involves detection and quantification of discrete features like peaks or spots that theoretically correspond to different proteins in the sample. After an overview of the feature extraction approach, specific methods for mass spectrometry ( Cromwell ) and 2D gel electrophoresis ( Pinnacle ) are described. The functional modeling approach involves modeling the proteomic data in their entirety as functions or images. A general discussion of the approach is followed by the presentation of a specific method that can be applied, wavelet-based functional mixed models, and its extensions. All methods are illustrated by application to two example proteomic data sets, one from mass spectrometry and one from 2D gel electrophoresis. While the specific methods presented are applied to two specific proteomic technologies, MALDI-TOF and 2D gel electrophoresis, these methods and the other principles discussed in the paper apply much more broadly to other expression proteomics technologies.
Tensor Minkowski Functionals for random fields on the sphere

NASA Astrophysics Data System (ADS)

Chingangbam, Pravabati; Yogendran, K. P.; Joby, P. K.; Ganesan, Vidhya; Appleby, Stephen; Park, Changbom

2017-12-01

We generalize the translation invariant tensor-valued Minkowski Functionals which are defined on two-dimensional flat space to the unit sphere. We apply them to level sets of random fields. The contours enclosing boundaries of level sets of random fields give a spatial distribution of random smooth closed curves. We outline a method to compute the tensor-valued Minkowski Functionals numerically for any random field on the sphere. Then we obtain analytic expressions for the ensemble expectation values of the matrix elements for isotropic Gaussian and Rayleigh fields. The results hold on flat as well as any curved space with affine connection. We elucidate the way in which the matrix elements encode information about the Gaussian nature and statistical isotropy (or departure from isotropy) of the field. Finally, we apply the method to maps of the Galactic foreground emissions from the 2015 PLANCK data and demonstrate their high level of statistical anisotropy and departure from Gaussianity.
Metamodels for Computer-Based Engineering Design: Survey and Recommendations

NASA Technical Reports Server (NTRS)

Simpson, Timothy W.; Peplinski, Jesse; Koch, Patrick N.; Allen, Janet K.

1997-01-01

The use of statistical techniques to build approximations of expensive computer analysis codes pervades much of todays engineering design. These statistical approximations, or metamodels, are used to replace the actual expensive computer analyses, facilitating multidisciplinary, multiobjective optimization and concept exploration. In this paper we review several of these techniques including design of experiments, response surface methodology, Taguchi methods, neural networks, inductive learning, and kriging. We survey their existing application in engineering design and then address the dangers of applying traditional statistical techniques to approximate deterministic computer analysis codes. We conclude with recommendations for the appropriate use of statistical approximation techniques in given situations and how common pitfalls can be avoided.
Reconstructing Macroeconomics Based on Statistical Physics

NASA Astrophysics Data System (ADS)

Aoki, Masanao; Yoshikawa, Hiroshi

We believe that time has come to integrate the new approach based on statistical physics or econophysics into macroeconomics. Toward this goal, there must be more dialogues between physicists and economists. In this paper, we argue that there is no reason why the methods of statistical physics so successful in many fields of natural sciences cannot be usefully applied to macroeconomics that is meant to analyze the macroeconomy comprising a large number of economic agents. It is, in fact, weird to regard the macroeconomy as a homothetic enlargement of the representative micro agent. We trust the bright future of the new approach to macroeconomies based on statistical physics.
Design, analysis, and interpretation of field quality-control data for water-sampling projects

USGS Publications Warehouse

Mueller, David K.; Schertz, Terry L.; Martin, Jeffrey D.; Sandstrom, Mark W.

2015-01-01

The report provides extensive information about statistical methods used to analyze quality-control data in order to estimate potential bias and variability in environmental data. These methods include construction of confidence intervals on various statistical measures, such as the mean, percentiles and percentages, and standard deviation. The methods are used to compare quality-control results with the larger set of environmental data in order to determine whether the effects of bias and variability might interfere with interpretation of these data. Examples from published reports are presented to illustrate how the methods are applied, how bias and variability are reported, and how the interpretation of environmental data can be qualified based on the quality-control analysis.
A new statistical method for characterizing the atmospheres of extrasolar planets

NASA Astrophysics Data System (ADS)

Henderson, Cassandra S.; Skemer, Andrew J.; Morley, Caroline V.; Fortney, Jonathan J.

2017-10-01

By detecting light from extrasolar planets, we can measure their compositions and bulk physical properties. The technologies used to make these measurements are still in their infancy, and a lack of self-consistency suggests that previous observations have underestimated their systemic errors. We demonstrate a statistical method, newly applied to exoplanet characterization, which uses a Bayesian formalism to account for underestimated errorbars. We use this method to compare photometry of a substellar companion, GJ 758b, with custom atmospheric models. Our method produces a probability distribution of atmospheric model parameters including temperature, gravity, cloud model (fsed) and chemical abundance for GJ 758b. This distribution is less sensitive to highly variant data and appropriately reflects a greater uncertainty on parameter fits.
Statistical Analysis of the Uncertainty in Pre-Flight Aerodynamic Database of a Hypersonic Vehicle

NASA Astrophysics Data System (ADS)

Huh, Lynn

The objective of the present research was to develop a new method to derive the aerodynamic coefficients and the associated uncertainties for flight vehicles via post- flight inertial navigation analysis using data from the inertial measurement unit. Statistical estimates of vehicle state and aerodynamic coefficients are derived using Monte Carlo simulation. Trajectory reconstruction using the inertial navigation system (INS) is a simple and well used method. However, deriving realistic uncertainties in the reconstructed state and any associated parameters is not so straight forward. Extended Kalman filters, batch minimum variance estimation and other approaches have been used. However, these methods generally depend on assumed physical models, assumed statistical distributions (usually Gaussian) or have convergence issues for non-linear problems. The approach here assumes no physical models, is applicable to any statistical distribution, and does not have any convergence issues. The new approach obtains the statistics directly from a sufficient number of Monte Carlo samples using only the generally well known gyro and accelerometer specifications and could be applied to the systems of non-linear form and non-Gaussian distribution. When redundant data are available, the set of Monte Carlo simulations are constrained to satisfy the redundant data within the uncertainties specified for the additional data. The proposed method was applied to validate the uncertainty in the pre-flight aerodynamic database of the X-43A Hyper-X research vehicle. In addition to gyro and acceleration data, the actual flight data include redundant measurements of position and velocity from the global positioning system (GPS). The criteria derived from the blend of the GPS and INS accuracy was used to select valid trajectories for statistical analysis. The aerodynamic coefficients were derived from the selected trajectories by either direct extraction method based on the equations in dynamics, or by the inquiry of the pre-flight aerodynamic database. After the application of the proposed method to the case of the X-43A Hyper-X research vehicle, it was found that 1) there were consistent differences in the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis, 2) the pre-flight estimation of the pitching moment coefficients was significantly different from the post-flight analysis, 3) the type of distribution of the states from the Monte Carlo simulation were affected by that of the perturbation parameters, 4) the uncertainties in the pre-flight model were overestimated, 5) the range where the aerodynamic coefficients from the pre-flight aerodynamic database and post-flight analysis are in closest agreement is between Mach *.* and *.* and more data points may be needed between Mach * and ** in the pre-flight aerodynamic database, 6) selection criterion for valid trajectories from the Monte Carlo simulations was mostly driven by the horizontal velocity error, 7) the selection criterion must be based on reasonable model to ensure the validity of the statistics from the proposed method, and 8) the results from the proposed method applied to the two different flights with the identical geometry and similar flight profile were consistent.
In Defense of P Values

PubMed Central

Verhulst, Brad

2016-01-01

P values have become the scapegoat for a wide variety of problems in science. P values are generally over-emphasized, often incorrectly applied, and in some cases even abused. However, alternative methods of hypothesis testing will likely fall victim to the same criticisms currently leveled at P values if more fundamental changes are not made in the research process. Increasing the general level of statistical literacy and enhancing training in statistical methods provide a potential avenue for identifying, correcting, and preventing erroneous conclusions from entering the academic literature and for improving the general quality of patient care. PMID:28366961
Fine Mapping Causal Variants with an Approximate Bayesian Method Using Marginal Test Statistics.

PubMed

Chen, Wenan; Larrabee, Beth R; Ovsyannikova, Inna G; Kennedy, Richard B; Haralambieva, Iana H; Poland, Gregory A; Schaid, Daniel J

2015-07-01

Two recently developed fine-mapping methods, CAVIAR and PAINTOR, demonstrate better performance over other fine-mapping methods. They also have the advantage of using only the marginal test statistics and the correlation among SNPs. Both methods leverage the fact that the marginal test statistics asymptotically follow a multivariate normal distribution and are likelihood based. However, their relationship with Bayesian fine mapping, such as BIMBAM, is not clear. In this study, we first show that CAVIAR and BIMBAM are actually approximately equivalent to each other. This leads to a fine-mapping method using marginal test statistics in the Bayesian framework, which we call CAVIAR Bayes factor (CAVIARBF). Another advantage of the Bayesian framework is that it can answer both association and fine-mapping questions. We also used simulations to compare CAVIARBF with other methods under different numbers of causal variants. The results showed that both CAVIARBF and BIMBAM have better performance than PAINTOR and other methods. Compared to BIMBAM, CAVIARBF has the advantage of using only marginal test statistics and takes about one-quarter to one-fifth of the running time. We applied different methods on two independent cohorts of the same phenotype. Results showed that CAVIARBF, BIMBAM, and PAINTOR selected the same top 3 SNPs; however, CAVIARBF and BIMBAM had better consistency in selecting the top 10 ranked SNPs between the two cohorts. Software is available at https://bitbucket.org/Wenan/caviarbf. Copyright © 2015 by the Genetics Society of America.
A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

PubMed Central

Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

2016-01-01

Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
A standards-based method for compositional analysis by energy dispersive X-ray spectrometry using multivariate statistical analysis: application to multicomponent alloys.

PubMed

Rathi, Monika; Ahrenkiel, S P; Carapella, J J; Wanlass, M W

2013-02-01

Given an unknown multicomponent alloy, and a set of standard compounds or alloys of known composition, can one improve upon popular standards-based methods for energy dispersive X-ray (EDX) spectrometry to quantify the elemental composition of the unknown specimen? A method is presented here for determining elemental composition of alloys using transmission electron microscopy-based EDX with appropriate standards. The method begins with a discrete set of related reference standards of known composition, applies multivariate statistical analysis to those spectra, and evaluates the compositions with a linear matrix algebra method to relate the spectra to elemental composition. By using associated standards, only limited assumptions about the physical origins of the EDX spectra are needed. Spectral absorption corrections can be performed by providing an estimate of the foil thickness of one or more reference standards. The technique was applied to III-V multicomponent alloy thin films: composition and foil thickness were determined for various III-V alloys. The results were then validated by comparing with X-ray diffraction and photoluminescence analysis, demonstrating accuracy of approximately 1% in atomic fraction.

A Comparison of Didactic and Inquiry Teaching Methods in a Rural Community College Earth Science Course

NASA Astrophysics Data System (ADS)

Beam, Margery Elizabeth

The combination of increasing enrollment and the importance of providing transfer students a solid foundation in science calls for science faculty to evaluate teaching methods in rural community colleges. The purpose of this study was to examine and compare the effectiveness of two teaching methods, inquiry teaching methods and didactic teaching methods, applied in a rural community college earth science course. Two groups of students were taught the same content via inquiry and didactic teaching methods. Analysis of quantitative data included a non-parametric ranking statistical testing method in which the difference between the rankings and the median of the post-test scores was analyzed for significance. Results indicated there was not a significant statistical difference between the teaching methods for the group of students participating in the research. The practical and educational significance of this study provides valuable perspectives on teaching methods and student learning styles in rural community colleges.
Analytical investigation of different mathematical approaches utilizing manipulation of ratio spectra

NASA Astrophysics Data System (ADS)

Osman, Essam Eldin A.

2018-01-01

This work represents a comparative study of different approaches of manipulating ratio spectra, applied on a binary mixture of ciprofloxacin HCl and dexamethasone sodium phosphate co-formulated as ear drops. The proposed new spectrophotometric methods are: ratio difference spectrophotometric method (RDSM), amplitude center method (ACM), first derivative of the ratio spectra (1DD) and mean centering of ratio spectra (MCR). The proposed methods were checked using laboratory-prepared mixtures and were successfully applied for the analysis of pharmaceutical formulation containing the cited drugs. The proposed methods were validated according to the ICH guidelines. A comparative study was conducted between those methods regarding simplicity, limitations and sensitivity. The obtained results were statistically compared with those obtained from the reported HPLC method, showing no significant difference with respect to accuracy and precision.
Spatial Ensemble Postprocessing of Precipitation Forecasts Using High Resolution Analyses

NASA Astrophysics Data System (ADS)

Lang, Moritz N.; Schicker, Irene; Kann, Alexander; Wang, Yong

2017-04-01

Ensemble prediction systems are designed to account for errors or uncertainties in the initial and boundary conditions, imperfect parameterizations, etc. However, due to sampling errors and underestimation of the model errors, these ensemble forecasts tend to be underdispersive, and to lack both reliability and sharpness. To overcome such limitations, statistical postprocessing methods are commonly applied to these forecasts. In this study, a full-distributional spatial post-processing method is applied to short-range precipitation forecasts over Austria using Standardized Anomaly Model Output Statistics (SAMOS). Following Stauffer et al. (2016), observation and forecast fields are transformed into standardized anomalies by subtracting a site-specific climatological mean and dividing by the climatological standard deviation. Due to the need of fitting only a single regression model for the whole domain, the SAMOS framework provides a computationally inexpensive method to create operationally calibrated probabilistic forecasts for any arbitrary location or for all grid points in the domain simultaneously. Taking advantage of the INCA system (Integrated Nowcasting through Comprehensive Analysis), high resolution analyses are used for the computation of the observed climatology and for model training. The INCA system operationally combines station measurements and remote sensing data into real-time objective analysis fields at 1 km-horizontal resolution and 1 h-temporal resolution. The precipitation forecast used in this study is obtained from a limited area model ensemble prediction system also operated by ZAMG. The so called ALADIN-LAEF provides, by applying a multi-physics approach, a 17-member forecast at a horizontal resolution of 10.9 km and a temporal resolution of 1 hour. The performed SAMOS approach statistically combines the in-house developed high resolution analysis and ensemble prediction system. The station-based validation of 6 hour precipitation sums shows a mean improvement of more than 40% in CRPS when compared to bilinearly interpolated uncalibrated ensemble forecasts. The validation on randomly selected grid points, representing the true height distribution over Austria, still indicates a mean improvement of 35%. The applied statistical model is currently set up for 6-hourly and daily accumulation periods, but will be extended to a temporal resolution of 1-3 hours within a new probabilistic nowcasting system operated by ZAMG.
Process safety improvement--quality and target zero.

PubMed

Van Scyoc, Karl

2008-11-15

Process safety practitioners have adopted quality management principles in design of process safety management systems with positive effect, yet achieving safety objectives sometimes remain a distant target. Companies regularly apply tools and methods which have roots in quality and productivity improvement. The "plan, do, check, act" improvement loop, statistical analysis of incidents (non-conformities), and performance trending popularized by Dr. Deming are now commonly used in the context of process safety. Significant advancements in HSE performance are reported after applying methods viewed as fundamental for quality management. In pursuit of continual process safety improvement, the paper examines various quality improvement methods, and explores how methods intended for product quality can be additionally applied to continual improvement of process safety. Methods such as Kaizen, Poke yoke, and TRIZ, while long established for quality improvement, are quite unfamiliar in the process safety arena. These methods are discussed for application in improving both process safety leadership and field work team performance. Practical ways to advance process safety, based on the methods, are given.
Bayesian statistics and Monte Carlo methods

NASA Astrophysics Data System (ADS)

Koch, K. R.

2018-03-01

The Bayesian approach allows an intuitive way to derive the methods of statistics. Probability is defined as a measure of the plausibility of statements or propositions. Three rules are sufficient to obtain the laws of probability. If the statements refer to the numerical values of variables, the so-called random variables, univariate and multivariate distributions follow. They lead to the point estimation by which unknown quantities, i.e. unknown parameters, are computed from measurements. The unknown parameters are random variables, they are fixed quantities in traditional statistics which is not founded on Bayes' theorem. Bayesian statistics therefore recommends itself for Monte Carlo methods, which generate random variates from given distributions. Monte Carlo methods, of course, can also be applied in traditional statistics. The unknown parameters, are introduced as functions of the measurements, and the Monte Carlo methods give the covariance matrix and the expectation of these functions. A confidence region is derived where the unknown parameters are situated with a given probability. Following a method of traditional statistics, hypotheses are tested by determining whether a value for an unknown parameter lies inside or outside the confidence region. The error propagation of a random vector by the Monte Carlo methods is presented as an application. If the random vector results from a nonlinearly transformed vector, its covariance matrix and its expectation follow from the Monte Carlo estimate. This saves a considerable amount of derivatives to be computed, and errors of the linearization are avoided. The Monte Carlo method is therefore efficient. If the functions of the measurements are given by a sum of two or more random vectors with different multivariate distributions, the resulting distribution is generally not known. TheMonte Carlo methods are then needed to obtain the covariance matrix and the expectation of the sum.
Use of statistical and neural net approaches in predicting toxicity of chemicals.

PubMed

Basak, S C; Grunwald, G D; Gute, B D; Balasubramanian, K; Opitz, D

2000-01-01

Hierarchical quantitative structure-activity relationships (H-QSAR) have been developed as a new approach in constructing models for estimating physicochemical, biomedicinal, and toxicological properties of interest. This approach uses increasingly more complex molecular descriptors in a graduated approach to model building. In this study, statistical and neural network methods have been applied to the development of H-QSAR models for estimating the acute aquatic toxicity (LC50) of 69 benzene derivatives to Pimephales promelas (fathead minnow). Topostructural, topochemical, geometrical, and quantum chemical indices were used as the four levels of the hierarchical method. It is clear from both the statistical and neural network models that topostructural indices alone cannot adequately model this set of congeneric chemicals. Not surprisingly, topochemical indices greatly increase the predictive power of both statistical and neural network models. Quantum chemical indices also add significantly to the modeling of this set of acute aquatic toxicity data.
Applying a statistical PTB detection procedure to complement the gold standard.

PubMed

Noor, Norliza Mohd; Yunus, Ashari; Bakar, S A R Abu; Hussin, Amran; Rijal, Omar Mohd

2011-04-01

This paper investigates a novel statistical discrimination procedure to detect PTB when the gold standard requirement is taken into consideration. Archived data were used to establish two groups of patients which are the control and test group. The control group was used to develop the statistical discrimination procedure using four vectors of wavelet coefficients as feature vectors for the detection of pulmonary tuberculosis (PTB), lung cancer (LC), and normal lung (NL). This discrimination procedure was investigated using the test group where the number of sputum positive and sputum negative cases that were correctly classified as PTB cases were noted. The proposed statistical discrimination method is able to detect PTB patients and LC with high true positive fraction. The method is also able to detect PTB patients that are sputum negative and therefore may be used as a complement to the gold standard. Copyright © 2010 Elsevier Ltd. All rights reserved.
Multiple Versus Single Set Validation of Multivariate Models to Avoid Mistakes.

PubMed

Harrington, Peter de Boves

2018-01-02

Validation of multivariate models is of current importance for a wide range of chemical applications. Although important, it is neglected. The common practice is to use a single external validation set for evaluation. This approach is deficient and may mislead investigators with results that are specific to the single validation set of data. In addition, no statistics are available regarding the precision of a derived figure of merit (FOM). A statistical approach using bootstrapped Latin partitions is advocated. This validation method makes an efficient use of the data because each object is used once for validation. It was reviewed a decade earlier but primarily for the optimization of chemometric models this review presents the reasons it should be used for generalized statistical validation. Average FOMs with confidence intervals are reported and powerful, matched-sample statistics may be applied for comparing models and methods. Examples demonstrate the problems with single validation sets.
Reconstruction of ensembles of coupled time-delay systems from time series.

PubMed

Sysoev, I V; Prokhorov, M D; Ponomarenko, V I; Bezruchko, B P

2014-06-01

We propose a method to recover from time series the parameters of coupled time-delay systems and the architecture of couplings between them. The method is based on a reconstruction of model delay-differential equations and estimation of statistical significance of couplings. It can be applied to networks composed of nonidentical nodes with an arbitrary number of unidirectional and bidirectional couplings. We test our method on chaotic and periodic time series produced by model equations of ensembles of diffusively coupled time-delay systems in the presence of noise, and apply it to experimental time series obtained from electronic oscillators with delayed feedback coupled by resistors.
STRengthening analytical thinking for observational studies: the STRATOS initiative.

PubMed

Sauerbrei, Willi; Abrahamowicz, Michal; Altman, Douglas G; le Cessie, Saskia; Carpenter, James

2014-12-30

The validity and practical utility of observational medical research depends critically on good study design, excellent data quality, appropriate statistical methods and accurate interpretation of results. Statistical methodology has seen substantial development in recent times. Unfortunately, many of these methodological developments are ignored in practice. Consequently, design and analysis of observational studies often exhibit serious weaknesses. The lack of guidance on vital practical issues discourages many applied researchers from using more sophisticated and possibly more appropriate methods when analyzing observational studies. Furthermore, many analyses are conducted by researchers with a relatively weak statistical background and limited experience in using statistical methodology and software. Consequently, even 'standard' analyses reported in the medical literature are often flawed, casting doubt on their results and conclusions. An efficient way to help researchers to keep up with recent methodological developments is to develop guidance documents that are spread to the research community at large. These observations led to the initiation of the strengthening analytical thinking for observational studies (STRATOS) initiative, a large collaboration of experts in many different areas of biostatistical research. The objective of STRATOS is to provide accessible and accurate guidance in the design and analysis of observational studies. The guidance is intended for applied statisticians and other data analysts with varying levels of statistical education, experience and interests. In this article, we introduce the STRATOS initiative and its main aims, present the need for guidance documents and outline the planned approach and progress so far. We encourage other biostatisticians to become involved. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
Diffusion-Based Density-Equalizing Maps: an Interdisciplinary Approach to Visualizing Homicide Rates and Other Georeferenced Statistical Data

NASA Astrophysics Data System (ADS)

Mazzitello, Karina I.; Candia, Julián

2012-12-01

In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.
Integrated Analysis of Pharmacologic, Clinical, and SNP Microarray Data using Projection onto the Most Interesting Statistical Evidence with Adaptive Permutation Testing

PubMed Central

Pounds, Stan; Cao, Xueyuan; Cheng, Cheng; Yang, Jun; Campana, Dario; Evans, William E.; Pui, Ching-Hon; Relling, Mary V.

2010-01-01

Powerful methods for integrated analysis of multiple biological data sets are needed to maximize interpretation capacity and acquire meaningful knowledge. We recently developed Projection Onto the Most Interesting Statistical Evidence (PROMISE). PROMISE is a statistical procedure that incorporates prior knowledge about the biological relationships among endpoint variables into an integrated analysis of microarray gene expression data with multiple biological and clinical endpoints. Here, PROMISE is adapted to the integrated analysis of pharmacologic, clinical, and genome-wide genotype data that incorporating knowledge about the biological relationships among pharmacologic and clinical response data. An efficient permutation-testing algorithm is introduced so that statistical calculations are computationally feasible in this higher-dimension setting. The new method is applied to a pediatric leukemia data set. The results clearly indicate that PROMISE is a powerful statistical tool for identifying genomic features that exhibit a biologically meaningful pattern of association with multiple endpoint variables. PMID:21516175
A generalized concept for cost-effective structural design. [Statistical Decision Theory applied to aerospace systems

NASA Technical Reports Server (NTRS)

Thomas, J. M.; Hawk, J. D.

1975-01-01

A generalized concept for cost-effective structural design is introduced. It is assumed that decisions affecting the cost effectiveness of aerospace structures fall into three basic categories: design, verification, and operation. Within these basic categories, certain decisions concerning items such as design configuration, safety factors, testing methods, and operational constraints are to be made. All or some of the variables affecting these decisions may be treated probabilistically. Bayesian statistical decision theory is used as the tool for determining the cost optimum decisions. A special case of the general problem is derived herein, and some very useful parametric curves are developed and applied to several sample structures.
Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic.

PubMed

Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B; Chen, Li; Wang, Yue; Clarke, Robert

2012-08-01

Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive 'noise' in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. xuan@vt.edu Supplementary data are available at Bioinformatics online.
Robust identification of transcriptional regulatory networks using a Gibbs sampler on outlier sum statistic

PubMed Central

Gu, Jinghua; Xuan, Jianhua; Riggins, Rebecca B.; Chen, Li; Wang, Yue; Clarke, Robert

2012-01-01

Motivation: Identification of transcriptional regulatory networks (TRNs) is of significant importance in computational biology for cancer research, providing a critical building block to unravel disease pathways. However, existing methods for TRN identification suffer from the inclusion of excessive ‘noise’ in microarray data and false-positives in binding data, especially when applied to human tumor-derived cell line studies. More robust methods that can counteract the imperfection of data sources are therefore needed for reliable identification of TRNs in this context. Results: In this article, we propose to establish a link between the quality of one target gene to represent its regulator and the uncertainty of its expression to represent other target genes. Specifically, an outlier sum statistic was used to measure the aggregated evidence for regulation events between target genes and their corresponding transcription factors. A Gibbs sampling method was then developed to estimate the marginal distribution of the outlier sum statistic, hence, to uncover underlying regulatory relationships. To evaluate the effectiveness of our proposed method, we compared its performance with that of an existing sampling-based method using both simulation data and yeast cell cycle data. The experimental results show that our method consistently outperforms the competing method in different settings of signal-to-noise ratio and network topology, indicating its robustness for biological applications. Finally, we applied our method to breast cancer cell line data and demonstrated its ability to extract biologically meaningful regulatory modules related to estrogen signaling and action in breast cancer. Availability and implementation: The Gibbs sampler MATLAB package is freely available at http://www.cbil.ece.vt.edu/software.htm. Contact: xuan@vt.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22595208
Prediction of the Electromagnetic Field Distribution in a Typical Aircraft Using the Statistical Energy Analysis

NASA Astrophysics Data System (ADS)

Kovalevsky, Louis; Langley, Robin S.; Caro, Stephane

2016-05-01

Due to the high cost of experimental EMI measurements significant attention has been focused on numerical simulation. Classical methods such as Method of Moment or Finite Difference Time Domain are not well suited for this type of problem, as they require a fine discretisation of space and failed to take into account uncertainties. In this paper, the authors show that the Statistical Energy Analysis is well suited for this type of application. The SEA is a statistical approach employed to solve high frequency problems of electromagnetically reverberant cavities at a reduced computational cost. The key aspects of this approach are (i) to consider an ensemble of system that share the same gross parameter, and (ii) to avoid solving Maxwell's equations inside the cavity, using the power balance principle. The output is an estimate of the field magnitude distribution in each cavity. The method is applied on a typical aircraft structure.
Investigation of methods for hydroclimatic data homogenization

NASA Astrophysics Data System (ADS)

Steirou, E.; Koutsoyiannis, D.

2012-04-01

We investigate the methods used for the adjustment of inhomogeneities of temperature time series covering the last 100 years. Based on a systematic study of scientific literature, we classify and evaluate the observed inhomogeneities in historical and modern time series, as well as their adjustment methods. It turns out that these methods are mainly statistical, not well justified by experiments and are rarely supported by metadata. In many of the cases studied the proposed corrections are not even statistically significant. From the global database GHCN-Monthly Version 2, we examine all stations containing both raw and adjusted data that satisfy certain criteria of continuity and distribution over the globe. In the United States of America, because of the large number of available stations, stations were chosen after a suitable sampling. In total we analyzed 181 stations globally. For these stations we calculated the differences between the adjusted and non-adjusted linear 100-year trends. It was found that in the two thirds of the cases, the homogenization procedure increased the positive or decreased the negative temperature trends. One of the most common homogenization methods, 'SNHT for single shifts', was applied to synthetic time series with selected statistical characteristics, occasionally with offsets. The method was satisfactory when applied to independent data normally distributed, but not in data with long-term persistence. The above results cast some doubts in the use of homogenization procedures and tend to indicate that the global temperature increase during the last century is between 0.4°C and 0.7°C, where these two values are the estimates derived from raw and adjusted data, respectively.
Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis

PubMed Central

Chen, Chih-Hao; Hsu, Chueh-Lin; Huang, Shih-Hao; Chen, Shih-Yuan; Hung, Yi-Lin; Chen, Hsiao-Rong; Wu, Yu-Chung

2015-01-01

Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. PMID:25793610
Needs of the Learning Effect on Instructional Website for Vocational High School Students

ERIC Educational Resources Information Center

Lo, Hung-Jen; Fu, Gwo-Liang; Chuang, Kuei-Chih

2013-01-01

The purpose of study was to understand the correlation between the needs of the learning effect on instructional website for the vocational high school students. Our research applied the statistic methods of product-moment correlation, stepwise regression, and structural equation method to analyze the questionnaire with the sample size of 377…
What Is Design-Based Causal Inference for RCTs and Why Should I Use It? NCEE 2017-4025

ERIC Educational Resources Information Center

Schochet, Peter Z.

2017-01-01

Design-based methods have recently been developed as a way to analyze data from impact evaluations of interventions, programs, and policies. The impact estimators are derived using the building blocks of experimental designs with minimal assumptions, and have good statistical properties. The methods apply to randomized controlled trials (RCTs) and…

Characterization Methods for Small Estuarine Systems in the Mid-Atlantic Region of the United States

EPA Science Inventory

Various statistical methods were applied to spatially discrete data from 14 intensively sampled small estuarine systems in the mid-Atlantic U.S. The number of sites per system ranged from 6 to 37. The surface area of the systems ranged from 1.9 to 193.4 km2. Parameters examined ...
Training Needs Assessment of Technical Skills in Managers of Tehran Electricity Distribution Company

ERIC Educational Resources Information Center

Koohi, Amir Hasan; Ghandali, Fatemeh; Dehghan, Hasan; Ghandali, Najme

2016-01-01

Current dissertation has been conducted in order to investigate and detect training needs of the mangers (top and middle) in Tehran Electricity Distribution Company. Research method is applied kind based on its purpose. Due to data collection method, this query is descriptive-survey type. Statistical population in this study is all of managers in…
Improved dynamical scaling analysis using the kernel method for nonequilibrium relaxation.

PubMed

Echinaka, Yuki; Ozeki, Yukiyasu

2016-10-01

The dynamical scaling analysis for the Kosterlitz-Thouless transition in the nonequilibrium relaxation method is improved by the use of Bayesian statistics and the kernel method. This allows data to be fitted to a scaling function without using any parametric model function, which makes the results more reliable and reproducible and enables automatic and faster parameter estimation. Applying this method, the bootstrap method is introduced and a numerical discrimination for the transition type is proposed.
STRengthening Analytical Thinking for Observational Studies: the STRATOS initiative

PubMed Central

Sauerbrei, Willi; Abrahamowicz, Michal; Altman, Douglas G; le Cessie, Saskia; Carpenter, James

2014-01-01

The validity and practical utility of observational medical research depends critically on good study design, excellent data quality, appropriate statistical methods and accurate interpretation of results. Statistical methodology has seen substantial development in recent times. Unfortunately, many of these methodological developments are ignored in practice. Consequently, design and analysis of observational studies often exhibit serious weaknesses. The lack of guidance on vital practical issues discourages many applied researchers from using more sophisticated and possibly more appropriate methods when analyzing observational studies. Furthermore, many analyses are conducted by researchers with a relatively weak statistical background and limited experience in using statistical methodology and software. Consequently, even ‘standard’ analyses reported in the medical literature are often flawed, casting doubt on their results and conclusions. An efficient way to help researchers to keep up with recent methodological developments is to develop guidance documents that are spread to the research community at large. These observations led to the initiation of the strengthening analytical thinking for observational studies (STRATOS) initiative, a large collaboration of experts in many different areas of biostatistical research. The objective of STRATOS is to provide accessible and accurate guidance in the design and analysis of observational studies. The guidance is intended for applied statisticians and other data analysts with varying levels of statistical education, experience and interests. In this article, we introduce the STRATOS initiative and its main aims, present the need for guidance documents and outline the planned approach and progress so far. We encourage other biostatisticians to become involved. PMID:25074480
Pitfalls in statistical landslide susceptibility modelling

NASA Astrophysics Data System (ADS)

Schröder, Boris; Vorpahl, Peter; Märker, Michael; Elsenbeer, Helmut

2010-05-01

The use of statistical methods is a well-established approach to predict landslide occurrence probabilities and to assess landslide susceptibility. This is achieved by applying statistical methods relating historical landslide inventories to topographic indices as predictor variables. In our contribution, we compare several new and powerful methods developed in machine learning and well-established in landscape ecology and macroecology for predicting the distribution of shallow landslides in tropical mountain rainforests in southern Ecuador (among others: boosted regression trees, multivariate adaptive regression splines, maximum entropy). Although these methods are powerful, we think it is necessary to follow a basic set of guidelines to avoid some pitfalls regarding data sampling, predictor selection, and model quality assessment, especially if a comparison of different models is contemplated. We therefore suggest to apply a novel toolbox to evaluate approaches to the statistical modelling of landslide susceptibility. Additionally, we propose some methods to open the "black box" as an inherent part of machine learning methods in order to achieve further explanatory insights into preparatory factors that control landslides. Sampling of training data should be guided by hypotheses regarding processes that lead to slope failure taking into account their respective spatial scales. This approach leads to the selection of a set of candidate predictor variables considered on adequate spatial scales. This set should be checked for multicollinearity in order to facilitate model response curve interpretation. Model quality assesses how well a model is able to reproduce independent observations of its response variable. This includes criteria to evaluate different aspects of model performance, i.e. model discrimination, model calibration, and model refinement. In order to assess a possible violation of the assumption of independency in the training samples or a possible lack of explanatory information in the chosen set of predictor variables, the model residuals need to be checked for spatial auto¬correlation. Therefore, we calculate spline correlograms. In addition to this, we investigate partial dependency plots and bivariate interactions plots considering possible interactions between predictors to improve model interpretation. Aiming at presenting this toolbox for model quality assessment, we investigate the influence of strategies in the construction of training datasets for statistical models on model quality.
Statistical Methods for Quantifying the Variability of Solar Wind Transients of All Sizes

NASA Astrophysics Data System (ADS)

Tindale, E.; Chapman, S. C.

2016-12-01

The solar wind is inherently variable across a wide range of timescales, from small-scale turbulent fluctuations to the 11-year periodicity induced by the solar cycle. Each solar cycle is unique, and this change in overall cycle activity is coupled from the Sun to Earth via the solar wind, leading to long-term trends in space weather. Our work [Tindale & Chapman, 2016] applies novel statistical methods to solar wind transients of all sizes, to quantify the variability of the solar wind associated with the solar cycle. We use the same methods to link solar wind observations with those on the Sun and Earth. We use Wind data to construct quantile-quantile (QQ) plots comparing the statistical distributions of multiple commonly used solar wind-magnetosphere coupling parameters between the minima and maxima of solar cycles 23 and 24. We find that in each case the distribution is multicomponent, ranging from small fluctuations to extreme values, with the same functional form at all phases of the solar cycle. The change in PDF is captured by a simple change of variables, which is independent of the PDF model. Using this method we can quantify the quietness of the cycle 24 maximum, identify which variable drives the changing distribution of composite parameters such as ɛ, and we show that the distribution of ɛ is less sensitive to changes in its extreme values than that of its constituents. After demonstrating the QQ method on solar wind data, we extend the analysis to include solar and magnetospheric data spanning the same time period. We focus on GOES X-ray flux and WDC AE index data. Finally, having studied the statistics of transients across the full distribution, we apply the same method to time series of extreme bursts in each variable. Using these statistical tools, we aim to track the solar cycle-driven variability from the Sun through the solar wind and into the Earth's magnetosphere. Tindale, E. and S.C. Chapman (2016), Geophys. Res. Lett., 43(11), doi: 10.1002/2016GL068920.
Application of multivariate statistical techniques in microbial ecology

PubMed Central

Paliy, O.; Shankar, V.

2016-01-01

Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large scale ecological datasets. Especially noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions, and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amounts of data, powerful statistical techniques of multivariate analysis are well suited to analyze and interpret these datasets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular dataset. In this review we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive, and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and dataset structure. PMID:26786791
Representation of Probability Density Functions from Orbit Determination using the Particle Filter

NASA Technical Reports Server (NTRS)

Mashiku, Alinda K.; Garrison, James; Carpenter, J. Russell

2012-01-01

Statistical orbit determination enables us to obtain estimates of the state and the statistical information of its region of uncertainty. In order to obtain an accurate representation of the probability density function (PDF) that incorporates higher order statistical information, we propose the use of nonlinear estimation methods such as the Particle Filter. The Particle Filter (PF) is capable of providing a PDF representation of the state estimates whose accuracy is dependent on the number of particles or samples used. For this method to be applicable to real case scenarios, we need a way of accurately representing the PDF in a compressed manner with little information loss. Hence we propose using the Independent Component Analysis (ICA) as a non-Gaussian dimensional reduction method that is capable of maintaining higher order statistical information obtained using the PF. Methods such as the Principal Component Analysis (PCA) are based on utilizing up to second order statistics, hence will not suffice in maintaining maximum information content. Both the PCA and the ICA are applied to two scenarios that involve a highly eccentric orbit with a lower apriori uncertainty covariance and a less eccentric orbit with a higher a priori uncertainty covariance, to illustrate the capability of the ICA in relation to the PCA.
Use of statistical study methods for the analysis of the results of the imitation modeling of radiation transfer

NASA Astrophysics Data System (ADS)

Alekseenko, M. A.; Gendrina, I. Yu.

2017-11-01

Recently, due to the abundance of various types of observational data in the systems of vision through the atmosphere and the need for their processing, the use of various methods of statistical research in the study of such systems as correlation-regression analysis, dynamic series, variance analysis, etc. is actual. We have attempted to apply elements of correlation-regression analysis for the study and subsequent prediction of the patterns of radiation transfer in these systems same as in the construction of radiation models of the atmosphere. In this paper, we present some results of statistical processing of the results of numerical simulation of the characteristics of vision systems through the atmosphere obtained with the help of a special software package.1
Statistical physics of hard combinatorial optimization: Vertex cover problem

NASA Astrophysics Data System (ADS)

Zhao, Jin-Hua; Zhou, Hai-Jun

2014-07-01

Typical-case computation complexity is a research topic at the boundary of computer science, applied mathematics, and statistical physics. In the last twenty years, the replica-symmetry-breaking mean field theory of spin glasses and the associated message-passing algorithms have greatly deepened our understanding of typical-case computation complexity. In this paper, we use the vertex cover problem, a basic nondeterministic-polynomial (NP)-complete combinatorial optimization problem of wide application, as an example to introduce the statistical physical methods and algorithms. We do not go into the technical details but emphasize mainly the intuitive physical meanings of the message-passing equations. A nonfamiliar reader shall be able to understand to a large extent the physics behind the mean field approaches and to adjust the mean field methods in solving other optimization problems.
Microscopic saw mark analysis: an empirical approach.

PubMed

Love, Jennifer C; Derrick, Sharon M; Wiersema, Jason M; Peters, Charles

2015-01-01

Microscopic saw mark analysis is a well published and generally accepted qualitative analytical method. However, little research has focused on identifying and mitigating potential sources of error associated with the method. The presented study proposes the use of classification trees and random forest classifiers as an optimal, statistically sound approach to mitigate the potential for error of variability and outcome error in microscopic saw mark analysis. The statistical model was applied to 58 experimental saw marks created with four types of saws. The saw marks were made in fresh human femurs obtained through anatomical gift and were analyzed using a Keyence digital microscope. The statistical approach weighed the variables based on discriminatory value and produced decision trees with an associated outcome error rate of 8.62-17.82%. © 2014 American Academy of Forensic Sciences.
Many-body formalism for fermions: The partition function

NASA Astrophysics Data System (ADS)

Watson, D. K.

2017-09-01

The partition function, a fundamental tenet in statistical thermodynamics, contains in principle all thermodynamic information about a system. It encapsulates both microscopic information through the quantum energy levels and statistical information from the partitioning of the particles among the available energy levels. For identical particles, this statistical accounting is complicated by the symmetry requirements of the allowed quantum states. In particular, for Fermi systems, the enforcement of the Pauli principle is typically a numerically demanding task, responsible for much of the cost of the calculations. The interplay of these three elements—the structure of the many-body spectrum, the statistical partitioning of the N particles among the available levels, and the enforcement of the Pauli principle—drives the behavior of mesoscopic and macroscopic Fermi systems. In this paper, we develop an approach for the determination of the partition function, a numerically difficult task, for systems of strongly interacting identical fermions and apply it to a model system of harmonically confined, harmonically interacting fermions. This approach uses a recently introduced many-body method that is an extension of the symmetry-invariant perturbation method (SPT) originally developed for bosons. It uses group theory and graphical techniques to avoid the heavy computational demands of conventional many-body methods which typically scale exponentially with the number of particles. The SPT application of the Pauli principle is trivial to implement since it is done "on paper" by imposing restrictions on the normal-mode quantum numbers at first order in the perturbation. The method is applied through first order and represents an extension of the SPT method to excited states. Our method of determining the partition function and various thermodynamic quantities is accurate and efficient and has the potential to yield interesting insight into the role played by the Pauli principle and the influence of large degeneracies on the emergence of the thermodynamic behavior of large-N systems.
Applying instructional design practices to evaluate and improve the roadway characteristics inventory (RCI) training curriculum.

DOT National Transportation Integrated Search

2010-07-01

The Transportation Statistics Office (TranStat) of the Florida Department of Transportation (FDOT) provides training for district data collection technicians in both office- and field-based Roadway Characteristics Inventory (RCI) methods. The current...
Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

DOE PAGES

Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

2018-01-01

Tensor-rank decomposition methods have been applied to variable contact time 29 Si{ 1 H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.
Rapid acquisition of data dense solid-state CPMG NMR spectral sets using multi-dimensional statistical analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mason, H. E.; Uribe, E. C.; Shusterman, J. A.

Tensor-rank decomposition methods have been applied to variable contact time 29 Si{ 1 H} CP/CPMG NMR data sets to extract NMR dynamics information and dramatically decrease conventional NMR acquisition times.
Synchronization for Optical PPM Signals

NASA Technical Reports Server (NTRS)

Vilnrotter, V. A.

1985-01-01

Method based on statistical properties of weak pulse-positionmodulated (PPM) signal enables synchronization of receiver clock with received-signal time base. Method applies to weak optical M-ary PPM signals, for which there is only one pulse of length Tp transmitted during one of timeslots of length T in each successive interval of M timeslots. Method requires small dead time, Td, at beginning and end of each timeslot, during which pulse amplitude is zero.
AASC Recommendations for the Education of an Applied Climatologist

NASA Astrophysics Data System (ADS)

Nielsen-Gammon, J. W.; Stooksbury, D.; Akyuz, A.; Dupigny-Giroux, L.; Hubbard, K. G.; Timofeyeva, M. M.

2011-12-01

The American Association of State Climatologists (AASC) has developed curricular recommendations for the education of future applied and service climatologists. The AASC was founded in 1976. Membership of the AASC includes state climatologists and others who work in state climate offices; climate researchers in academia and educators; applied climatologists in NOAA and other federal agencies; and the private sector. The AASC is the only professional organization dedicated solely to the growth and development of applied and service climatology. The purpose of the recommendations is to offer a framework for existing and developing academic climatology programs. These recommendations are intended to serve as a road map and to help distinguish the educational needs for future applied climatologists from those of operational meteorologists or other scientists and practitioners. While the home department of climatology students may differ from one program to the next, the most essential factor is that students can demonstrate a breadth and depth of understanding in the knowledge and tools needed to be an applied climatologist. Because the training of an applied climatologist requires significant depth and breadth, the Masters degree is recommended as the minimum level of education needed. This presentation will highlight the AASC recommendations. These include a strong foundation in: - climatology (instrumentation and data collection, climate dynamics, physical climatology, synoptic and regional climatology, applied climatology, climate models, etc.) - basic natural sciences and mathematics including calculus, physics, chemistry, and biology/ecology - fundamental atmospheric sciences (atmospheric dynamics, atmospheric thermodynamics, atmospheric radiation, and weather analysis/synoptic meteorology) and - data analysis and spatial analysis (descriptive statistics, statistical methods, multivariate statistics, geostatistics, GIS, etc.). The recommendations also include a secondary area of concentration (agriculture, economics, geography, hydrology, marine sciences, natural resources, policy, etc.) and a major applied climate research component.
Characterizing the D2 statistic: word matches in biological sequences.

PubMed

Forêt, Sylvain; Wilson, Susan R; Burden, Conrad J

2009-01-01

Word matches are often used in sequence comparison methods, either as a measure of sequence similarity or in the first search steps of algorithms such as BLAST or BLAT. The D2 statistic is the number of matches of words of k letters between two sequences. Recent advances have been made in the characterization of this statistic and in the approximation of its distribution. Here, these results are extended to the case of approximate word matches. We compute the exact value of the variance of the D2 statistic for the case of a uniform letter distribution, and introduce a method to provide accurate approximations of the variance in the remaining cases. This enables the distribution of D2 to be approximated for typical situations arising in biological research. We apply these results to the identification of cis-regulatory modules, and show that this method detects such sequences with a high accuracy. The ability to approximate the distribution of D2 for both exact and approximate word matches will enable the use of this statistic in a more precise manner for sequence comparison, database searches, and identification of transcription factor binding sites.
[Bayesian statistics in medicine -- part II: main applications and inference].

PubMed

Montomoli, C; Nichelatti, M

2008-01-01

Bayesian statistics is not only used when one is dealing with 2-way tables, but it can be used for inferential purposes. Using the basic concepts presented in the first part, this paper aims to give a simple overview of Bayesian methods by introducing its foundation (Bayes' theorem) and then applying this rule to a very simple practical example; whenever possible, the elementary processes at the basis of analysis are compared to those of frequentist (classical) statistical analysis. The Bayesian reasoning is naturally connected to medical activity, since it appears to be quite similar to a diagnostic process.
Statistical and Economic Techniques for Site-specific Nematode Management.

PubMed

Liu, Zheng; Griffin, Terry; Kirkpatrick, Terrence L

2014-03-01

Recent advances in precision agriculture technologies and spatial statistics allow realistic, site-specific estimation of nematode damage to field crops and provide a platform for the site-specific delivery of nematicides within individual fields. This paper reviews the spatial statistical techniques that model correlations among neighboring observations and develop a spatial economic analysis to determine the potential of site-specific nematicide application. The spatial econometric methodology applied in the context of site-specific crop yield response contributes to closing the gap between data analysis and realistic site-specific nematicide recommendations and helps to provide a practical method of site-specifically controlling nematodes.

Accurate mass measurement: terminology and treatment of data.

PubMed

Brenton, A Gareth; Godfrey, A Ruth

2010-11-01

High-resolution mass spectrometry has become ever more accessible with improvements in instrumentation, such as modern FT-ICR and Orbitrap mass spectrometers. This has resulted in an increase in the number of articles submitted for publication quoting accurate mass data. There is a plethora of terms related to accurate mass analysis that are in current usage, many employed incorrectly or inconsistently. This article is based on a set of notes prepared by the authors for research students and staff in our laboratories as a guide to the correct terminology and basic statistical procedures to apply in relation to mass measurement, particularly for accurate mass measurement. It elaborates on the editorial by Gross in 1994 regarding the use of accurate masses for structure confirmation. We have presented and defined the main terms in use with reference to the International Union of Pure and Applied Chemistry (IUPAC) recommendations for nomenclature and symbolism for mass spectrometry. The correct use of statistics and treatment of data is illustrated as a guide to new and existing mass spectrometry users with a series of examples as well as statistical methods to compare different experimental methods and datasets. Copyright © 2010. Published by Elsevier Inc.
Testing statistical isotropy in cosmic microwave background polarization maps

NASA Astrophysics Data System (ADS)

Rath, Pranati K.; Samal, Pramoda Kumar; Panda, Srikanta; Mishra, Debesh D.; Aluri, Pavan K.

2018-04-01

We apply our symmetry based Power tensor technique to test conformity of PLANCK Polarization maps with statistical isotropy. On a wide range of angular scales (l = 40 - 150), our preliminary analysis detects many statistically anisotropic multipoles in foreground cleaned full sky PLANCK polarization maps viz., COMMANDER and NILC. We also study the effect of residual foregrounds that may still be present in the Galactic plane using both common UPB77 polarization mask, as well as the individual component separation method specific polarization masks. However, some of the statistically anisotropic modes still persist, albeit significantly in NILC map. We further probed the data for any coherent alignments across multipoles in several bins from the chosen multipole range.
Identification of robust statistical downscaling methods based on a comprehensive suite of performance metrics for South Korea

NASA Astrophysics Data System (ADS)

Eum, H. I.; Cannon, A. J.

2015-12-01

Climate models are a key provider to investigate impacts of projected future climate conditions on regional hydrologic systems. However, there is a considerable mismatch of spatial resolution between GCMs and regional applications, in particular a region characterized by complex terrain such as Korean peninsula. Therefore, a downscaling procedure is an essential to assess regional impacts of climate change. Numerous statistical downscaling methods have been used mainly due to the computational efficiency and simplicity. In this study, four statistical downscaling methods [Bias-Correction/Spatial Disaggregation (BCSD), Bias-Correction/Constructed Analogue (BCCA), Multivariate Adaptive Constructed Analogs (MACA), and Bias-Correction/Climate Imprint (BCCI)] are applied to downscale the latest Climate Forecast System Reanalysis data to stations for precipitation, maximum temperature, and minimum temperature over South Korea. By split sampling scheme, all methods are calibrated with observational station data for 19 years from 1973 to 1991 are and tested for the recent 19 years from 1992 to 2010. To assess skill of the downscaling methods, we construct a comprehensive suite of performance metrics that measure an ability of reproducing temporal correlation, distribution, spatial correlation, and extreme events. In addition, we employ Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to identify robust statistical downscaling methods based on the performance metrics for each season. The results show that downscaling skill is considerably affected by the skill of CFSR and all methods lead to large improvements in representing all performance metrics. According to seasonal performance metrics evaluated, when TOPSIS is applied, MACA is identified as the most reliable and robust method for all variables and seasons. Note that such result is derived from CFSR output which is recognized as near perfect climate data in climate studies. Therefore, the ranking of this study may be changed when various GCMs are downscaled and evaluated. Nevertheless, it may be informative for end-users (i.e. modelers or water resources managers) to understand and select more suitable downscaling methods corresponding to priorities on regional applications.
Simultaneous determination of binary mixture of amlodipine besylate and atenolol based on dual wavelengths

NASA Astrophysics Data System (ADS)

Lamie, Nesrine T.

2015-10-01

Four, accurate, precise, and sensitive spectrophotometric methods are developed for simultaneous determination of a binary mixture of amlodipine besylate (AM) and atenolol (AT). AM is determined at its λmax 360 nm (0D), while atenolol can be determined by four different methods. Method (A) is absorption factor (AF). Method (B) is the new ratio difference method (RD) which measures the difference in amplitudes between 210 and 226 nm. Method (C) is novel constant center spectrophotometric method (CC). Method (D) is mean centering of the ratio spectra (MCR) at 284 nm. The methods are tested by analyzing synthetic mixtures of the cited drugs and they are applied to their commercial pharmaceutical preparation. The validity of results is assessed by applying standard addition technique. The results obtained are found to agree statistically with those obtained by official methods, showing no significant difference with respect to accuracy and precision.
Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

USGS Publications Warehouse

Royle, J. Andrew; Dorazio, Robert M.

2008-01-01

A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including * occurrence or occupancy models for estimating species distribution * abundance models based on many sampling protocols, including distance sampling * capture-recapture models with individual effects * spatial capture-recapture models based on camera trapping and related methods * population and metapopulation dynamic models * models of biodiversity, community structure and dynamics.
Statistically Validated Networks in Bipartite Complex Systems

PubMed Central

Tumminello, Michele; Miccichè, Salvatore; Lillo, Fabrizio; Piilo, Jyrki; Mantegna, Rosario N.

2011-01-01

Many complex systems present an intrinsic bipartite structure where elements of one set link to elements of the second set. In these complex systems, such as the system of actors and movies, elements of one set are qualitatively different than elements of the other set. The properties of these complex systems are typically investigated by constructing and analyzing a projected network on one of the two sets (for example the actor network or the movie network). Complex systems are often very heterogeneous in the number of relationships that the elements of one set establish with the elements of the other set, and this heterogeneity makes it very difficult to discriminate links of the projected network that are just reflecting system's heterogeneity from links relevant to unveil the properties of the system. Here we introduce an unsupervised method to statistically validate each link of a projected network against a null hypothesis that takes into account system heterogeneity. We apply the method to a biological, an economic and a social complex system. The method we propose is able to detect network structures which are very informative about the organization and specialization of the investigated systems, and identifies those relationships between elements of the projected network that cannot be explained simply by system heterogeneity. We also show that our method applies to bipartite systems in which different relationships might have different qualitative nature, generating statistically validated networks in which such difference is preserved. PMID:21483858
Estimation of Signal Coherence Threshold and Concealed Spectral Lines Applied to Detection of Turbofan Engine Combustion Noise

NASA Technical Reports Server (NTRS)

Miles, Jeffrey Hilton

2010-01-01

Combustion noise from turbofan engines has become important, as the noise from sources like the fan and jet are reduced. An aligned and un-aligned coherence technique has been developed to determine a threshold level for the coherence and thereby help to separate the coherent combustion noise source from other noise sources measured with far-field microphones. This method is compared with a statistics based coherence threshold estimation method. In addition, the un-aligned coherence procedure at the same time also reveals periodicities, spectral lines, and undamped sinusoids hidden by broadband turbofan engine noise. In calculating the coherence threshold using a statistical method, one may use either the number of independent records or a larger number corresponding to the number of overlapped records used to create the average. Using data from a turbofan engine and a simulation this paper shows that applying the Fisher z-transform to the un-aligned coherence can aid in making the proper selection of samples and produce a reasonable statistics based coherence threshold. Examples are presented showing that the underlying tonal and coherent broad band structure which is buried under random broadband noise and jet noise can be determined. The method also shows the possible presence of indirect combustion noise. Copyright 2011 Acoustical Society of America. This article may be downloaded for personal use only. Any other use requires prior permission of the author and the Acoustical Society of America.
Securing wide appreciation of health statistics

PubMed Central

Pyrrait, A. M. DO Amaral; Aubenque, M. J.; Benjamin, B.; DE Groot, Meindert J. W.; Kohn, R.

1954-01-01

All the authors are agreed on the need for a certain publicizing of health statistics, but do Amaral Pyrrait points out that the medical profession prefers to convince itself rather than to be convinced. While there is great utility in articles and reviews in the professional press (especially for paramedical personnel) Aubenque, de Groot, and Kohn show how appreciation can effectively be secured by making statistics more easily understandable to the non-expert by, for instance, including readable commentaries in official publications, simplifying charts and tables, and preparing simple manuals on statistical methods. Aubenque and Kohn also stress the importance of linking health statistics to other economic and social information. Benjamin suggests that the principles of market research could to advantage be applied to health statistics to determine the precise needs of the “consumers”. At the same time, Aubenque points out that the value of the ultimate results must be clear to those who provide the data; for this, Kohn suggests that the enumerators must know exactly what is wanted and why. There is general agreement that some explanation of statistical methods and their uses should be given in the curricula of medical schools and that lectures and postgraduate courses should be arranged for practising physicians. PMID:13199668
Research on the raw data processing method of the hydropower construction project

NASA Astrophysics Data System (ADS)

Tian, Zhichao

2018-01-01

In this paper, based on the characteristics of the fixed data, this paper compares the various mathematical statistics analysis methods and chooses the improved Grabs criterion to analyze the data, and through the analysis of the data processing, the data processing method is not suitable. It is proved that this method can be applied to the processing of fixed raw data. This paper provides a reference for reasonably determining the effective quota analysis data.
Maximum entropy method applied to deblurring images on a MasPar MP-1 computer

NASA Technical Reports Server (NTRS)

Bonavito, N. L.; Dorband, John; Busse, Tim

1991-01-01

A statistical inference method based on the principle of maximum entropy is developed for the purpose of enhancing and restoring satellite images. The proposed maximum entropy image restoration method is shown to overcome the difficulties associated with image restoration and provide the smoothest and most appropriate solution consistent with the measured data. An implementation of the method on the MP-1 computer is described, and results of tests on simulated data are presented.
qFeature

DOE Office of Scientific and Technical Information (OSTI.GOV)

2015-09-14

This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
MRMC analysis of agreement studies

NASA Astrophysics Data System (ADS)

Gallas, Brandon D.; Anam, Amrita; Chen, Weijie; Wunderlich, Adam; Zhang, Zhiwei

2016-03-01

The purpose of this work is to present and evaluate methods based on U-statistics to compare intra- or inter-reader agreement across different imaging modalities. We apply these methods to multi-reader multi-case (MRMC) studies. We measure reader-averaged agreement and estimate its variance accounting for the variability from readers and cases (an MRMC analysis). In our application, pathologists (readers) evaluate patient tissue mounted on glass slides (cases) in two ways. They evaluate the slides on a microscope (reference modality) and they evaluate digital scans of the slides on a computer display (new modality). In the current work, we consider concordance as the agreement measure, but many of the concepts outlined here apply to other agreement measures. Concordance is the probability that two readers rank two cases in the same order. Concordance can be estimated with a U-statistic and thus it has some nice properties: it is unbiased, asymptotically normal, and its variance is given by an explicit formula. Another property of a U-statistic is that it is symmetric in its inputs; it doesn't matter which reader is listed first or which case is listed first, the result is the same. Using this property and a few tricks while building the U-statistic kernel for concordance, we get a mathematically tractable problem and efficient software. Simulations show that our variance and covariance estimates are unbiased.
Statistical procedures for evaluating daily and monthly hydrologic model predictions

USGS Publications Warehouse

Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

2004-01-01

The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression Rr2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R r2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
Validation of chemistry models employed in a particle simulation method

NASA Technical Reports Server (NTRS)

Haas, Brian L.; Mcdonald, Jeffrey D.

1991-01-01

The chemistry models employed in a statistical particle simulation method, as implemented in the Intel iPSC/860 multiprocessor computer, are validated and applied. Chemical relaxation of five-species air in these reservoirs involves 34 simultaneous dissociation, recombination, and atomic-exchange reactions. The reaction rates employed in the analytic solutions are obtained from Arrhenius experimental correlations as functions of temperature for adiabatic gas reservoirs in thermal equilibrium. Favorable agreement with the analytic solutions validates the simulation when applied to relaxation of O2 toward equilibrium in reservoirs dominated by dissociation and recombination, respectively, and when applied to relaxation of air in the temperature range 5000 to 30,000 K. A flow of O2 over a circular cylinder at high Mach number is simulated to demonstrate application of the method to multidimensional reactive flows.
Segmentation-free statistical image reconstruction for polyenergetic x-ray computed tomography with experimental validation.

PubMed

Idris A, Elbakri; Fessler, Jeffrey A

2003-08-07

This paper describes a statistical image reconstruction method for x-ray CT that is based on a physical model that accounts for the polyenergetic x-ray source spectrum and the measurement nonlinearities caused by energy-dependent attenuation. Unlike our earlier work, the proposed algorithm does not require pre-segmentation of the object into the various tissue classes (e.g., bone and soft tissue) and allows mixed pixels. The attenuation coefficient of each voxel is modelled as the product of its unknown density and a weighted sum of energy-dependent mass attenuation coefficients. We formulate a penalized-likelihood function for this polyenergetic model and develop an iterative algorithm for estimating the unknown density of each voxel. Applying this method to simulated x-ray CT measurements of objects containing both bone and soft tissue yields images with significantly reduced beam hardening artefacts relative to conventional beam hardening correction methods. We also apply the method to real data acquired from a phantom containing various concentrations of potassium phosphate solution. The algorithm reconstructs an image with accurate density values for the different concentrations, demonstrating its potential for quantitative CT applications.
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

PubMed Central

Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

2011-01-01

Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
Model-independent test for scale-dependent non-Gaussianities in the cosmic microwave background.

PubMed

Räth, C; Morfill, G E; Rossmanith, G; Banday, A J; Górski, K M

2009-04-03

We present a model-independent method to test for scale-dependent non-Gaussianities in combination with scaling indices as test statistics. Therefore, surrogate data sets are generated, in which the power spectrum of the original data is preserved, while the higher order correlations are partly randomized by applying a scale-dependent shuffling procedure to the Fourier phases. We apply this method to the Wilkinson Microwave Anisotropy Probe data of the cosmic microwave background and find signatures for non-Gaussianities on large scales. Further tests are required to elucidate the origin of the detected anomalies.
Distinguishing synchronous and time-varying synergies using point process interval statistics: motor primitives in frog and rat

PubMed Central

Hart, Corey B.; Giszter, Simon F.

2013-01-01

We present and apply a method that uses point process statistics to discriminate the forms of synergies in motor pattern data, prior to explicit synergy extraction. The method uses electromyogram (EMG) pulse peak timing or onset timing. Peak timing is preferable in complex patterns where pulse onsets may be overlapping. An interval statistic derived from the point processes of EMG peak timings distinguishes time-varying synergies from synchronous synergies (SS). Model data shows that the statistic is robust for most conditions. Its application to both frog hindlimb EMG and rat locomotion hindlimb EMG show data from these preparations is clearly most consistent with synchronous synergy models (p < 0.001). Additional direct tests of pulse and interval relations in frog data further bolster the support for synchronous synergy mechanisms in these data. Our method and analyses support separated control of rhythm and pattern of motor primitives, with the low level execution primitives comprising pulsed SS in both frog and rat, and both episodic and rhythmic behaviors. PMID:23675341
Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

PubMed

Wang, Zuozhen

2018-01-01

Bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by bootstrap procedure are presented for various test types (inequality, non-inferiority, superiority, and equivalence), respectively. Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data are also carried out. Consequently, power difference between the two calculation methods is acceptably small for all the test types. It shows that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric statistical method of Wilcoxon test was applied to compare the two groups in the data during the process of bootstrap power estimation. As a result, the power estimated by normal distribution-based formula is far larger than that by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation at the beginning, and that the same statistical method as used in the subsequent statistical analysis is employed for each bootstrap sample during the course of bootstrap sample size estimation, provided there is historical true data available that can be well representative of the population to which the proposed trial is planning to extrapolate.
Huffman and linear scanning methods with statistical language models.

PubMed

Roark, Brian; Fried-Oken, Melanie; Gibbons, Chris

2015-03-01

Current scanning access methods for text generation in AAC devices are limited to relatively few options, most notably row/column variations within a matrix. We present Huffman scanning, a new method for applying statistical language models to binary-switch, static-grid typing AAC interfaces, and compare it to other scanning options under a variety of conditions. We present results for 16 adults without disabilities and one 36-year-old man with locked-in syndrome who presents with complex communication needs and uses AAC scanning devices for writing. Huffman scanning with a statistical language model yielded significant typing speedups for the 16 participants without disabilities versus any of the other methods tested, including two row/column scanning methods. A similar pattern of results was found with the individual with locked-in syndrome. Interestingly, faster typing speeds were obtained with Huffman scanning using a more leisurely scan rate than relatively fast individually calibrated scan rates. Overall, the results reported here demonstrate great promise for the usability of Huffman scanning as a faster alternative to row/column scanning.

Test Statistics and Confidence Intervals to Establish Noninferiority between Treatments with Ordinal Categorical Data.

PubMed

Zhang, Fanghong; Miyaoka, Etsuo; Huang, Fuping; Tanaka, Yutaka

2015-01-01

The problem for establishing noninferiority is discussed between a new treatment and a standard (control) treatment with ordinal categorical data. A measure of treatment effect is used and a method of specifying noninferiority margin for the measure is provided. Two Z-type test statistics are proposed where the estimation of variance is constructed under the shifted null hypothesis using U-statistics. Furthermore, the confidence interval and the sample size formula are given based on the proposed test statistics. The proposed procedure is applied to a dataset from a clinical trial. A simulation study is conducted to compare the performance of the proposed test statistics with that of the existing ones, and the results show that the proposed test statistics are better in terms of the deviation from nominal level and the power.
Comparison of Histograms for Use in Cloud Observation and Modeling

NASA Technical Reports Server (NTRS)

Green, Lisa; Xu, Kuan-Man

2005-01-01

Cloud observation and cloud modeling data can be presented in histograms for each characteristic to be measured. Combining information from single-cloud histograms yields a summary histogram. Summary histograms can be compared to each other to reach conclusions about the behavior of an ensemble of clouds in different places at different times or about the accuracy of a particular cloud model. As in any scientific comparison, it is necessary to decide whether any apparent differences are statistically significant. The usual methods of deciding statistical significance when comparing histograms do not apply in this case because they assume independent data. Thus, a new method is necessary. The proposed method uses the Euclidean distance metric and bootstrapping to calculate the significance level.
Statistical differences between relative quantitative molecular fingerprints from microbial communities.

PubMed

Portillo, M C; Gonzalez, J M

2008-08-01

Molecular fingerprints of microbial communities are a common method for the analysis and comparison of environmental samples. The significance of differences between microbial community fingerprints was analyzed considering the presence of different phylotypes and their relative abundance. A method is proposed by simulating coverage of the analyzed communities as a function of sampling size applying a Cramér-von Mises statistic. Comparisons were performed by a Monte Carlo testing procedure. As an example, this procedure was used to compare several sediment samples from freshwater ponds using a relative quantitative PCR-DGGE profiling technique. The method was able to discriminate among different samples based on their molecular fingerprints, and confirmed the lack of differences between aliquots from a single sample.
SU-E-J-261: Statistical Analysis and Chaotic Dynamics of Respiratory Signal of Patients in BodyFix

DOE Office of Scientific and Technical Information (OSTI.GOV)

Michalski, D; Huq, M; Bednarz, G

Purpose: To quantify respiratory signal of patients in BodyFix undergoing 4DCT scan with and without immobilization cover. Methods: 20 pairs of respiratory tracks recorded with RPM system during 4DCT scan were analyzed. Descriptive statistic was applied to selected parameters of exhale-inhale decomposition. Standardized signals were used with the delay method to build orbits in embedded space. Nonlinear behavior was tested with surrogate data. Sample entropy SE, Lempel-Ziv complexity LZC and the largest Lyapunov exponents LLE were compared. Results: Statistical tests show difference between scans for inspiration time and its variability, which is bigger for scans without cover. The same ismore » for variability of the end of exhalation and inhalation. Other parameters fail to show the difference. For both scans respiratory signals show determinism and nonlinear stationarity. Statistical test on surrogate data reveals their nonlinearity. LLEs show signals chaotic nature and its correlation with breathing period and its embedding delay time. SE, LZC and LLE measure respiratory signal complexity. Nonlinear characteristics do not differ between scans. Conclusion: Contrary to expectation cover applied to patients in BodyFix appears to have limited effect on signal parameters. Analysis based on trajectories of delay vectors shows respiratory system nonlinear character and its sensitive dependence on initial conditions. Reproducibility of respiratory signal can be evaluated with measures of signal complexity and its predictability window. Longer respiratory period is conducive for signal reproducibility as shown by these gauges. Statistical independence of the exhale and inhale times is also supported by the magnitude of LLE. The nonlinear parameters seem more appropriate to gauge respiratory signal complexity since its deterministic chaotic nature. It contrasts with measures based on harmonic analysis that are blind for nonlinear features. Dynamics of breathing, so crucial for 4D-based clinical technologies, can be better controlled if nonlinear-based methodology, which reflects respiration characteristic, is applied. Funding provided by Varian Medical Systems via Investigator Initiated Research Project.« less
Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

PubMed

Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

2018-03-01

Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs identified. Here, we extended Pickrell's method to make it more applicable for the general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had similar distribution of test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. But in practice, our extended method would generally be more powerful because the number of independent lead SNPs was often larger than the number of independent putative causal SNPs. And including more SNPs, on the other hand, would not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.
Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

PubMed

Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

2015-03-01

Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.
The Fusion of Financial Analysis and Seismology: Statistical Methods from Financial Market Analysis Applied to Earthquake Data

NASA Astrophysics Data System (ADS)

Ohyanagi, S.; Dileonardo, C.

2013-12-01

As a natural phenomenon earthquake occurrence is difficult to predict. Statistical analysis of earthquake data was performed using candlestick chart and Bollinger Band methods. These statistical methods, commonly used in the financial world to analyze market trends were tested against earthquake data. Earthquakes above Mw 4.0 located on shore of Sanriku (37.75°N ~ 41.00°N, 143.00°E ~ 144.50°E) from February 1973 to May 2013 were selected for analysis. Two specific patterns in earthquake occurrence were recognized through the analysis. One is a spread of candlestick prior to the occurrence of events greater than Mw 6.0. A second pattern shows convergence in the Bollinger Band, which implies a positive or negative change in the trend of earthquakes. Both patterns match general models for the buildup and release of strain through the earthquake cycle, and agree with both the characteristics of the candlestick chart and Bollinger Band analysis. These results show there is a high correlation between patterns in earthquake occurrence and trend analysis by these two statistical methods. The results of this study agree with the appropriateness of the application of these financial analysis methods to the analysis of earthquake occurrence.
Density-based empirical likelihood procedures for testing symmetry of data distributions and K-sample comparisons.

PubMed

Vexler, Albert; Tanajian, Hovig; Hutson, Alan D

In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.
A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

PubMed

Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

2013-11-15

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu
Automated Analysis of Renewable Energy Datasets ('EE/RE Data Mining')

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bush, Brian; Elmore, Ryan; Getman, Dan

This poster illustrates methods to substantially improve the understanding of renewable energy data sets and the depth and efficiency of their analysis through the application of statistical learning methods ('data mining') in the intelligent processing of these often large and messy information sources. The six examples apply methods for anomaly detection, data cleansing, and pattern mining to time-series data (measurements from metering points in buildings) and spatiotemporal data (renewable energy resource datasets).
Application of econometric and ecology analysis methods in physics software

NASA Astrophysics Data System (ADS)

Han, Min Cheol; Hoff, Gabriela; Kim, Chan Hyeong; Kim, Sung Hun; Grazia Pia, Maria; Ronchieri, Elisabetta; Saracco, Paolo

2017-10-01

Some data analysis methods typically used in econometric studies and in ecology have been evaluated and applied in physics software environments. They concern the evolution of observables through objective identification of change points and trends, and measurements of inequality, diversity and evenness across a data set. Within each analysis area, various statistical tests and measures have been examined. This conference paper summarizes a brief overview of some of these methods.
Landslide Susceptibility Analysis by the comparison and integration of Random Forest and Logistic Regression methods; application to the disaster of Nova Friburgo - Rio de Janeiro, Brasil (January 2011)

NASA Astrophysics Data System (ADS)

Esposito, Carlo; Barra, Anna; Evans, Stephen G.; Scarascia Mugnozza, Gabriele; Delaney, Keith

2014-05-01

The study of landslide susceptibility by multivariate statistical methods is based on finding a quantitative relationship between controlling factors and landslide occurrence. Such studies have become popular in the last few decades thanks to the development of geographic information systems (GIS) software and the related improved data management. In this work we applied a statistical approach to an area of high landslide susceptibility mainly due to its tropical climate and geological-geomorphological setting. The study area is located in the south-east region of Brazil that has frequently been affected by flood and landslide hazard, especially because of heavy rainfall events during the summer season. In this work we studied a disastrous event that occurred on January 11th and 12th of 2011, which involved Região Serrana (the mountainous region of Rio de Janeiro State) and caused more than 5000 landslides and at least 904 deaths. In order to produce susceptibility maps, we focused our attention on an area of 93,6 km2 that includes Nova Friburgo city. We utilized two different multivariate statistic methods: Logistic Regression (LR), already widely used in applied geosciences, and Random Forest (RF), which has only recently been applied to landslide susceptibility analysis. With reference to each mapping unit, the first method (LR) results in a probability of landslide occurrence, while the second one (RF) gives a prediction in terms of % of area susceptible to slope failure. With this aim in mind, a landslide inventory map (related to the studied event) has been drawn up through analyses of high-resolution GeoEye satellite images, in a GIS environment. Data layers of 11 causative factors have been created and processed in order to be used as continuous numerical or discrete categorical variables in statistical analysis. In particular, the logistic regression method has frequent difficulties in managing numerical continuous and discrete categorical variables together; therefore in our work we tried different methods to process categorical variables , until we obtained a statistically significant model. The outcomes of the two statistical methods (RF and LR) have been tested with a spatial validation and gave us two susceptibility maps. The significance of the models is quantified in terms of Area Under ROC Curve (AUC resulted in 0.81 for RF model and in 0.72 for LR model). In the first instance, a graphical comparison of the two methods shows a good correspondence between them. Further, we integrated results in a unique susceptibility map which maintains both information of probability of occurrence and % of area of landslide detachment, resulting from LR and RF respectively. In fact, in view of a landslide susceptibility classification of the study area, the former is less accurate but gives easily classifiable results, while the latter is more accurate but the results can be only subjectively classified. The obtained "integrated" susceptibility map preserves information about the probability that a given % of area could fail for each mapping unit.
Different spectrophotometric methods applied for the analysis of simeprevir in the presence of its oxidative degradation product: Acomparative study

NASA Astrophysics Data System (ADS)

Attia, Khalid A. M.; El-Abasawi, Nasr M.; El-Olemy, Ahmed; Serag, Ahmed

2018-02-01

Five simple spectrophotometric methods were developed for the determination of simeprevir in the presence of its oxidative degradation product namely, ratio difference, mean centering, derivative ratio using the Savitsky-Golay filters, second derivative and continuous wavelet transform. These methods are linear in the range of 2.5-40 μg/mL and validated according to the ICH guidelines. The obtained results of accuracy, repeatability and precision were found to be within the acceptable limits. The specificity of the proposed methods was tested using laboratory prepared mixtures and assessed by applying the standard addition technique. Furthermore, these methods were statistically comparable to RP-HPLC method and good results were obtained. So, they can be used for the routine analysis of simeprevir in quality-control laboratories.
Applying Monte Carlo Simulation to Launch Vehicle Design and Requirements Verification

NASA Technical Reports Server (NTRS)

Hanson, John M.; Beard, Bernard B.

2010-01-01

This paper is focused on applying Monte Carlo simulation to probabilistic launch vehicle design and requirements verification. The approaches developed in this paper can be applied to other complex design efforts as well. Typically the verification must show that requirement "x" is met for at least "y" % of cases, with, say, 10% consumer risk or 90% confidence. Two particular aspects of making these runs for requirements verification will be explored in this paper. First, there are several types of uncertainties that should be handled in different ways, depending on when they become known (or not). The paper describes how to handle different types of uncertainties and how to develop vehicle models that can be used to examine their characteristics. This includes items that are not known exactly during the design phase but that will be known for each assembled vehicle (can be used to determine the payload capability and overall behavior of that vehicle), other items that become known before or on flight day (can be used for flight day trajectory design and go/no go decision), and items that remain unknown on flight day. Second, this paper explains a method (order statistics) for determining whether certain probabilistic requirements are met or not and enables the user to determine how many Monte Carlo samples are required. Order statistics is not new, but may not be known in general to the GN&C community. The methods also apply to determining the design values of parameters of interest in driving the vehicle design. The paper briefly discusses when it is desirable to fit a distribution to the experimental Monte Carlo results rather than using order statistics.
The discounting model selector: Statistical software for delay discounting applications.

PubMed

Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A

2017-05-01

Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods from user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best performing model. Software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to that of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of model selected ED50 were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-sourced software are discussed and a review of possible expansions of this software are provided. © 2017 Society for the Experimental Analysis of Behavior.
Genetic algorithm for the optimization of features and neural networks in ECG signals classification

NASA Astrophysics Data System (ADS)

Li, Hongqiang; Yuan, Danyang; Ma, Xiangdong; Cui, Dianyin; Cao, Lu

2017-01-01

Feature extraction and classification of electrocardiogram (ECG) signals are necessary for the automatic diagnosis of cardiac diseases. In this study, a novel method based on genetic algorithm-back propagation neural network (GA-BPNN) for classifying ECG signals with feature extraction using wavelet packet decomposition (WPD) is proposed. WPD combined with the statistical method is utilized to extract the effective features of ECG signals. The statistical features of the wavelet packet coefficients are calculated as the feature sets. GA is employed to decrease the dimensions of the feature sets and to optimize the weights and biases of the back propagation neural network (BPNN). Thereafter, the optimized BPNN classifier is applied to classify six types of ECG signals. In addition, an experimental platform is constructed for ECG signal acquisition to supply the ECG data for verifying the effectiveness of the proposed method. The GA-BPNN method with the MIT-BIH arrhythmia database achieved a dimension reduction of nearly 50% and produced good classification results with an accuracy of 97.78%. The experimental results based on the established acquisition platform indicated that the GA-BPNN method achieved a high classification accuracy of 99.33% and could be efficiently applied in the automatic identification of cardiac arrhythmias.
Rapid extraction of image texture by co-occurrence using a hybrid data structure

NASA Astrophysics Data System (ADS)

Clausi, David A.; Zhao, Yongping

2002-07-01

Calculation of co-occurrence probabilities is a popular method for determining texture features within remotely sensed digital imagery. Typically, the co-occurrence features are calculated by using a grey level co-occurrence matrix (GLCM) to store the co-occurring probabilities. Statistics are applied to the probabilities in the GLCM to generate the texture features. This method is computationally intensive since the matrix is usually sparse leading to many unnecessary calculations involving zero probabilities when applying the statistics. An improvement on the GLCM method is to utilize a grey level co-occurrence linked list (GLCLL) to store only the non-zero co-occurring probabilities. The GLCLL suffers since, to achieve preferred computational speeds, the list should be sorted. An improvement on the GLCLL is to utilize a grey level co-occurrence hybrid structure (GLCHS) based on an integrated hash table and linked list approach. Texture features obtained using this technique are identical to those obtained using the GLCM and GLCLL. The GLCHS method is implemented using the C language in a Unix environment. Based on a Brodatz test image, the GLCHS method is demonstrated to be a superior technique when compared across various window sizes and grey level quantizations. The GLCHS method required, on average, 33.4% ( σ=3.08%) of the computational time required by the GLCLL. Significant computational gains are made using the GLCHS method.
Statistical methods used to test for agreement of medical instruments measuring continuous variables in method comparison studies: a systematic review.

PubMed

Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina

2012-01-01

Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future.
Statistical Methods Used to Test for Agreement of Medical Instruments Measuring Continuous Variables in Method Comparison Studies: A Systematic Review

PubMed Central

Zaki, Rafdzah; Bulgiba, Awang; Ismail, Roshidi; Ismail, Noor Azina

2012-01-01

Background Accurate values are a must in medicine. An important parameter in determining the quality of a medical instrument is agreement with a gold standard. Various statistical methods have been used to test for agreement. Some of these methods have been shown to be inappropriate. This can result in misleading conclusions about the validity of an instrument. The Bland-Altman method is the most popular method judging by the many citations of the article proposing this method. However, the number of citations does not necessarily mean that this method has been applied in agreement research. No previous study has been conducted to look into this. This is the first systematic review to identify statistical methods used to test for agreement of medical instruments. The proportion of various statistical methods found in this review will also reflect the proportion of medical instruments that have been validated using those particular methods in current clinical practice. Methodology/Findings Five electronic databases were searched between 2007 and 2009 to look for agreement studies. A total of 3,260 titles were initially identified. Only 412 titles were potentially related, and finally 210 fitted the inclusion criteria. The Bland-Altman method is the most popular method with 178 (85%) studies having used this method, followed by the correlation coefficient (27%) and means comparison (18%). Some of the inappropriate methods highlighted by Altman and Bland since the 1980s are still in use. Conclusions This study finds that the Bland-Altman method is the most popular method used in agreement research. There are still inappropriate applications of statistical methods in some studies. It is important for a clinician or medical researcher to be aware of this issue because misleading conclusions from inappropriate analyses will jeopardize the quality of the evidence, which in turn will influence quality of care given to patients in the future. PMID:22662248
Mapping cell populations in flow cytometry data for cross‐sample comparison using the Friedman–Rafsky test statistic as a distance measure

PubMed Central

Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu

2015-01-01

Abstract Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018

Mapping cell populations in flow cytometry data for cross-sample comparison using the Friedman-Rafsky test statistic as a distance measure.

PubMed

Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H

2016-01-01

Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
Development of hi-resolution regional climate scenarios in Japan by statistical downscaling

NASA Astrophysics Data System (ADS)

Dairaku, K.

2016-12-01

Climate information and services for Impacts, Adaptation and Vulnerability (IAV) Assessments are of great concern. To meet with the needs of stakeholders such as local governments, a Japan national project, Social Implementation Program on Climate Change Adaptation Technology (SI-CAT), launched in December 2015. It develops reliable technologies for near-term climate change predictions. Multi-model ensemble regional climate scenarios with 1km horizontal grid-spacing over Japan are developed by using CMIP5 GCMs and a statistical downscaling method to support various municipal adaptation measures appropriate for possible regional climate changes. A statistical downscaling method, Bias Correction Spatial Disaggregation (BCSD), is employed to develop regional climate scenarios based on CMIP5 RCP8.5 five GCMs (MIROC5, MRI-CGCM3, GFDL-CM3, CSIRO-Mk3-6-0, HadGEM2-ES) for the periods of historical climate (1970-2005) and near future climate (2020-2055). Downscaled variables are monthly/daily precipitation and temperature. File format is NetCDF4 (conforming to CF1.6, HDF5 compression). Developed regional climate scenarios will be expanded to meet with needs of stakeholders and interface applications to access and download the data are under developing. Statistical downscaling method is not necessary to well represent locally forced nonlinear phenomena, extreme events such as heavy rain, heavy snow, etc. To complement the statistical method, dynamical downscaling approach is also combined and applied to some specific regions which have needs of stakeholders. The added values of statistical/dynamical downscaling methods compared with parent GCMs are investigated.
Phantom Effects in Multilevel Compositional Analysis: Problems and Solutions

ERIC Educational Resources Information Center

Pokropek, Artur

2015-01-01

This article combines statistical and applied research perspective showing problems that might arise when measurement error in multilevel compositional effects analysis is ignored. This article focuses on data where independent variables are constructed measures. Simulation studies are conducted evaluating methods that could overcome the…
40 CFR 796.2750 - Sediment and soil adsorption isotherm.

Code of Federal Regulations, 2014 CFR

2014-07-01

... are highly reproducible. The test provides excellent quantitative data readily amenable to statistical... combination of methods suitable for the identification and quantitative detection of the parent test chemical... quantitative analysis of the parent chemical. (3) Amount of parent test chemical applied, the amount recovered...
40 CFR 796.2750 - Sediment and soil adsorption isotherm.

Code of Federal Regulations, 2013 CFR

2013-07-01

... highly reproducible. The test provides excellent quantitative data readily amenable to statistical... combination of methods suitable for the identification and quantitative detection of the parent test chemical... quantitative analysis of the parent chemical. (3) Amount of parent test chemical applied, the amount recovered...
40 CFR 796.2750 - Sediment and soil adsorption isotherm.

Code of Federal Regulations, 2012 CFR

2012-07-01

... highly reproducible. The test provides excellent quantitative data readily amenable to statistical... combination of methods suitable for the identification and quantitative detection of the parent test chemical... quantitative analysis of the parent chemical. (3) Amount of parent test chemical applied, the amount recovered...
An instrument to assess the statistical intensity of medical research papers.

PubMed

Nieminen, Pentti; Virtanen, Jorma I; Vähänikkilä, Hannu

2017-01-01

There is widespread evidence that statistical methods play an important role in original research articles, especially in medical research. The evaluation of statistical methods and reporting in journals suffers from a lack of standardized methods for assessing the use of statistics. The objective of this study was to develop and evaluate an instrument to assess the statistical intensity in research articles in a standardized way. A checklist-type measure scale was developed by selecting and refining items from previous reports about the statistical contents of medical journal articles and from published guidelines for statistical reporting. A total of 840 original medical research articles that were published between 2007-2015 in 16 journals were evaluated to test the scoring instrument. The total sum of all items was used to assess the intensity between sub-fields and journals. Inter-rater agreement was examined using a random sample of 40 articles. Four raters read and evaluated the selected articles using the developed instrument. The scale consisted of 66 items. The total summary score adequately discriminated between research articles according to their study design characteristics. The new instrument could also discriminate between journals according to their statistical intensity. The inter-observer agreement measured by the ICC was 0.88 between all four raters. Individual item analysis showed very high agreement between the rater pairs, the percentage agreement ranged from 91.7% to 95.2%. A reliable and applicable instrument for evaluating the statistical intensity in research papers was developed. It is a helpful tool for comparing the statistical intensity between sub-fields and journals. The novel instrument may be applied in manuscript peer review to identify papers in need of additional statistical review.
Dynamic Uncertain Causality Graph for Knowledge Representation and Reasoning: Utilization of Statistical Data and Domain Knowledge in Complex Cases.

PubMed

Zhang, Qin; Yao, Quanying

2018-05-01

The dynamic uncertain causality graph (DUCG) is a newly presented framework for uncertain causality representation and probabilistic reasoning. It has been successfully applied to online fault diagnoses of large, complex industrial systems, and decease diagnoses. This paper extends the DUCG to model more complex cases than what could be previously modeled, e.g., the case in which statistical data are in different groups with or without overlap, and some domain knowledge and actions (new variables with uncertain causalities) are introduced. In other words, this paper proposes to use -mode, -mode, and -mode of the DUCG to model such complex cases and then transform them into either the standard -mode or the standard -mode. In the former situation, if no directed cyclic graph is involved, the transformed result is simply a Bayesian network (BN), and existing inference methods for BNs can be applied. In the latter situation, an inference method based on the DUCG is proposed. Examples are provided to illustrate the methodology.
Biomechanical analysis of tension band fixation for olecranon fracture treatment.

PubMed

Kozin, S H; Berglund, L J; Cooney, W P; Morrey, B F; An, K N

1996-01-01

This study assessed the strength of various tension band fixation methods with wire and cable applied to simulated olecranon fractures to compare stability and potential failure or complications between the two. Transverse olecranon fractures were simulated by osteotomy. The fracture was anatomically reduced, and various tension band fixation techniques were applied with monofilament wire or multifilament cable. With a material testing machine load displacement curves were obtained and statistical relevance determined by analysis of variance. Two loading modes were tested: loading on the posterior surface of olecranon to simulate triceps pull and loading on the anterior olecranon tip to recreate a potential compressive loading on the fragment during the resistive flexion. All fixation methods were more resistant to posterior loading than to an anterior load. Individual comparative analysis for various loading conditions concluded that tension band fixation is more resilient to tensile forces exerted by the triceps than compressive forces on the anterior olecranon tip. Neither wire passage anterior to the K-wires nor the multifilament cable provided statistically significant increased stability.
A comparison of ensemble post-processing approaches that preserve correlation structures

NASA Astrophysics Data System (ADS)

Schefzik, Roman; Van Schaeybroeck, Bert; Vannitsem, Stéphane

2016-04-01

Despite the fact that ensemble forecasts address the major sources of uncertainty, they exhibit biases and dispersion errors and therefore are known to improve by calibration or statistical post-processing. For instance the ensemble model output statistics (EMOS) method, also known as non-homogeneous regression approach (Gneiting et al., 2005) is known to strongly improve forecast skill. EMOS is based on fitting and adjusting a parametric probability density function (PDF). However, EMOS and other common post-processing approaches apply to a single weather quantity at a single location for a single look-ahead time. They are therefore unable of taking into account spatial, inter-variable and temporal dependence structures. Recently many research efforts have been invested in designing post-processing methods that resolve this drawback but also in verification methods that enable the detection of dependence structures. New verification methods are applied on two classes of post-processing methods, both generating physically coherent ensembles. A first class uses the ensemble copula coupling (ECC) that starts from EMOS but adjusts the rank structure (Schefzik et al., 2013). The second class is a member-by-member post-processing (MBM) approach that maps each raw ensemble member to a corrected one (Van Schaeybroeck and Vannitsem, 2015). We compare variants of the EMOS-ECC and MBM classes and highlight a specific theoretical connection between them. All post-processing variants are applied in the context of the ensemble system of the European Centre of Weather Forecasts (ECMWF) and compared using multivariate verification tools including the energy score, the variogram score (Scheuerer and Hamill, 2015) and the band depth rank histogram (Thorarinsdottir et al., 2015). Gneiting, Raftery, Westveld, and Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., {133}, 1098-1118. Scheuerer and Hamill, 2015. Variogram-based proper scoring rules for probabilistic forecasts of multivariate quantities. Mon. Wea. Rev. {143},1321-1334. Schefzik, Thorarinsdottir, Gneiting. Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science {28},616-640, 2013. Thorarinsdottir, M. Scheuerer, and C. Heinz, 2015. Assessing the calibration of high-dimensional ensemble forecasts using rank histograms, arXiv:1310.0236. Van Schaeybroeck and Vannitsem, 2015: Ensemble post-processing using member-by-member approaches: theoretical aspects. Q.J.R. Meteorol. Soc., 141: 807-818.
A Pragmatic Smoothing Method for Improving the Quality of the Results in Atomic Spectroscopy

NASA Astrophysics Data System (ADS)

Bennun, Leonardo

2017-07-01

A new smoothing method for the improvement on the identification and quantification of spectral functions based on the previous knowledge of the signals that are expected to be quantified, is presented. These signals are used as weighted coefficients in the smoothing algorithm. This smoothing method was conceived to be applied in atomic and nuclear spectroscopies preferably to these techniques where net counts are proportional to acquisition time, such as particle induced X-ray emission (PIXE) and other X-ray fluorescence spectroscopic methods, etc. This algorithm, when properly applied, does not distort the form nor the intensity of the signal, so it is well suited for all kind of spectroscopic techniques. This method is extremely effective at reducing high-frequency noise in the signal much more efficient than a single rectangular smooth of the same width. As all of smoothing techniques, the proposed method improves the precision of the results, but in this case we found also a systematic improvement on the accuracy of the results. We still have to evaluate the improvement on the quality of the results when this method is applied over real experimental results. We expect better characterization of the net area quantification of the peaks, and smaller Detection and Quantification Limits. We have applied this method to signals that obey Poisson statistics, but with the same ideas and criteria, it could be applied to time series. In a general case, when this algorithm is applied over experimental results, also it would be required that the sought characteristic functions, required for this weighted smoothing method, should be obtained from a system with strong stability. If the sought signals are not perfectly clean, this method should be carefully applied
Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system.

PubMed

Mathes, Robert W; Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J; Olson, Don; Weiss, Don

2017-01-01

The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method's implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System's C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis.
Methods in pharmacoepidemiology: a review of statistical analyses and data reporting in pediatric drug utilization studies.

PubMed

Sequi, Marco; Campi, Rita; Clavenna, Antonio; Bonati, Maurizio

2013-03-01

To evaluate the quality of data reporting and statistical methods performed in drug utilization studies in the pediatric population. Drug utilization studies evaluating all drug prescriptions to children and adolescents published between January 1994 and December 2011 were retrieved and analyzed. For each study, information on measures of exposure/consumption, the covariates considered, descriptive and inferential analyses, statistical tests, and methods of data reporting was extracted. An overall quality score was created for each study using a 12-item checklist that took into account the presence of outcome measures, covariates of measures, descriptive measures, statistical tests, and graphical representation. A total of 22 studies were reviewed and analyzed. Of these, 20 studies reported at least one descriptive measure. The mean was the most commonly used measure (18 studies), but only five of these also reported the standard deviation. Statistical analyses were performed in 12 studies, with the chi-square test being the most commonly performed test. Graphs were presented in 14 papers. Sixteen papers reported the number of drug prescriptions and/or packages, and ten reported the prevalence of the drug prescription. The mean quality score was 8 (median 9). Only seven of the 22 studies received a score of ≥10, while four studies received a score of <6. Our findings document that only a few of the studies reviewed applied statistical methods and reported data in a satisfactory manner. We therefore conclude that the methodology of drug utilization studies needs to be improved.
A comparative study of different aspects of manipulating ratio spectra applied for ternary mixtures: Derivative spectrophotometry versus wavelet transform

NASA Astrophysics Data System (ADS)

Salem, Hesham; Lotfy, Hayam M.; Hassan, Nagiba Y.; El-Zeiny, Mohamed B.; Saleh, Sarah S.

2015-01-01

This work represents a comparative study of different aspects of manipulating ratio spectra, which are: double divisor ratio spectra derivative (DR-DD), area under curve of derivative ratio (DR-AUC) and its novel approach, namely area under the curve correction method (AUCCM) applied for overlapped spectra; successive derivative of ratio spectra (SDR) and continuous wavelet transform (CWT) methods. The proposed methods represent different aspects of manipulating ratio spectra of the ternary mixture of Ofloxacin (OFX), Prednisolone acetate (PA) and Tetryzoline HCl (TZH) combined in eye drops in the presence of benzalkonium chloride as a preservative. The proposed methods were checked using laboratory-prepared mixtures and were successfully applied for the analysis of pharmaceutical formulation containing the cited drugs. The proposed methods were validated according to the ICH guidelines. A comparative study was conducted between those methods regarding simplicity, limitation and sensitivity. The obtained results were statistically compared with those obtained from the reported HPLC method, showing no significant difference with respect to accuracy and precision.
A comparative study of different aspects of manipulating ratio spectra applied for ternary mixtures: derivative spectrophotometry versus wavelet transform.

PubMed

Salem, Hesham; Lotfy, Hayam M; Hassan, Nagiba Y; El-Zeiny, Mohamed B; Saleh, Sarah S

2015-01-25

This work represents a comparative study of different aspects of manipulating ratio spectra, which are: double divisor ratio spectra derivative (DR-DD), area under curve of derivative ratio (DR-AUC) and its novel approach, namely area under the curve correction method (AUCCM) applied for overlapped spectra; successive derivative of ratio spectra (SDR) and continuous wavelet transform (CWT) methods. The proposed methods represent different aspects of manipulating ratio spectra of the ternary mixture of Ofloxacin (OFX), Prednisolone acetate (PA) and Tetryzoline HCl (TZH) combined in eye drops in the presence of benzalkonium chloride as a preservative. The proposed methods were checked using laboratory-prepared mixtures and were successfully applied for the analysis of pharmaceutical formulation containing the cited drugs. The proposed methods were validated according to the ICH guidelines. A comparative study was conducted between those methods regarding simplicity, limitation and sensitivity. The obtained results were statistically compared with those obtained from the reported HPLC method, showing no significant difference with respect to accuracy and precision. Copyright © 2014 Elsevier B.V. All rights reserved.
Application of Scan Statistics to Detect Suicide Clusters in Australia

PubMed Central

Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane

2013-01-01

Background Suicide clustering occurs when multiple suicide incidents take place in a small area or/and within a short period of time. In spite of the multi-national research attention and particular efforts in preparing guidelines for tackling suicide clusters, the broader picture of epidemiology of suicide clustering remains unclear. This study aimed to develop techniques in using scan statistics to detect clusters, with the detection of suicide clusters in Australia as example. Methods and Findings Scan statistics was applied to detect clusters among suicides occurring between 2004 and 2008. Manipulation of parameter settings and change of area for scan statistics were performed to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both national and state cluster detections, 8 (53.3%) were only detected by the state cluster detection, and 3 (20%) were only detected by the national cluster detection. Conclusions These findings illustrate that the majority of spatial-temporal clusters of suicide were located in the inland northern areas, with socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to the contrast of the underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings may yield the maximum benefits for exploring clusters. PMID:23342098
Methods for estimating flow-duration curve and low-flow frequency statistics for ungaged locations on small streams in Minnesota

USGS Publications Warehouse

Ziegeweid, Jeffrey R.; Lorenz, David L.; Sanocki, Chris A.; Czuba, Christiana R.

2015-12-24

Equations developed in this study apply only to stream locations where flows are not substantially affected by regulation, diversion, or urbanization. All equations presented in this study will be incorporated into StreamStats, a web-based geographic information system tool developed by the U.S. Geological Survey. StreamStats allows users to obtain streamflow statistics, basin characteristics, and other information for user-selected locations on streams through an interactive map.
Statistical Relational Learning to Predict Primary Myocardial Infarction from Electronic Health Records

PubMed Central

Weiss, Jeremy C; Page, David; Peissig, Peggy L; Natarajan, Sriraam; McCarty, Catherine

2013-01-01

Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices. PMID:25360347
Stochastical modeling for Viral Disease: Statistical Mechanics and Network Theory

NASA Astrophysics Data System (ADS)

Zhou, Hao; Deem, Michael

2007-04-01

Theoretical methods of statistical mechanics are developed and applied to study the immunological response against viral disease, such as dengue. We use this theory to show how the immune response to four different dengue serotypes may be sculpted. It is the ability of avian influenza, to change and to mix, that has given rise to the fear of a new human flu pandemic. Here we propose to utilize a scale free network based stochastic model to investigate the mitigation strategies and analyze the risk.
Brain tissues volume measurements from 2D MRI using parametric approach

NASA Astrophysics Data System (ADS)

L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

2018-04-01

The purpose of the paper is to propose a fully automated method of volume assessment of structures within human brain. Our statistical approach uses maximum interdependency principle for decision making process of measurements consistency and unequal observations. Detecting outliers performed using maximum normalized residual test. We propose a statistical model which utilizes knowledge of tissues distribution in human brain and applies partial data restoration for precision improvement. The approach proposes completed computationally efficient and independent from segmentation algorithm used in the application.

Identifying natural flow regimes using fish communities

NASA Astrophysics Data System (ADS)

Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.

2011-10-01

SummaryModern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.
A statistical approach to selecting and confirming validation targets in -omics experiments

PubMed Central

2012-01-01

Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

PubMed

Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

2016-01-22

Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if they occur on prognostic factors. Thus, the diagnosis of a such imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n =500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre selection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
Non-destructive scanning for applied stress by the continuous magnetic Barkhausen noise method

NASA Astrophysics Data System (ADS)

Franco Grijalba, Freddy A.; Padovese, L. R.

2018-01-01

This paper reports the use of a non-destructive continuous magnetic Barkhausen noise technique to detect applied stress on steel surfaces. The stress profile generated in a sample of 1070 steel subjected to a three-point bending test is analyzed. The influence of different parameters such as pickup coil type, scanner speed, applied magnetic field and frequency band analyzed on the effectiveness of the technique is investigated. A moving smoothing window based on a second-order statistical moment is used to analyze the time signal. The findings show that the technique can be used to detect applied stress profiles.
Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE.

PubMed

Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja

2017-01-01

Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.
Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE

PubMed Central

Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M.; Grün, Sonja

2017-01-01

Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis. PMID:28596729
Journal of Air Transportation, Volume 12, No. 1

NASA Technical Reports Server (NTRS)

Bowers, Brent D. (Editor); Kabashkin, Igor (Editor)

2007-01-01

Topics discussed include: a) Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods; b) Financial Comparisons across Different Business Models in the Canadian Airline Industry; c) Carving a Niche for the "No-Frills" Carrier, Air Arabia, in Oil-Rich Skies; d) Situational Leadership in Air Traffic Control; and e) The Very Light Jet Arrives: Stakeholders and Their Perceptions.
Using DIF Dissection Method to Assess Effects of Item Deletion. Research Report No. 2005-10. ETS RR-05-23

ERIC Educational Resources Information Center

Zhang, Yanling; Dorans, Neil J.; Matthews-López, Joy L.

2005-01-01

Statistical procedures for detecting differential item functioning (DIF) are often used as an initial step to screen items for construct irrelevant variance. This research applies a DIF dissection method and a two-way classification scheme to SAT Reasoning Test™ verbal section data and explores the effects of deleting sizable DIF items on reported…
A method of using cluster analysis to study statistical dependence in multivariate data

NASA Technical Reports Server (NTRS)

Borucki, W. J.; Card, D. H.; Lyle, G. C.

1975-01-01

A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages.

PubMed

Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry

2013-08-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
[Efficiency of rehabilitation methods in the treatment of arm lymphedema after breast cancer surgery].

PubMed

Petruseviciene, Daiva; Krisciūnas, Aleksandras; Sameniene, Jūrate

2002-01-01

In this article we analyze influence of rehabilitation methods in treatment of arm lymphedema. In Kaunas oncological hospital were examined 60 women after surgery for breast cancer. The work objective was to evaluate efficiency of rehabilitation methods in treatment of arm lymphedema and in evaluate movement amplitude of shoulder joint. Two groups of women depending on rehabilitation start were evaluated. The same methods of rehabilitation were applied to both groups: physical therapy, electrostimulation, massage, lymphodrainage with apparate. Our study indicated that women, who were treated at early period of rehabilitation (3 months), showed statistically significantly (p < 0.01) better results in increase of movement amplitude of shoulder joint. However, results of treatment of arm lymphedema, comparing with women who started rehabilitation after 12 months, were equally successful--results were not statistically significantly better (p > 0.05).
A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins

PubMed Central

Knudsen, Bjarne; Miyamoto, Michael M.

2001-01-01

Changes in protein function can lead to changes in the selection acting on specific residues. This can often be detected as evolutionary rate changes at the sites in question. A maximum-likelihood method for detecting evolutionary rate shifts at specific protein positions is presented. The method determines significance values of the rate differences to give a sound statistical foundation for the conclusions drawn from the analyses. A statistical test for detecting slowly evolving sites is also described. The methods are applied to a set of Myc proteins for the identification of both conserved sites and those with changing evolutionary rates. Those positions with conserved and changing rates are related to the structures and functions of their proteins. The results are compared with an earlier Bayesian method, thereby highlighting the advantages of the new likelihood ratio tests. PMID:11734650
State Estimation Using Dependent Evidence Fusion: Application to Acoustic Resonance-Based Liquid Level Measurement.

PubMed

Xu, Xiaobin; Li, Zhenghui; Li, Guo; Zhou, Zhe

2017-04-21

Estimating the state of a dynamic system via noisy sensor measurement is a common problem in sensor methods and applications. Most state estimation methods assume that measurement noise and state perturbations can be modeled as random variables with known statistical properties. However in some practical applications, engineers can only get the range of noises, instead of the precise statistical distributions. Hence, in the framework of Dempster-Shafer (DS) evidence theory, a novel state estimatation method by fusing dependent evidence generated from state equation, observation equation and the actual observations of the system states considering bounded noises is presented. It can be iteratively implemented to provide state estimation values calculated from fusion results at every time step. Finally, the proposed method is applied to a low-frequency acoustic resonance level gauge to obtain high-accuracy measurement results.
Histogram of gradient and binarized statistical image features of wavelet subband-based palmprint features extraction

NASA Astrophysics Data System (ADS)

Attallah, Bilal; Serir, Amina; Chahir, Youssef; Boudjelal, Abdelwahhab

2017-11-01

Palmprint recognition systems are dependent on feature extraction. A method of feature extraction using higher discrimination information was developed to characterize palmprint images. In this method, two individual feature extraction techniques are applied to a discrete wavelet transform of a palmprint image, and their outputs are fused. The two techniques used in the fusion are the histogram of gradient and the binarized statistical image features. They are then evaluated using an extreme learning machine classifier before selecting a feature based on principal component analysis. Three palmprint databases, the Hong Kong Polytechnic University (PolyU) Multispectral Palmprint Database, Hong Kong PolyU Palmprint Database II, and the Delhi Touchless (IIDT) Palmprint Database, are used in this study. The study shows that our method effectively identifies and verifies palmprints and outperforms other methods based on feature extraction.
A critical look at prospective surveillance using a scan statistic.

PubMed

Correa, Thais R; Assunção, Renato M; Costa, Marcelo A

2015-03-30

The scan statistic is a very popular surveillance technique for purely spatial, purely temporal, and spatial-temporal disease data. It was extended to the prospective surveillance case, and it has been applied quite extensively in this situation. When the usual signal rules, as those implemented in SaTScan(TM) (Boston, MA, USA) software, are used, we show that the scan statistic method is not appropriate for the prospective case. The reason is that it does not adjust properly for the sequential and repeated tests carried out during the surveillance. We demonstrate that the nominal significance level α is not meaningful and there is no relationship between α and the recurrence interval or the average run length (ARL). In some cases, the ARL may be equal to ∞, which makes the method ineffective. This lack of control of the type-I error probability and of the ARL leads us to strongly oppose the use of the scan statistic with the usual signal rules in the prospective context. Copyright © 2014 John Wiley & Sons, Ltd.
Constant pressure and temperature discrete-time Langevin molecular dynamics

NASA Astrophysics Data System (ADS)

Grønbech-Jensen, Niels; Farago, Oded

2014-11-01

We present a new and improved method for simultaneous control of temperature and pressure in molecular dynamics simulations with periodic boundary conditions. The thermostat-barostat equations are built on our previously developed stochastic thermostat, which has been shown to provide correct statistical configurational sampling for any time step that yields stable trajectories. Here, we extend the method and develop a set of discrete-time equations of motion for both particle dynamics and system volume in order to seek pressure control that is insensitive to the choice of the numerical time step. The resulting method is simple, practical, and efficient. The method is demonstrated through direct numerical simulations of two characteristic model systems—a one-dimensional particle chain for which exact statistical results can be obtained and used as benchmarks, and a three-dimensional system of Lennard-Jones interacting particles simulated in both solid and liquid phases. The results, which are compared against the method of Kolb and Dünweg [J. Chem. Phys. 111, 4453 (1999)], show that the new method behaves according to the objective, namely that acquired statistical averages and fluctuations of configurational measures are accurate and robust against the chosen time step applied to the simulation.
Sharpening method of satellite thermal image based on the geographical statistical model

NASA Astrophysics Data System (ADS)

Qi, Pengcheng; Hu, Shixiong; Zhang, Haijun; Guo, Guangmeng

2016-04-01

To improve the effectiveness of thermal sharpening in mountainous regions, paying more attention to the laws of land surface energy balance, a thermal sharpening method based on the geographical statistical model (GSM) is proposed. Explanatory variables were selected from the processes of land surface energy budget and thermal infrared electromagnetic radiation transmission, then high spatial resolution (57 m) raster layers were generated for these variables through spatially simulating or using other raster data as proxies. Based on this, the local adaptation statistical relationship between brightness temperature (BT) and the explanatory variables, i.e., the GSM, was built at 1026-m resolution using the method of multivariate adaptive regression splines. Finally, the GSM was applied to the high-resolution (57-m) explanatory variables; thus, the high-resolution (57-m) BT image was obtained. This method produced a sharpening result with low error and good visual effect. The method can avoid the blind choice of explanatory variables and remove the dependence on synchronous imagery at visible and near-infrared bands. The influences of the explanatory variable combination, sampling method, and the residual error correction on sharpening results were analyzed deliberately, and their influence mechanisms are reported herein.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.

PubMed

Chatterjee, Snehamoy

2014-01-01

This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distribution for factor loading parameters and structural parameters of SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayesian rule. The Markov Chain Monte Carlo sampling in the form Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant in experts' opinion-based priors, whereas, two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveals that Bayesian structural model provides reasonably good fit of work injury with high coefficient of determination (0.91) and less mean squared error as compared to traditional SEM.
Multi-trait analysis of genome-wide association summary statistics using MTAG.

PubMed

Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

2018-02-01

We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Non-invasive body temperature measurement of wild chimpanzees using fecal temperature decline.

PubMed

Jensen, Siv Aina; Mundry, Roger; Nunn, Charles L; Boesch, Christophe; Leendertz, Fabian H

2009-04-01

New methods are required to increase our understanding of pathologic processes in wild mammals. We developed a noninvasive field method to estimate the body temperature of wild living chimpanzees habituated to humans, based on statistically fitting temperature decline of feces after defecation. The method was established with the use of control measures of human rectal temperature and subsequent changes in fecal temperature over time. The method was then applied to temperature data collected from wild chimpanzee feces. In humans, we found good correspondence between the temperature estimated by the method and the actual rectal temperature that was measured (maximum deviation 0.22 C). The method was successfully applied and the average estimated temperature of the chimpanzees was 37.2 C. This simple-to-use field method reliably estimates the body temperature of wild chimpanzees and probably also other large mammals.

Application of the Probabilistic Dynamic Synthesis Method to Realistic Structures

NASA Technical Reports Server (NTRS)

Brown, Andrew M.; Ferri, Aldo A.

1998-01-01

The Probabilistic Dynamic Synthesis method is a technique for obtaining the statistics of a desired response engineering quantity for a structure with non-deterministic parameters. The method uses measured data from modal testing of the structure as the input random variables, rather than more "primitive" quantities like geometry or material variation. This modal information is much more comprehensive and easily measured than the "primitive" information. The probabilistic analysis is carried out using either response surface reliability methods or Monte Carlo simulation. In previous work, the feasibility of the PDS method applied to a simple seven degree-of-freedom spring-mass system was verified. In this paper, extensive issues involved with applying the method to a realistic three-substructure system are examined, and free and forced response analyses are performed. The results from using the method are promising, especially when the lack of alternatives for obtaining quantitative output for probabilistic structures is considered.
The human chromosomal fragile sites more often involved in constitutional deletions and duplications - A genetic and statistical assessment

NASA Astrophysics Data System (ADS)

Gomes, Dora Prata; Sequeira, Inês J.; Figueiredo, Carlos; Rueff, José; Brás, Aldina

2016-12-01

Human chromosomal fragile sites (CFSs) are heritable loci or regions of the human chromosomes prone to exhibit gaps, breaks and rearrangements. Determining the frequency of deletions and duplications in CFSs may contribute to explain the occurrence of human disease due to those rearrangements. In this study we analyzed the frequency of deletions and duplications in each human CFS. Statistical methods, namely data display, descriptive statistics and linear regression analysis were applied to analyze this dataset. We found that FRA15C, FRA16A and FRAXB are the most frequently involved CFSs in deletions and duplications occurring in the human genome.
Scaling up to address data science challenges

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wendelberger, Joanne R.

Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithmsmore » and procedures for efficient processing.« less
Scaling up to address data science challenges

DOE PAGES

Wendelberger, Joanne R.

2017-04-27

Statistics and Data Science provide a variety of perspectives and technical approaches for exploring and understanding Big Data. Partnerships between scientists from different fields such as statistics, machine learning, computer science, and applied mathematics can lead to innovative approaches for addressing problems involving increasingly large amounts of data in a rigorous and effective manner that takes advantage of advances in computing. Here, this article will explore various challenges in Data Science and will highlight statistical approaches that can facilitate analysis of large-scale data including sampling and data reduction methods, techniques for effective analysis and visualization of large-scale simulations, and algorithmsmore » and procedures for efficient processing.« less
Bayes in biological anthropology.

PubMed

Konigsberg, Lyle W; Frankenberg, Susan R

2013-12-01

In this article, we both contend and illustrate that biological anthropologists, particularly in the Americas, often think like Bayesians but act like frequentists when it comes to analyzing a wide variety of data. In other words, while our research goals and perspectives are rooted in probabilistic thinking and rest on prior knowledge, we often proceed to use statistical hypothesis tests and confidence interval methods unrelated (or tenuously related) to the research questions of interest. We advocate for applying Bayesian analyses to a number of different bioanthropological questions, especially since many of the programming and computational challenges to doing so have been overcome in the past two decades. To facilitate such applications, this article explains Bayesian principles and concepts, and provides concrete examples of Bayesian computer simulations and statistics that address questions relevant to biological anthropology, focusing particularly on bioarchaeology and forensic anthropology. It also simultaneously reviews the use of Bayesian methods and inference within the discipline to date. This article is intended to act as primer to Bayesian methods and inference in biological anthropology, explaining the relationships of various methods to likelihoods or probabilities and to classical statistical models. Our contention is not that traditional frequentist statistics should be rejected outright, but that there are many situations where biological anthropology is better served by taking a Bayesian approach. To this end it is hoped that the examples provided in this article will assist researchers in choosing from among the broad array of statistical methods currently available. Copyright © 2013 Wiley Periodicals, Inc.
Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline

PubMed Central

2013-01-01

Background As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. Results We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS A : DE genes with non-zero effect sizes in all studies, (2) HS B : DE genes with non-zero effect sizes in one or more studies and (3) HS r : DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. Conclusions The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS A , HS B , and HS r ). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author’s publication website. PMID:24359104
Meta-analysis methods for combining multiple expression profiles: comparisons, statistical characterization and an application guideline.

PubMed

Chang, Lun-Ching; Lin, Hui-Min; Sibille, Etienne; Tseng, George C

2013-12-21

As high-throughput genomic technologies become accurate and affordable, an increasing number of data sets have been accumulated in the public domain and genomic information integration and meta-analysis have become routine in biomedical research. In this paper, we focus on microarray meta-analysis, where multiple microarray studies with relevant biological hypotheses are combined in order to improve candidate marker detection. Many methods have been developed and applied in the literature, but their performance and properties have only been minimally investigated. There is currently no clear conclusion or guideline as to the proper choice of a meta-analysis method given an application; the decision essentially requires both statistical and biological considerations. We performed 12 microarray meta-analysis methods for combining multiple simulated expression profiles, and such methods can be categorized for different hypothesis setting purposes: (1) HS(A): DE genes with non-zero effect sizes in all studies, (2) HS(B): DE genes with non-zero effect sizes in one or more studies and (3) HS(r): DE gene with non-zero effect in "majority" of studies. We then performed a comprehensive comparative analysis through six large-scale real applications using four quantitative statistical evaluation criteria: detection capability, biological association, stability and robustness. We elucidated hypothesis settings behind the methods and further apply multi-dimensional scaling (MDS) and an entropy measure to characterize the meta-analysis methods and data structure, respectively. The aggregated results from the simulation study categorized the 12 methods into three hypothesis settings (HS(A), HS(B), and HS(r)). Evaluation in real data and results from MDS and entropy analyses provided an insightful and practical guideline to the choice of the most suitable method in a given application. All source files for simulation and real data are available on the author's publication website.
Rank-based permutation approaches for non-parametric factorial designs.

PubMed

Umlauft, Maria; Konietschke, Frank; Pauly, Markus

2017-11-01

Inference methods for null hypotheses formulated in terms of distribution functions in general non-parametric factorial designs are studied. The methods can be applied to continuous, ordinal or even ordered categorical data in a unified way, and are based only on ranks. In this set-up Wald-type statistics and ANOVA-type statistics are the current state of the art. The first method is asymptotically exact but a rather liberal statistical testing procedure for small to moderate sample size, while the latter is only an approximation which does not possess the correct asymptotic α level under the null. To bridge these gaps, a novel permutation approach is proposed which can be seen as a flexible generalization of the Kruskal-Wallis test to all kinds of factorial designs with independent observations. It is proven that the permutation principle is asymptotically correct while keeping its finite exactness property when data are exchangeable. The results of extensive simulation studies foster these theoretical findings. A real data set exemplifies its applicability. © 2017 The British Psychological Society.
Selecting statistical model and optimum maintenance policy: a case study of hydraulic pump.

PubMed

Ruhi, S; Karim, M R

2016-01-01

Proper maintenance policy can play a vital role for effective investigation of product reliability. Every engineered object such as product, plant or infrastructure needs preventive and corrective maintenance. In this paper we look at a real case study. It deals with the maintenance of hydraulic pumps used in excavators by a mining company. We obtain the data that the owner had collected and carry out an analysis and building models for pump failures. The data consist of both failure and censored lifetimes of the hydraulic pump. Different competitive mixture models are applied to analyze a set of maintenance data of a hydraulic pump. Various characteristics of the mixture models, such as the cumulative distribution function, reliability function, mean time to failure, etc. are estimated to assess the reliability of the pump. Akaike Information Criterion, adjusted Anderson-Darling test statistic, Kolmogrov-Smirnov test statistic and root mean square error are considered to select the suitable models among a set of competitive models. The maximum likelihood estimation method via the EM algorithm is applied mainly for estimating the parameters of the models and reliability related quantities. In this study, it is found that a threefold mixture model (Weibull-Normal-Exponential) fits well for the hydraulic pump failures data set. This paper also illustrates how a suitable statistical model can be applied to estimate the optimum maintenance period at a minimum cost of a hydraulic pump.
Evaluating and implementing temporal, spatial, and spatio-temporal methods for outbreak detection in a local syndromic surveillance system

PubMed Central

Lall, Ramona; Levin-Rector, Alison; Sell, Jessica; Paladini, Marc; Konty, Kevin J.; Olson, Don; Weiss, Don

2017-01-01

The New York City Department of Health and Mental Hygiene has operated an emergency department syndromic surveillance system since 2001, using temporal and spatial scan statistics run on a daily basis for cluster detection. Since the system was originally implemented, a number of new methods have been proposed for use in cluster detection. We evaluated six temporal and four spatial/spatio-temporal detection methods using syndromic surveillance data spiked with simulated injections. The algorithms were compared on several metrics, including sensitivity, specificity, positive predictive value, coherence, and timeliness. We also evaluated each method’s implementation, programming time, run time, and the ease of use. Among the temporal methods, at a set specificity of 95%, a Holt-Winters exponential smoother performed the best, detecting 19% of the simulated injects across all shapes and sizes, followed by an autoregressive moving average model (16%), a generalized linear model (15%), a modified version of the Early Aberration Reporting System’s C2 algorithm (13%), a temporal scan statistic (11%), and a cumulative sum control chart (<2%). Of the spatial/spatio-temporal methods we tested, a spatial scan statistic detected 3% of all injects, a Bayes regression found 2%, and a generalized linear mixed model and a space-time permutation scan statistic detected none at a specificity of 95%. Positive predictive value was low (<7%) for all methods. Overall, the detection methods we tested did not perform well in identifying the temporal and spatial clusters of cases in the inject dataset. The spatial scan statistic, our current method for spatial cluster detection, performed slightly better than the other tested methods across different inject magnitudes and types. Furthermore, we found the scan statistics, as applied in the SaTScan software package, to be the easiest to program and implement for daily data analysis. PMID:28886112
A global goodness-of-fit statistic for Cox regression models.

PubMed

Parzen, M; Lipsitz, S R

1999-06-01

In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary billiary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. The are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.
The polarization-optical measuring method of linearity of radiant-power characteristic of the laser emission photodetectors

NASA Astrophysics Data System (ADS)

Baranov, M. S.; Khramov, V. N.; Chebanenko, R. A.

2016-04-01

The method of measurement of the power (lux-ampere) characteristic of photodetectors for work with the continuous laser sources of light which radiation has the linear polarization is developed and realized. The way offered in this work is approved on the basis of the FD-24K widespread photo diode. The received results quite correspond to passport data of this kind of photodetectors. Methods of statistical processing of results are applied.
Median nitrate concentrations in groundwater in the New Jersey Highlands Region estimated using regression models and land-surface characteristics

USGS Publications Warehouse

Baker, Ronald J.; Chepiga, Mary M.; Cauller, Stephen J.

2015-01-01

The Kaplan-Meier method of estimating summary statistics from left-censored data was applied in order to include nondetects (left-censored data) in median nitrate-concentration calculations. Median concentrations also were determined using three alternative methods of handling nondetects. Treatment of the 23 percent of samples that were nondetects had little effect on estimated median nitrate concentrations because method detection limits were mostly less than median values.
Data Mining Methods Applied to Flight Operations Quality Assurance Data: A Comparison to Standard Statistical Methods

NASA Technical Reports Server (NTRS)

Stolzer, Alan J.; Halford, Carl

2007-01-01

In a previous study, multiple regression techniques were applied to Flight Operations Quality Assurance-derived data to develop parsimonious model(s) for fuel consumption on the Boeing 757 airplane. The present study examined several data mining algorithms, including neural networks, on the fuel consumption problem and compared them to the multiple regression results obtained earlier. Using regression methods, parsimonious models were obtained that explained approximately 85% of the variation in fuel flow. In general data mining methods were more effective in predicting fuel consumption. Classification and Regression Tree methods reported correlation coefficients of .91 to .92, and General Linear Models and Multilayer Perceptron neural networks reported correlation coefficients of about .99. These data mining models show great promise for use in further examining large FOQA databases for operational and safety improvements.
[Correlation coefficient-based principle and method for the classification of jump degree in hydrological time series].

PubMed

Wu, Zi Yi; Xie, Ping; Sang, Yan Fang; Gu, Hai Ting

2018-04-01

The phenomenon of jump is one of the importantly external forms of hydrological variabi-lity under environmental changes, representing the adaption of hydrological nonlinear systems to the influence of external disturbances. Presently, the related studies mainly focus on the methods for identifying the jump positions and jump times in hydrological time series. In contrast, few studies have focused on the quantitative description and classification of jump degree in hydrological time series, which make it difficult to understand the environmental changes and evaluate its potential impacts. Here, we proposed a theatrically reliable and easy-to-apply method for the classification of jump degree in hydrological time series, using the correlation coefficient as a basic index. The statistical tests verified the accuracy, reasonability, and applicability of this method. The relationship between the correlation coefficient and the jump degree of series were described using mathematical equation by derivation. After that, several thresholds of correlation coefficients under different statistical significance levels were chosen, based on which the jump degree could be classified into five levels: no, weak, moderate, strong and very strong. Finally, our method was applied to five diffe-rent observed hydrological time series, with diverse geographic and hydrological conditions in China. The results of the classification of jump degrees in those series were closely accorded with their physically hydrological mechanisms, indicating the practicability of our method.
Application of multivariate statistical techniques in microbial ecology.

PubMed

Paliy, O; Shankar, V

2016-03-01

Recent advances in high-throughput methods of molecular analyses have led to an explosion of studies generating large-scale ecological data sets. In particular, noticeable effect has been attained in the field of microbial ecology, where new experimental approaches provided in-depth assessments of the composition, functions and dynamic changes of complex microbial communities. Because even a single high-throughput experiment produces large amount of data, powerful statistical techniques of multivariate analysis are well suited to analyse and interpret these data sets. Many different multivariate techniques are available, and often it is not clear which method should be applied to a particular data set. In this review, we describe and compare the most widely used multivariate statistical techniques including exploratory, interpretive and discriminatory procedures. We consider several important limitations and assumptions of these methods, and we present examples of how these approaches have been utilized in recent studies to provide insight into the ecology of the microbial world. Finally, we offer suggestions for the selection of appropriate methods based on the research question and data set structure. © 2016 John Wiley & Sons Ltd.
Projecting future precipitation and temperature at sites with diverse climate through multiple statistical downscaling schemes

NASA Astrophysics Data System (ADS)

Vallam, P.; Qin, X. S.

2017-10-01

Anthropogenic-driven climate change would affect the global ecosystem and is becoming a world-wide concern. Numerous studies have been undertaken to determine the future trends of meteorological variables at different scales. Despite these studies, there remains significant uncertainty in the prediction of future climates. To examine the uncertainty arising from using different schemes to downscale the meteorological variables for the future horizons, projections from different statistical downscaling schemes were examined. These schemes included statistical downscaling method (SDSM), change factor incorporated with LARS-WG, and bias corrected disaggregation (BCD) method. Global circulation models (GCMs) based on CMIP3 (HadCM3) and CMIP5 (CanESM2) were utilized to perturb the changes in the future climate. Five study sites (i.e., Alice Springs, Edmonton, Frankfurt, Miami, and Singapore) with diverse climatic conditions were chosen for examining the spatial variability of applying various statistical downscaling schemes. The study results indicated that the regions experiencing heavy precipitation intensities were most likely to demonstrate the divergence between the predictions from various statistical downscaling methods. Also, the variance computed in projecting the weather extremes indicated the uncertainty derived from selection of downscaling tools and climate models. This study could help gain an improved understanding about the features of different downscaling approaches and the overall downscaling uncertainty.
Dynamic modelling of n-of-1 data: powerful and flexible data analytics applied to individualised studies.

PubMed

Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin

2017-09-01

N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
Assessment of variations in thermal cycle life data of thermal barrier coated rods

NASA Astrophysics Data System (ADS)

Hendricks, R. C.; McDonald, G.

An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The Zr02-8Y203/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups; material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.
Assessment of variations in thermal cycle life data of thermal barrier coated rods

NASA Technical Reports Server (NTRS)

Hendricks, R. C.; Mcdonald, G.

1981-01-01

An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The Zr02-8Y203/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups; material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

Application of image recognition algorithms for statistical description of nano- and microstructured surfaces

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mărăscu, V.; Dinescu, G.; Faculty of Physics, University of Bucharest, 405 Atomistilor Street, Bucharest-Magurele

In this paper we propose a statistical approach for describing the self-assembling of sub-micronic polystyrene beads on silicon surfaces, as well as the evolution of surface topography due to plasma treatments. Algorithms for image recognition are used in conjunction with Scanning Electron Microscopy (SEM) imaging of surfaces. In a first step, greyscale images of the surface covered by the polystyrene beads are obtained. Further, an adaptive thresholding method was applied for obtaining binary images. The next step consisted in automatic identification of polystyrene beads dimensions, by using Hough transform algorithm, according to beads radius. In order to analyze the uniformitymore » of the self–assembled polystyrene beads, the squared modulus of 2-dimensional Fast Fourier Transform (2- D FFT) was applied. By combining these algorithms we obtain a powerful and fast statistical tool for analysis of micro and nanomaterials with aspect features regularly distributed on surface upon SEM examination.« less
Estimation of Global Network Statistics from Incomplete Data

PubMed Central

Bliss, Catherine A.; Danforth, Christopher M.; Dodds, Peter Sheridan

2014-01-01

Complex networks underlie an enormous variety of social, biological, physical, and virtual systems. A profound complication for the science of complex networks is that in most cases, observing all nodes and all network interactions is impossible. Previous work addressing the impacts of partial network data is surprisingly limited, focuses primarily on missing nodes, and suggests that network statistics derived from subsampled data are not suitable estimators for the same network statistics describing the overall network topology. We generate scaling methods to predict true network statistics, including the degree distribution, from only partial knowledge of nodes, links, or weights. Our methods are transparent and do not assume a known generating process for the network, thus enabling prediction of network statistics for a wide variety of applications. We validate analytical results on four simulated network classes and empirical data sets of various sizes. We perform subsampling experiments by varying proportions of sampled data and demonstrate that our scaling methods can provide very good estimates of true network statistics while acknowledging limits. Lastly, we apply our techniques to a set of rich and evolving large-scale social networks, Twitter reply networks. Based on 100 million tweets, we use our scaling techniques to propose a statistical characterization of the Twitter Interactome from September 2008 to November 2008. Our treatment allows us to find support for Dunbar's hypothesis in detecting an upper threshold for the number of active social contacts that individuals maintain over the course of one week. PMID:25338183
A common base method for analysis of qPCR data and the application of simple blocking in qPCR experiments.

PubMed

Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J

2017-12-01

qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (C q ) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the logscale as long as possible by working with log 10 (E) ∙ C q , which we call the efficiency-weighted C q value; subsequent statistical analyses are then applied in the logscale. We show how efficiency-weighted C q values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
Use of model calibration to achieve high accuracy in analysis of computer networks

DOEpatents

Frogner, Bjorn; Guarro, Sergio; Scharf, Guy

2004-05-11

A system and method are provided for creating a network performance prediction model, and calibrating the prediction model, through application of network load statistical analyses. The method includes characterizing the measured load on the network, which may include background load data obtained over time, and may further include directed load data representative of a transaction-level event. Probabilistic representations of load data are derived to characterize the statistical persistence of the network performance variability and to determine delays throughout the network. The probabilistic representations are applied to the network performance prediction model to adapt the model for accurate prediction of network performance. Certain embodiments of the method and system may be used for analysis of the performance of a distributed application characterized as data packet streams.
Comparison of transform coding methods with an optimal predictor for the data compression of digital elevation models

NASA Technical Reports Server (NTRS)

Lewis, Michael

1994-01-01

Statistical encoding techniques enable the reduction of the number of bits required to encode a set of symbols, and are derived from their probabilities. Huffman encoding is an example of statistical encoding that has been used for error-free data compression. The degree of compression given by Huffman encoding in this application can be improved by the use of prediction methods. These replace the set of elevations by a set of corrections that have a more advantageous probability distribution. In particular, the method of Lagrange Multipliers for minimization of the mean square error has been applied to local geometrical predictors. Using this technique, an 8-point predictor achieved about a 7 percent improvement over an existing simple triangular predictor.
Competing bosonic condensates in optical lattice with a mixture of single and pair hoppings

NASA Astrophysics Data System (ADS)

Travin, V. M.; Kopeć, T. K.

2017-01-01

A system of ultra-cold atoms with single boson and pair tunneling of bosonic atoms is considered in an optical lattice at arbitrary temperature. A mean-field theory was applied to the extended Bose-Hubbard Hamiltonian describing the system in order to investigate the competition between superfluid and pair superfluid as a function of the chemical potential and the temperature. To this end we have applied a method based on the Laplace transform method for the efficient calculation of the statistical sum for the quantum Hamiltonian. These results may be of interest for experiments on cold atom systems in optical lattices.
Application of Savitzky-Golay differentiation filters and Fourier functions to simultaneous determination of cefepime and the co-administered drug, levofloxacin, in spiked human plasma

NASA Astrophysics Data System (ADS)

Abdel-Aziz, Omar; Abdel-Ghany, Maha F.; Nagi, Reham; Abdel-Fattah, Laila

2015-03-01

The present work is concerned with simultaneous determination of cefepime (CEF) and the co-administered drug, levofloxacin (LEV), in spiked human plasma by applying a new approach, Savitzky-Golay differentiation filters, and combined trigonometric Fourier functions to their ratio spectra. The different parameters associated with the calculation of Savitzky-Golay and Fourier coefficients were optimized. The proposed methods were validated and applied for determination of the two drugs in laboratory prepared mixtures and spiked human plasma. The results were statistically compared with reported HPLC methods and were found accurate and precise.
Evolution of statistical averages: An interdisciplinary proposal using the Chapman-Enskog method

NASA Astrophysics Data System (ADS)

Mariscal-Sanchez, A.; Sandoval-Villalbazo, A.

2017-08-01

This work examines the idea of applying the Chapman-Enskog (CE) method for approximating the solution of the Boltzmann equation beyond the realm of physics, using an information theory approach. Equations describing the evolution of averages and their fluctuations in a generalized phase space are established up to first-order in the Knudsen parameter which is defined as the ratio of the time between interactions (mean free time) and a characteristic macroscopic time. Although the general equations here obtained may be applied in a wide range of disciplines, in this paper, only a particular case related to the evolution of averages in speculative markets is examined.
Application of the modified chi-square ratio statistic in a stepwise procedure for cascade impactor equivalence testing.

PubMed

Weber, Benjamin; Lee, Sau L; Delvadia, Renishkumar; Lionberger, Robert; Li, Bing V; Tsong, Yi; Hochhaus, Guenther

2015-03-01

Equivalence testing of aerodynamic particle size distribution (APSD) through multi-stage cascade impactors (CIs) is important for establishing bioequivalence of orally inhaled drug products. Recent work demonstrated that the median of the modified chi-square ratio statistic (MmCSRS) is a promising metric for APSD equivalence testing of test (T) and reference (R) products as it can be applied to a reduced number of CI sites that are more relevant for lung deposition. This metric is also less sensitive to the increased variability often observed for low-deposition sites. A method to establish critical values for the MmCSRS is described here. This method considers the variability of the R product by employing a reference variance scaling approach that allows definition of critical values as a function of the observed variability of the R product. A stepwise CI equivalence test is proposed that integrates the MmCSRS as a method for comparing the relative shapes of CI profiles and incorporates statistical tests for assessing equivalence of single actuation content and impactor sized mass. This stepwise CI equivalence test was applied to 55 published CI profile scenarios, which were classified as equivalent or inequivalent by members of the Product Quality Research Institute working group (PQRI WG). The results of the stepwise CI equivalence test using a 25% difference in MmCSRS as an acceptance criterion provided the best matching with those of the PQRI WG as decisions of both methods agreed in 75% of the 55 CI profile scenarios.
Statistical methods for astronomical data with upper limits. II - Correlation and regression

NASA Technical Reports Server (NTRS)

Isobe, T.; Feigelson, E. D.; Nelson, P. I.

1986-01-01

Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
New forecasting methodology indicates more disease and earlier mortality ahead for today's younger Americans.

PubMed

Reither, Eric N; Olshansky, S Jay; Yang, Yang

2011-08-01

Traditional methods of projecting population health statistics, such as estimating future death rates, can give inaccurate results and lead to inferior or even poor policy decisions. A new "three-dimensional" method of forecasting vital health statistics is more accurate because it takes into account the delayed effects of the health risks being accumulated by today's younger generations. Applying this forecasting technique to the US obesity epidemic suggests that future death rates and health care expenditures could be far worse than currently anticipated. We suggest that public policy makers adopt this more robust forecasting tool and redouble efforts to develop and implement effective obesity-related prevention programs and interventions.
Bayesian evaluation of effect size after replicating an original study

PubMed Central

van Aert, Robbie C. M.; van Assen, Marcel A. L. M.

2017-01-01

The vast majority of published results in the literature is statistically significant, which raises concerns about their reliability. The Reproducibility Project Psychology (RPP) and Experimental Economics Replication Project (EE-RP) both replicated a large number of published studies in psychology and economics. The original study and replication were statistically significant in 36.1% in RPP and 68.8% in EE-RP suggesting many null effects among the replicated studies. However, evidence in favor of the null hypothesis cannot be examined with null hypothesis significance testing. We developed a Bayesian meta-analysis method called snapshot hybrid that is easy to use and understand and quantifies the amount of evidence in favor of a zero, small, medium and large effect. The method computes posterior model probabilities for a zero, small, medium, and large effect and adjusts for publication bias by taking into account that the original study is statistically significant. We first analytically approximate the methods performance, and demonstrate the necessity to control for the original study’s significance to enable the accumulation of evidence for a true zero effect. Then we applied the method to the data of RPP and EE-RP, showing that the underlying effect sizes of the included studies in EE-RP are generally larger than in RPP, but that the sample sizes of especially the included studies in RPP are often too small to draw definite conclusions about the true effect size. We also illustrate how snapshot hybrid can be used to determine the required sample size of the replication akin to power analysis in null hypothesis significance testing and present an easy to use web application (https://rvanaert.shinyapps.io/snapshot/) and R code for applying the method. PMID:28388646
Analysing attitude data through ridit schemes.

PubMed

El-rouby, M G

1994-12-02

The attitudes of individuals and populations on various issues are usually assessed through sample surveys. Responses to survey questions are then scaled and combined into a meaningful whole which defines the measured attitude. The applied scales may be of nominal, ordinal, interval, or ratio nature depending upon the degree of sophistication the researcher wants to introduce into the measurement. This paper discusses methods of analysis for categorical variables of the type used in attitude and human behavior research, and recommends adoption of ridit analysis, a technique which has been successfully applied to epidemiological, clinical investigation, laboratory, and microbiological data. The ridit methodology is described after reviewing some general attitude scaling methods and problems of analysis related to them. The ridit method is then applied to a recent study conducted to assess health care service quality in North Carolina. This technique is conceptually and computationally more simple than other conventional statistical methods, and is also distribution-free. Basic requirements and limitations on its use are indicated.
Applying Quantum Monte Carlo to the Electronic Structure Problem

NASA Astrophysics Data System (ADS)

Powell, Andrew D.; Dawes, Richard

2016-06-01

Two distinct types of Quantum Monte Carlo (QMC) calculations are applied to electronic structure problems such as calculating potential energy curves and producing benchmark values for reaction barriers. First, Variational and Diffusion Monte Carlo (VMC and DMC) methods using a trial wavefunction subject to the fixed node approximation were tested using the CASINO code.[1] Next, Full Configuration Interaction Quantum Monte Carlo (FCIQMC), along with its initiator extension (i-FCIQMC) were tested using the NECI code.[2] FCIQMC seeks the FCI energy for a specific basis set. At a reduced cost, the efficient i-FCIQMC method can be applied to systems in which the standard FCIQMC approach proves to be too costly. Since all of these methods are statistical approaches, uncertainties (error-bars) are introduced for each calculated energy. This study tests the performance of the methods relative to traditional quantum chemistry for some benchmark systems. References: [1] R. J. Needs et al., J. Phys.: Condensed Matter 22, 023201 (2010). [2] G. H. Booth et al., J. Chem. Phys. 131, 054106 (2009).
Applications of rule-induction in the derivation of quantitative structure-activity relationships.

PubMed

A-Razzak, M; Glen, R C

1992-08-01

Recently, methods have been developed in the field of Artificial Intelligence (AI), specifically in the expert systems area using rule-induction, designed to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules which are predictive and reliable. The input to rule-induction consists of a number of examples with known outcomes (a training set) and the output is a tree-structured series of rules. Unlike most other analysis methods, the results of the analysis are in the form of simple statements which can be easily interpreted. These are readily applied to new data giving both a classification and a probability of correctness. Rule-induction has been applied to in-house generated and published QSAR datasets and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule-induction as a complementary technique in addition to conventional statistical and pattern-recognition methods.
A refined method for multivariate meta-analysis and meta-regression

PubMed Central

Jackson, Daniel; Riley, Richard D

2014-01-01

Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351
Applications of rule-induction in the derivation of quantitative structure-activity relationships

NASA Astrophysics Data System (ADS)

A-Razzak, Mohammed; Glen, Robert C.

1992-08-01

Recently, methods have been developed in the field of Artificial Intelligence (AI), specifically in the expert systems area using rule-induction, designed to extract rules from data. We have applied these methods to the analysis of molecular series with the objective of generating rules which are predictive and reliable. The input to rule-induction consists of a number of examples with known outcomes (a training set) and the output is a tree-structured series of rules. Unlike most other analysis methods, the results of the analysis are in the form of simple statements which can be easily interpreted. These are readily applied to new data giving both a classification and a probability of correctness. Rule-induction has been applied to in-house generated and published QSAR datasets and the methodology, application and results of these analyses are discussed. The results imply that in some cases it would be advantageous to use rule-induction as a complementary technique in addition to conventional statistical and pattern-recognition methods.
Optical Fourier diffractometry applied to degraded bone structure recognition

NASA Astrophysics Data System (ADS)

Galas, Jacek; Godwod, Krzysztof; Szawdyn, Jacek; Sawicki, Andrzej

1993-09-01

Image processing and recognition methods are useful in many fields. This paper presents the hybrid optical and digital method applied to recognition of pathological changes in bones involved by metabolic bone diseases. The trabecular bone structure, registered by x ray on the photographic film, is analyzed in the new type of computer controlled diffractometer. The set of image parameters, extracted from diffractogram, is evaluated by statistical analysis. The synthetic image descriptors in discriminant space, constructed on the base of 3 training groups of images (control, osteoporosis, and osteomalacia groups) by discriminant analysis, allow us to recognize bone samples with degraded bone structure and to recognize the disease. About 89% of the images were classified correctly. This method after optimization process will be verified in medical investigations.
Reliability analysis of composite structures

NASA Technical Reports Server (NTRS)

Kan, Han-Pin

1992-01-01

A probabilistic static stress analysis methodology has been developed to estimate the reliability of a composite structure. Closed form stress analysis methods are the primary analytical tools used in this methodology. These structural mechanics methods are used to identify independent variables whose variations significantly affect the performance of the structure. Once these variables are identified, scatter in their values is evaluated and statistically characterized. The scatter in applied loads and the structural parameters are then fitted to appropriate probabilistic distribution functions. Numerical integration techniques are applied to compute the structural reliability. The predicted reliability accounts for scatter due to variability in material strength, applied load, fabrication and assembly processes. The influence of structural geometry and mode of failure are also considerations in the evaluation. Example problems are given to illustrate various levels of analytical complexity.
Forecasts of non-Gaussian parameter spaces using Box-Cox transformations

NASA Astrophysics Data System (ADS)

Joachimi, B.; Taylor, A. N.

2011-09-01

Forecasts of statistical constraints on model parameters using the Fisher matrix abound in many fields of astrophysics. The Fisher matrix formalism involves the assumption of Gaussianity in parameter space and hence fails to predict complex features of posterior probability distributions. Combining the standard Fisher matrix with Box-Cox transformations, we propose a novel method that accurately predicts arbitrary posterior shapes. The Box-Cox transformations are applied to parameter space to render it approximately multivariate Gaussian, performing the Fisher matrix calculation on the transformed parameters. We demonstrate that, after the Box-Cox parameters have been determined from an initial likelihood evaluation, the method correctly predicts changes in the posterior when varying various parameters of the experimental setup and the data analysis, with marginally higher computational cost than a standard Fisher matrix calculation. We apply the Box-Cox-Fisher formalism to forecast cosmological parameter constraints by future weak gravitational lensing surveys. The characteristic non-linear degeneracy between matter density parameter and normalization of matter density fluctuations is reproduced for several cases, and the capabilities of breaking this degeneracy by weak-lensing three-point statistics is investigated. Possible applications of Box-Cox transformations of posterior distributions are discussed, including the prospects for performing statistical data analysis steps in the transformed Gaussianized parameter space.

Comparing a single case to a control group - Applying linear mixed effects models to repeated measures data.

PubMed

Huber, Stefan; Klein, Elise; Moeller, Korbinian; Willmes, Klaus

2015-10-01

In neuropsychological research, single-cases are often compared with a small control sample. Crawford and colleagues developed inferential methods (i.e., the modified t-test) for such a research design. In the present article, we suggest an extension of the methods of Crawford and colleagues employing linear mixed models (LMM). We first show that a t-test for the significance of a dummy coded predictor variable in a linear regression is equivalent to the modified t-test of Crawford and colleagues. As an extension to this idea, we then generalized the modified t-test to repeated measures data by using LMMs to compare the performance difference in two conditions observed in a single participant to that of a small control group. The performance of LMMs regarding Type I error rates and statistical power were tested based on Monte-Carlo simulations. We found that starting with about 15-20 participants in the control sample Type I error rates were close to the nominal Type I error rate using the Satterthwaite approximation for the degrees of freedom. Moreover, statistical power was acceptable. Therefore, we conclude that LMMs can be applied successfully to statistically evaluate performance differences between a single-case and a control sample. Copyright © 2015 Elsevier Ltd. All rights reserved.
Statistics, Computation, and Modeling in Cosmology

NASA Astrophysics Data System (ADS)

Jewell, Jeff; Guiness, Joe; SAMSI 2016 Working Group in Cosmology

2017-01-01

Current and future ground and space based missions are designed to not only detect, but map out with increasing precision, details of the universe in its infancy to the present-day. As a result we are faced with the challenge of analyzing and interpreting observations from a wide variety of instruments to form a coherent view of the universe. Finding solutions to a broad range of challenging inference problems in cosmology is one of the goals of the “Statistics, Computation, and Modeling in Cosmology” workings groups, formed as part of the year long program on ‘Statistical, Mathematical, and Computational Methods for Astronomy’, hosted by the Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation funded institute. Two application areas have emerged for focused development in the cosmology working group involving advanced algorithmic implementations of exact Bayesian inference for the Cosmic Microwave Background, and statistical modeling of galaxy formation. The former includes study and development of advanced Markov Chain Monte Carlo algorithms designed to confront challenging inference problems including inference for spatial Gaussian random fields in the presence of sources of galactic emission (an example of a source separation problem). Extending these methods to future redshift survey data probing the nonlinear regime of large scale structure formation is also included in the working group activities. In addition, the working group is also focused on the study of ‘Galacticus’, a galaxy formation model applied to dark matter-only cosmological N-body simulations operating on time-dependent halo merger trees. The working group is interested in calibrating the Galacticus model to match statistics of galaxy survey observations; specifically stellar mass functions, luminosity functions, and color-color diagrams. The group will use subsampling approaches and fractional factorial designs to statistically and computationally efficiently explore the Galacticus parameter space. The group will also use the Galacticus simulations to study the relationship between the topological and physical structure of the halo merger trees and the properties of the resulting galaxies.
Parameter identification for structural dynamics based on interval analysis algorithm

NASA Astrophysics Data System (ADS)

Yang, Chen; Lu, Zixing; Yang, Zhenyu; Liang, Ke

2018-04-01

A parameter identification method using interval analysis algorithm for structural dynamics is presented in this paper. The proposed uncertain identification method is investigated by using central difference method and ARMA system. With the help of the fixed memory least square method and matrix inverse lemma, a set-membership identification technology is applied to obtain the best estimation of the identified parameters in a tight and accurate region. To overcome the lack of insufficient statistical description of the uncertain parameters, this paper treats uncertainties as non-probabilistic intervals. As long as we know the bounds of uncertainties, this algorithm can obtain not only the center estimations of parameters, but also the bounds of errors. To improve the efficiency of the proposed method, a time-saving algorithm is presented by recursive formula. At last, to verify the accuracy of the proposed method, two numerical examples are applied and evaluated by three identification criteria respectively.
Determination and discrimination of biodiesel fuels by gas chromatographic and chemometric methods

NASA Astrophysics Data System (ADS)

Milina, R.; Mustafa, Z.; Bojilov, D.; Dagnon, S.; Moskovkina, M.

2016-03-01

Pattern recognition method (PRM) was applied to gas chromatographic (GC) data for a fatty acid methyl esters (FAME) composition of commercial and laboratory synthesized biodiesel fuels from vegetable oils including sunflower, rapeseed, corn and palm oils. Two GC quantitative methods to calculate individual fames were compared: Area % and internal standard. The both methods were applied for analysis of two certified reference materials. The statistical processing of the obtained results demonstrates the accuracy and precision of the two methods and allows them to be compared. For further chemometric investigations of biodiesel fuels by their FAME-profiles any of those methods can be used. PRM results of FAME profiles of samples from different vegetable oils show a successful recognition of biodiesels according to the feedstock. The information obtained can be used for selection of feedstock to produce biodiesels with certain properties, for assessing their interchangeability, for fuel spillage and remedial actions in the environment.
A Simple Graphical Method for Quantification of Disaster Management Surge Capacity Using Computer Simulation and Process-control Tools.

PubMed

Franc, Jeffrey Michael; Ingrassia, Pier Luigi; Verde, Manuela; Colombo, Davide; Della Corte, Francesco

2015-02-01

Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. Hypothesis/Problem The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. Patient-volume metrics included number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision. This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge capacity using PV and LOS metrics. These objective metrics can then be applied to other simulation groups using simple graphical process-control tools to provide a numeric measure of surge capacity. Repeated use in simulations of actual EDs may represent a potential means of objectively quantifying disaster management surge capacity. It is hoped that the described statistical method, which is simple and reusable, will be useful for investigators in this field to apply to their own research.
Statistical Literacy among Applied Linguists and Second Language Acquisition Researchers

ERIC Educational Resources Information Center

Loewen, Shawn; Lavolette, Elizabeth; Spino, Le Anne; Papi, Mostafa; Schmidtke, Jens; Sterling, Scott; Wolff, Dominik

2014-01-01

The importance of statistical knowledge in applied linguistics and second language acquisition (SLA) research has been emphasized in recent publications. However, the last investigation of the statistical literacy of applied linguists occurred more than 25 years ago (Lazaraton, Riggenbach, & Ediger, 1987). The current study undertook a partial…
Fractal analysis of the short time series in a visibility graph method

NASA Astrophysics Data System (ADS)

Li, Ruixue; Wang, Jiang; Yu, Haitao; Deng, Bin; Wei, Xile; Chen, Yingyuan

2016-05-01

The aim of this study is to evaluate the performance of the visibility graph (VG) method on short fractal time series. In this paper, the time series of Fractional Brownian motions (fBm), characterized by different Hurst exponent H, are simulated and then mapped into a scale-free visibility graph, of which the degree distributions show the power-law form. The maximum likelihood estimation (MLE) is applied to estimate power-law indexes of degree distribution, and in this progress, the Kolmogorov-Smirnov (KS) statistic is used to test the performance of estimation of power-law index, aiming to avoid the influence of droop head and heavy tail in degree distribution. As a result, we find that the MLE gives an optimal estimation of power-law index when KS statistic reaches its first local minimum. Based on the results from KS statistic, the relationship between the power-law index and the Hurst exponent is reexamined and then amended to meet short time series. Thus, a method combining VG, MLE and KS statistics is proposed to estimate Hurst exponents from short time series. Lastly, this paper also offers an exemplification to verify the effectiveness of the combined method. In addition, the corresponding results show that the VG can provide a reliable estimation of Hurst exponents.
Interrelationships Between Receiver/Relative Operating Characteristics Display, Binomial, Logit, and Bayes' Rule Probability of Detection Methodologies

NASA Technical Reports Server (NTRS)

Generazio, Edward R.

2014-01-01

Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.
Applying a Mixed Methods Framework to Differential Item Function Analyses

ERIC Educational Resources Information Center

Hitchcock, John H.; Johanson, George A.

2015-01-01

Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Introducing Students to Plant Geography: Polar Ordination Applied to Hanging Gardens.

ERIC Educational Resources Information Center

Malanson, George P.; And Others

1993-01-01

Reports on a research study in which college students used a statistical ordination method to reveal relationships among plant community structures and physical, disturbance, and spatial variables. Concludes that polar ordination helps students understand the methodology of plant geography and encourages further student research. (CFR)
Economic Psychology: Its Connections with Research-Oriented Courses

ERIC Educational Resources Information Center

Christopher, Andrew N.; Marek, Pam; Benigno, Joann

2003-01-01

To enhance student interest in research methods, tests and measurement, and statistics classes, we describe how teachers may use resources from economic psychology to illustrate key concepts in these courses. Because of their applied nature and relevance to student experiences, topics covered by these resources may capture student attention and…
Learning Opportunities for Group Learning

ERIC Educational Resources Information Center

Gil, Alfonso J.; Mataveli, Mara

2017-01-01

Purpose: This paper aims to analyse the impact of organizational learning culture and learning facilitators in group learning. Design/methodology/approach: This study was conducted using a survey method applied to a statistically representative sample of employees from Rioja wine companies in Spain. A model was tested using a structural equation…
Development of a Self-Rated Mixed Methods Skills Assessment: The National Institutes of Health Mixed Methods Research Training Program for the Health Sciences.

PubMed

Guetterman, Timothy C; Creswell, John W; Wittink, Marsha; Barg, Fran K; Castro, Felipe G; Dahlberg, Britt; Watkins, Daphne C; Deutsch, Charles; Gallo, Joseph J

2017-01-01

Demand for training in mixed methods is high, with little research on faculty development or assessment in mixed methods. We describe the development of a self-rated mixed methods skills assessment and provide validity evidence. The instrument taps six research domains: "Research question," "Design/approach," "Sampling," "Data collection," "Analysis," and "Dissemination." Respondents are asked to rate their ability to define or explain concepts of mixed methods under each domain, their ability to apply the concepts to problems, and the extent to which they need to improve. We administered the questionnaire to 145 faculty and students using an internet survey. We analyzed descriptive statistics and performance characteristics of the questionnaire using the Cronbach alpha to assess reliability and an analysis of variance that compared a mixed methods experience index with assessment scores to assess criterion relatedness. Internal consistency reliability was high for the total set of items (0.95) and adequate (≥0.71) for all but one subscale. Consistent with establishing criterion validity, respondents who had more professional experiences with mixed methods (eg, published a mixed methods article) rated themselves as more skilled, which was statistically significant across the research domains. This self-rated mixed methods assessment instrument may be a useful tool to assess skills in mixed methods for training programs. It can be applied widely at the graduate and faculty level. For the learner, assessment may lead to enhanced motivation to learn and training focused on self-identified needs. For faculty, the assessment may improve curriculum and course content planning.
The application of the statistical classifying models for signal evaluation of the gas sensors analyzing mold contamination of the building materials

NASA Astrophysics Data System (ADS)

Majerek, Dariusz; Guz, Łukasz; Suchorab, Zbigniew; Łagód, Grzegorz; Sobczuk, Henryk

2017-07-01

Mold that develops on moistened building barriers is a major cause of the Sick Building Syndrome (SBS). Fungal contamination is normally evaluated using standard biological methods which are time-consuming and require a lot of manual labor. Fungi emit Volatile Organic Compounds (VOC) that can be detected in the indoor air using several techniques of detection e.g. chromatography. VOCs can be also detected using gas sensors arrays. All array sensors generate particular voltage signals that ought to be analyzed using properly selected statistical methods of interpretation. This work is focused on the attempt to apply statistical classifying models in evaluation of signals from gas sensors matrix to analyze the air sampled from the headspace of various types of the building materials at different level of contamination but also clean reference materials.
Geostatistics and GIS: tools for characterizing environmental contamination.

PubMed

Henshaw, Shannon L; Curriero, Frank C; Shields, Timothy M; Glass, Gregory E; Strickland, Paul T; Breysse, Patrick N

2004-08-01

Geostatistics is a set of statistical techniques used in the analysis of georeferenced data that can be applied to environmental contamination and remediation studies. In this study, the 1,1-dichloro-2,2-bis(p-chlorophenyl)ethylene (DDE) contamination at a Superfund site in western Maryland is evaluated. Concern about the site and its future clean up has triggered interest within the community because residential development surrounds the area. Spatial statistical methods, of which geostatistics is a subset, are becoming increasingly popular, in part due to the availability of geographic information system (GIS) software in a variety of application packages. In this article, the joint use of ArcGIS software and the R statistical computing environment are demonstrated as an approach for comprehensive geostatistical analyses. The spatial regression method, kriging, is used to provide predictions of DDE levels at unsampled locations both within the site and the surrounding areas where residential development is ongoing.
Statistical shape analysis using 3D Poisson equation--A quantitatively validated approach.

PubMed

Gao, Yi; Bouix, Sylvain

2016-05-01

Statistical shape analysis has been an important area of research with applications in biology, anatomy, neuroscience, agriculture, paleontology, etc. Unfortunately, the proposed methods are rarely quantitatively evaluated, and as shown in recent studies, when they are evaluated, significant discrepancies exist in their outputs. In this work, we concentrate on the problem of finding the consistent location of deformation between two population of shapes. We propose a new shape analysis algorithm along with a framework to perform a quantitative evaluation of its performance. Specifically, the algorithm constructs a Signed Poisson Map (SPoM) by solving two Poisson equations on the volumetric shapes of arbitrary topology, and statistical analysis is then carried out on the SPoMs. The method is quantitatively evaluated on synthetic shapes and applied on real shape data sets in brain structures. Copyright © 2016 Elsevier B.V. All rights reserved.
A Science and Risk-Based Pragmatic Methodology for Blend and Content Uniformity Assessment.

PubMed

Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Doshi, Chetan

2018-04-01

This paper describes a pragmatic approach that can be applied in assessing powder blend and unit dosage uniformity of solid dose products at Process Design, Process Performance Qualification, and Continued/Ongoing Process Verification stages of the Process Validation lifecycle. The statistically based sampling, testing, and assessment plan was developed due to the withdrawal of the FDA draft guidance for industry "Powder Blends and Finished Dosage Units-Stratified In-Process Dosage Unit Sampling and Assessment." This paper compares the proposed Grouped Area Variance Estimate (GAVE) method with an alternate approach outlining the practicality and statistical rationalization using traditional sampling and analytical methods. The approach is designed to fit solid dose processes assuring high statistical confidence in both powder blend uniformity and dosage unit uniformity during all three stages of the lifecycle complying with ASTM standards as recommended by the US FDA.
Protein Sectors: Statistical Coupling Analysis versus Conservation

PubMed Central

Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

2015-01-01

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535
Adaptive filtering in biological signal processing.

PubMed

Iyer, V K; Ploysongsang, Y; Ramamoorthy, P A

1990-01-01

The high dependence of conventional optimal filtering methods on the a priori knowledge of the signal and noise statistics render them ineffective in dealing with signals whose statistics cannot be predetermined accurately. Adaptive filtering methods offer a better alternative, since the a priori knowledge of statistics is less critical, real time processing is possible, and the computations are less expensive for this approach. Adaptive filtering methods compute the filter coefficients "on-line", converging to the optimal values in the least-mean square (LMS) error sense. Adaptive filtering is therefore apt for dealing with the "unknown" statistics situation and has been applied extensively in areas like communication, speech, radar, sonar, seismology, and biological signal processing and analysis for channel equalization, interference and echo canceling, line enhancement, signal detection, system identification, spectral analysis, beamforming, modeling, control, etc. In this review article adaptive filtering in the context of biological signals is reviewed. An intuitive approach to the underlying theory of adaptive filters and its applicability are presented. Applications of the principles in biological signal processing are discussed in a manner that brings out the key ideas involved. Current and potential future directions in adaptive biological signal processing are also discussed.
[Application of chemometrics in composition-activity relationship research of traditional Chinese medicine].

PubMed

Han, Sheng-Nan

2014-07-01

Chemometrics is a new branch of chemistry which is widely applied to various fields of analytical chemistry. Chemometrics can use theories and methods of mathematics, statistics, computer science and other related disciplines to optimize the chemical measurement process and maximize access to acquire chemical information and other information on material systems by analyzing chemical measurement data. In recent years, traditional Chinese medicine has attracted widespread attention. In the research of traditional Chinese medicine, it has been a key problem that how to interpret the relationship between various chemical components and its efficacy, which seriously restricts the modernization of Chinese medicine. As chemometrics brings the multivariate analysis methods into the chemical research, it has been applied as an effective research tool in the composition-activity relationship research of Chinese medicine. This article reviews the applications of chemometrics methods in the composition-activity relationship research in recent years. The applications of multivariate statistical analysis methods (such as regression analysis, correlation analysis, principal component analysis, etc. ) and artificial neural network (such as back propagation artificial neural network, radical basis function neural network, support vector machine, etc. ) are summarized, including the brief fundamental principles, the research contents and the advantages and disadvantages. Finally, the existing main problems and prospects of its future researches are proposed.

GSHSite: Exploiting an Iteratively Statistical Method to Identify S-Glutathionylation Sites with Substrate Specificity

PubMed Central

Chen, Yi-Ju; Lu, Cheng-Tsung; Huang, Kai-Yao; Wu, Hsin-Yi; Chen, Yu-Ju; Lee, Tzong-Yi

2015-01-01

S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation on S-glutathionylation sites including structural factors such as the flanking amino acids composition and the accessible surface area (ASA). TwoSampleLogo presents that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in closed three-dimensional environment. A statistical method is further applied to iteratively detect the conserved substrate motifs with statistical significance. Support vector machine (SVM) is then applied to generate predictive model considering the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs could achieve an enhanced sensitivity, specificity, and accuracy, and provides a promising performance in an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on the protein sequences. PMID:25849935
Evidence and Clinical Trials.

NASA Astrophysics Data System (ADS)

Goodman, Steven N.

1989-11-01

This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance- -uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.
Texture analysis with statistical methods for wheat ear extraction

NASA Astrophysics Data System (ADS)

Bakhouche, M.; Cointault, F.; Gouton, P.

2007-01-01

In agronomic domain, the simplification of crop counting, necessary for yield prediction and agronomic studies, is an important project for technical institutes such as Arvalis. Although the main objective of our global project is to conceive a mobile robot for natural image acquisition directly in a field, Arvalis has proposed us first to detect by image processing the number of wheat ears in images before to count them, which will allow to obtain the first component of the yield. In this paper we compare different texture image segmentation techniques based on feature extraction by first and higher order statistical methods which have been applied on our images. The extracted features are used for unsupervised pixel classification to obtain the different classes in the image. So, the K-means algorithm is implemented before the choice of a threshold to highlight the ears. Three methods have been tested in this feasibility study with very average error of 6%. Although the evaluation of the quality of the detection is visually done, automatic evaluation algorithms are currently implementing. Moreover, other statistical methods of higher order will be implemented in the future jointly with methods based on spatio-frequential transforms and specific filtering.
A Statistical Method of Identifying Interactions in Neuron–Glia Systems Based on Functional Multicell Ca2+ Imaging

PubMed Central

Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin

2014-01-01

Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions in our interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We only accepted addition or removal when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874
Assessing groundwater vulnerability to agrichemical contamination in the Midwest US

USGS Publications Warehouse

Burkart, M.R.; Kolpin, D.W.; James, D.E.

1999-01-01

Agrichemicals (herbicides and nitrate) are significant sources of diffuse pollution to groundwater. Indirect methods are needed to assess the potential for groundwater contamination by diffuse sources because groundwater monitoring is too costly to adequately define the geographic extent of contamination at a regional or national scale. This paper presents examples of the application of statistical, overlay and index, and process-based modeling methods for groundwater vulnerability assessments to a variety of data from the Midwest U.S. The principles for vulnerability assessment include both intrinsic (pedologic, climatologic, and hydrogeologic factors) and specific (contaminant and other anthropogenic factors) vulnerability of a location. Statistical methods use the frequency of contaminant occurrence, contaminant concentration, or contamination probability as a response variable. Statistical assessments are useful for defining the relations among explanatory and response variables whether they define intrinsic or specific vulnerability. Multivariate statistical analyses are useful for ranking variables critical to estimating water quality responses of interest. Overlay and index methods involve intersecting maps of intrinsic and specific vulnerability properties and indexing the variables by applying appropriate weights. Deterministic models use process-based equations to simulate contaminant transport and are distinguished from the other methods in their potential to predict contaminant transport in both space and time. An example of a one-dimensional leaching model linked to a geographic information system (GIS) to define a regional metamodel for contamination in the Midwest is included.
A study on the use of Gumbel approximation with the Bernoulli spatial scan statistic.

PubMed

Read, S; Bath, P A; Willett, P; Maheswaran, R

2013-08-30

The Bernoulli version of the spatial scan statistic is a well established method of detecting localised spatial clusters in binary labelled point data, a typical application being the epidemiological case-control study. A recent study suggests the inferential accuracy of several versions of the spatial scan statistic (principally the Poisson version) can be improved, at little computational cost, by using the Gumbel distribution, a method now available in SaTScan(TM) (www.satscan.org). We study in detail the effect of this technique when applied to the Bernoulli version and demonstrate that it is highly effective, albeit with some increase in false alarm rates at certain significance thresholds. We explain how this increase is due to the discrete nature of the Bernoulli spatial scan statistic and demonstrate that it can affect even small p-values. Despite this, we argue that the Gumbel method is actually preferable for very small p-values. Furthermore, we extend previous research by running benchmark trials on 12 000 synthetic datasets, thus demonstrating that the overall detection capability of the Bernoulli version (i.e. ratio of power to false alarm rate) is not noticeably affected by the use of the Gumbel method. We also provide an example application of the Gumbel method using data on hospital admissions for chronic obstructive pulmonary disease. Copyright © 2013 John Wiley & Sons, Ltd.
Logistic Regression with Multiple Random Effects: A Simulation Study of Estimation Methods and Statistical Packages

PubMed Central

Kim, Yoonsang; Emery, Sherry

2013-01-01

Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods’ performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages—SAS GLIMMIX Laplace and SuperMix Gaussian quadrature—perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes. PMID:24288415
Living systematic reviews: 3. Statistical methods for updating meta-analyses.

PubMed

Simmonds, Mark; Salanti, Georgia; McKenzie, Joanne; Elliott, Julian

2017-11-01

A living systematic review (LSR) should keep the review current as new research evidence emerges. Any meta-analyses included in the review will also need updating as new material is identified. If the aim of the review is solely to present the best current evidence standard meta-analysis may be sufficient, provided reviewers are aware that results may change at later updates. If the review is used in a decision-making context, more caution may be needed. When using standard meta-analysis methods, the chance of incorrectly concluding that any updated meta-analysis is statistically significant when there is no effect (the type I error) increases rapidly as more updates are performed. Inaccurate estimation of any heterogeneity across studies may also lead to inappropriate conclusions. This paper considers four methods to avoid some of these statistical problems when updating meta-analyses: two methods, that is, law of the iterated logarithm and the Shuster method control primarily for inflation of type I error and two other methods, that is, trial sequential analysis and sequential meta-analysis control for type I and II errors (failing to detect a genuine effect) and take account of heterogeneity. This paper compares the methods and considers how they could be applied to LSRs. Copyright © 2017 Elsevier Inc. All rights reserved.
Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

PubMed

Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

2015-05-01

Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.
Statistical inference for noisy nonlinear ecological dynamic systems.

PubMed

Wood, Simon N

2010-08-26

Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
Geostatistics - a tool applied to the distribution of Legionella pneumophila in a hospital water system.

PubMed

Laganà, Pasqualina; Moscato, Umberto; Poscia, Andrea; La Milia, Daniele Ignazio; Boccia, Stefania; Avventuroso, Emanuela; Delia, Santi

2015-01-01

Legionnaires' disease is normally acquired by inhalation of legionellae from a contaminated environmental source. Water systems of large buildings, such as hospitals, are often contaminated with legionellae and therefore represent a potential risk for the hospital population. The aim of this study was to evaluate the potential contamination of Legionella pneumophila (LP) in a large hospital in Italy through georeferential statistical analysis to assess the possible sources of dispersion and, consequently, the risk of exposure for both health care staff and patients. LP serogroups 1 and 2-14 distribution was considered in the wards housed on two consecutive floors of the hospital building. On the basis of information provided by 53 bacteriological analysis, a 'random' grid of points was chosen and spatial geostatistics or FAIk Kriging was applied and compared with the results of classical statistical analysis. Over 50% of the examined samples were positive for Legionella pneumophila. LP 1 was isolated in 69% of samples from the ground floor and in 60% of sample from the first floor; LP 2-14 in 36% of sample from the ground floor and 24% from the first. The iso-estimation maps show clearly the most contaminated pipe and the difference in the diffusion of the different L. pneumophila serogroups. Experimental work has demonstrated that geostatistical methods applied to the microbiological analysis of water matrices allows a better modeling of the phenomenon under study, a greater potential for risk management and a greater choice of methods of prevention and environmental recovery to be put in place with respect to the classical statistical analysis.
Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

PubMed

Gangnon, Ronald E

2012-03-01

The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

PubMed Central

Gangnon, Ronald E.

2011-01-01

Summary The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118
Resolving anthropogenic aerosol pollution types - deconvolution and exploratory classification of pollution events

NASA Astrophysics Data System (ADS)

Äijälä, Mikko; Heikkinen, Liine; Fröhlich, Roman; Canonaco, Francesco; Prévôt, André S. H.; Junninen, Heikki; Petäjä, Tuukka; Kulmala, Markku; Worsnop, Douglas; Ehn, Mikael

2017-03-01

Mass spectrometric measurements commonly yield data on hundreds of variables over thousands of points in time. Refining and synthesizing this raw data into chemical information necessitates the use of advanced, statistics-based data analytical techniques. In the field of analytical aerosol chemistry, statistical, dimensionality reductive methods have become widespread in the last decade, yet comparable advanced chemometric techniques for data classification and identification remain marginal. Here we present an example of combining data dimensionality reduction (factorization) with exploratory classification (clustering), and show that the results cannot only reproduce and corroborate earlier findings, but also complement and broaden our current perspectives on aerosol chemical classification. We find that applying positive matrix factorization to extract spectral characteristics of the organic component of air pollution plumes, together with an unsupervised clustering algorithm, k-means+ + , for classification, reproduces classical organic aerosol speciation schemes. Applying appropriately chosen metrics for spectral dissimilarity along with optimized data weighting, the source-specific pollution characteristics can be statistically resolved even for spectrally very similar aerosol types, such as different combustion-related anthropogenic aerosol species and atmospheric aerosols with similar degree of oxidation. In addition to the typical oxidation level and source-driven aerosol classification, we were also able to classify and characterize outlier groups that would likely be disregarded in a more conventional analysis. Evaluating solution quality for the classification also provides means to assess the performance of mass spectral similarity metrics and optimize weighting for mass spectral variables. This facilitates algorithm-based evaluation of aerosol spectra, which may prove invaluable for future development of automatic methods for spectra identification and classification. Robust, statistics-based results and data visualizations also provide important clues to a human analyst on the existence and chemical interpretation of data structures. Applying these methods to a test set of data, aerosol mass spectrometric data of organic aerosol from a boreal forest site, yielded five to seven different recurring pollution types from various sources, including traffic, cooking, biomass burning and nearby sawmills. Additionally, three distinct, minor pollution types were discovered and identified as amine-dominated aerosols.
Remarks on a financial inverse problem by means of Monte Carlo Methods

NASA Astrophysics Data System (ADS)

Cuomo, Salvatore; Di Somma, Vittorio; Sica, Federica

2017-10-01

Estimating the price of a barrier option is a typical inverse problem. In this paper we present a numerical and statistical framework for a market with risk-free interest rate and a risk asset, described by a Geometric Brownian Motion (GBM). After approximating the risk asset with a numerical method, we find the final option price by following an approach based on sequential Monte Carlo methods. All theoretical results are applied to the case of an option whose underlying is a real stock.
Sequential neural text compression.

PubMed

Schmidhuber, J; Heil, S

1996-01-01

The purpose of this paper is to show that neural networks may be promising tools for data compression without loss of information. We combine predictive neural nets and statistical coding techniques to compress text files. We apply our methods to certain short newspaper articles and obtain compression ratios exceeding those of the widely used Lempel-Ziv algorithms (which build the basis of the UNIX functions "compress" and "gzip"). The main disadvantage of our methods is that they are about three orders of magnitude slower than standard methods.
Precipitation forecast using artificial neural networks. An application to the Guadalupe Valley, Baja California, Mexico

NASA Astrophysics Data System (ADS)

Herrera-Oliva, C. S.

2013-05-01

In this work we design and implement a method for the determination of precipitation forecast through the application of an elementary neuronal network (perceptron) to the statistical analysis of the precipitation reported in catalogues. The method is limited mainly by the catalogue length (and, in a smaller degree, by its accuracy). The method performance is measured using grading functions that evaluate a tradeoff between positive and negative aspects of performance. The method is applied to the Guadalupe Valley, Baja California, Mexico. Using consecutive intervals of dt=0.1 year, employing the data of several climatological stations situated in and surrounding this important wine industries zone. We evaluated the performance of different models of ANN, whose variables of entrance are the heights of precipitation. The results obtained were satisfactory, except for exceptional values of rain. Key words: precipitation forecast, artificial neural networks, statistical analysis
Comparative study of some robust statistical methods: weighted, parametric, and nonparametric linear regression of HPLC convoluted peak responses using internal standard method in drug bioavailability studies.

PubMed

Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A

2013-05-01

This manuscript discusses the application and the comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug Acyclovir which was analyzed in human plasma with the use of ganciclovir as internal standard. In vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-points sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of WR method and Theil's method, a nonparametric regression (NPR) method with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.
Unification of the complex Langevin method and the Lefschetzthimble method

NASA Astrophysics Data System (ADS)

Nishimura, Jun; Shimasaki, Shinji

2018-03-01

Recently there has been remarkable progress in solving the sign problem, which occurs in investigating statistical systems with a complex weight. The two promising methods, the complex Langevin method and the Lefschetz thimble method, share the idea of complexifying the dynamical variables, but their relationship has not been clear. Here we propose a unified formulation, in which the sign problem is taken care of by both the Langevin dynamics and the holomorphic gradient flow. We apply our formulation to a simple model in three different ways and show that one of them interpolates the two methods by changing the flow time.
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization

NASA Astrophysics Data System (ADS)

Eroglu, Sertac

2014-10-01

The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed as the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.

Perturbative Gaussianizing transforms for cosmological fields

NASA Astrophysics Data System (ADS)

Hall, Alex; Mead, Alexander

2018-01-01

Constraints on cosmological parameters from large-scale structure have traditionally been obtained from two-point statistics. However, non-linear structure formation renders these statistics insufficient in capturing the full information content available, necessitating the measurement of higher order moments to recover information which would otherwise be lost. We construct quantities based on non-linear and non-local transformations of weakly non-Gaussian fields that Gaussianize the full multivariate distribution at a given order in perturbation theory. Our approach does not require a model of the fields themselves and takes as input only the first few polyspectra, which could be modelled or measured from simulations or data, making our method particularly suited to observables lacking a robust perturbative description such as the weak-lensing shear. We apply our method to simulated density fields, finding a significantly reduced bispectrum and an enhanced correlation with the initial field. We demonstrate that our method reconstructs a large proportion of the linear baryon acoustic oscillations, improving the information content over the raw field by 35 per cent. We apply the transform to toy 21 cm intensity maps, showing that our method still performs well in the presence of complications such as redshift-space distortions, beam smoothing, pixel noise and foreground subtraction. We discuss how this method might provide a route to constructing a perturbative model of the fully non-Gaussian multivariate likelihood function.
Assessment study of lichenometric methods for dating surfaces

NASA Astrophysics Data System (ADS)

Jomelli, Vincent; Grancher, Delphine; Naveau, Philippe; Cooley, Daniel; Brunstein, Daniel

2007-04-01

In this paper, we discuss the advantages and drawbacks of the most classical approaches used in lichenometry. In particular, we perform a detailed comparison among methods based on the statistical analysis of either the largest lichen diameters recorded on geomorphic features or the frequency of all lichens. To assess the performance of each method, a careful comparison design with well-defined criteria is proposed and applied to two distinct data sets. First, we study 350 tombstones. This represents an ideal test bed because tombstone dates are known and, therefore, the quality of the estimated lichen growth curve can be easily tested for the different techniques. Secondly, 37 moraines from two tropical glaciers are investigated. This analysis corresponds to our real case study. For both data sets, we apply our list of criteria that reflects precision, error measurements and their theoretical foundations when proposing estimated ages and their associated confidence intervals. From this comparison, it clearly appears that two methods, the mean of the n largest lichen diameters and the recent Bayesian method based on extreme value theory, offer the most reliable estimates of moraine and tombstones dates. Concerning the spread of the error, the latter approach provides the smallest uncertainty and it is the only one that takes advantage of the statistical nature of the observations by fitting an extreme value distribution to the largest diameters.
Generalising Ward's Method for Use with Manhattan Distances.

PubMed

Strauss, Trudie; von Maltitz, Michael Johan

2017-01-01

The claim that Ward's linkage algorithm in hierarchical clustering is limited to use with Euclidean distances is investigated. In this paper, Ward's clustering algorithm is generalised to use with l1 norm or Manhattan distances. We argue that the generalisation of Ward's linkage method to incorporate Manhattan distances is theoretically sound and provide an example of where this method outperforms the method using Euclidean distances. As an application, we perform statistical analyses on languages using methods normally applied to biology and genetic classification. We aim to quantify differences in character traits between languages and use a statistical language signature based on relative bi-gram (sequence of two letters) frequencies to calculate a distance matrix between 32 Indo-European languages. We then use Ward's method of hierarchical clustering to classify the languages, using the Euclidean distance and the Manhattan distance. Results obtained from using the different distance metrics are compared to show that the Ward's algorithm characteristic of minimising intra-cluster variation and maximising inter-cluster variation is not violated when using the Manhattan metric.
Fuzzy-logic based strategy for validation of multiplex methods: example with qualitative GMO assays.

PubMed

Bellocchi, Gianni; Bertholet, Vincent; Hamels, Sandrine; Moens, W; Remacle, José; Van den Eede, Guy

2010-02-01

This paper illustrates the advantages that a fuzzy-based aggregation method could bring into the validation of a multiplex method for GMO detection (DualChip GMO kit, Eppendorf). Guidelines for validation of chemical, bio-chemical, pharmaceutical and genetic methods have been developed and ad hoc validation statistics are available and routinely used, for in-house and inter-laboratory testing, and decision-making. Fuzzy logic allows summarising the information obtained by independent validation statistics into one synthetic indicator of overall method performance. The microarray technology, introduced for simultaneous identification of multiple GMOs, poses specific validation issues (patterns of performance for a variety of GMOs at different concentrations). A fuzzy-based indicator for overall evaluation is illustrated in this paper, and applied to validation data for different genetically modified elements. Remarks were drawn on the analytical results. The fuzzy-logic based rules were shown to be applicable to improve interpretation of results and facilitate overall evaluation of the multiplex method.
Standard methods for sampling North American freshwater fishes

USGS Publications Warehouse

Bonar, Scott A.; Hubert, Wayne A.; Willis, David W.

2009-01-01

This important reference book provides standard sampling methods recommended by the American Fisheries Society for assessing and monitoring freshwater fish populations in North America. Methods apply to ponds, reservoirs, natural lakes, and streams and rivers containing cold and warmwater fishes. Range-wide and eco-regional averages for indices of abundance, population structure, and condition for individual species are supplied to facilitate comparisons of standard data among populations. Provides information on converting nonstandard to standard data, statistical and database procedures for analyzing and storing standard data, and methods to prevent transfer of invasive species while sampling.
Free-energy landscapes from adaptively biased methods: Application to quantum systems

NASA Astrophysics Data System (ADS)

Calvo, F.

2010-10-01

Several parallel adaptive biasing methods are applied to the calculation of free-energy pathways along reaction coordinates, choosing as a difficult example the double-funnel landscape of the 38-atom Lennard-Jones cluster. In the case of classical statistics, the Wang-Landau and adaptively biased molecular-dynamics (ABMD) methods are both found efficient if multiple walkers and replication and deletion schemes are used. An extension of the ABMD technique to quantum systems, implemented through the path-integral MD framework, is presented and tested on Ne38 against the quantum superposition method.
A systematic review on the use of time series data in the study of antimicrobial consumption and Pseudomonas aeruginosa resistance.

PubMed

Athanasiou, Christos I; Kopsini, Angeliki

2018-06-12

In the field of antimicrobial resistance, the number of studies that use time series data has increased recently. The purpose of this study is the systematic review of all studies on antibacterial consumption and on Pseudomonas aeruginosa resistance in healthcare settings, that have used time series data. A systematic review of the literature till June 2017 was conducted. All the studies that have used time series data and have examined the inhospital antibiotic consumption and Ps. aeruginosa resistance rates or incidence were eligible. No other exclusion criteria were applied. Data on the structure, terminology used, methods used and results of each article were recorded and analyzed as possible. A total of thirty six studies were retrieved, twenty three of which were in accordance with our criteria. Thirteen of them were quasi experimental studies and ten were ecological observational studies. Eighteen studies collected time series data of both parameters and the statistical methodology of "time series analysis" was applied in nine studies. Most of the studies were published in the last eight years. The Interrupted Time Series design was the most widespread. As expected, there was high heterogeneity in regard to the study design, terminology and statistical methods applied. Copyright © 2018. Published by Elsevier Ltd.
Rigorous Approach in Investigation of Seismic Structure and Source Characteristicsin Northeast Asia: Hierarchical and Trans-dimensional Bayesian Inversion

NASA Astrophysics Data System (ADS)

Mustac, M.; Kim, S.; Tkalcic, H.; Rhie, J.; Chen, Y.; Ford, S. R.; Sebastian, N.

2015-12-01

Conventional approaches to inverse problems suffer from non-linearity and non-uniqueness in estimations of seismic structures and source properties. Estimated results and associated uncertainties are often biased by applied regularizations and additional constraints, which are commonly introduced to solve such problems. Bayesian methods, however, provide statistically meaningful estimations of models and their uncertainties constrained by data information. In addition, hierarchical and trans-dimensional (trans-D) techniques are inherently implemented in the Bayesian framework to account for involved error statistics and model parameterizations, and, in turn, allow more rigorous estimations of the same. Here, we apply Bayesian methods throughout the entire inference process to estimate seismic structures and source properties in Northeast Asia including east China, the Korean peninsula, and the Japanese islands. Ambient noise analysis is first performed to obtain a base three-dimensional (3-D) heterogeneity model using continuous broadband waveforms from more than 300 stations. As for the tomography of surface wave group and phase velocities in the 5-70 s band, we adopt a hierarchical and trans-D Bayesian inversion method using Voronoi partition. The 3-D heterogeneity model is further improved by joint inversions of teleseismic receiver functions and dispersion data using a newly developed high-efficiency Bayesian technique. The obtained model is subsequently used to prepare 3-D structural Green's functions for the source characterization. A hierarchical Bayesian method for point source inversion using regional complete waveform data is applied to selected events from the region. The seismic structure and source characteristics with rigorously estimated uncertainties from the novel Bayesian methods provide enhanced monitoring and discrimination of seismic events in northeast Asia.
Proceedings of the Fourth Annual U.S. Army Conference on Applied Statistics, 21-23 October 1998.

DTIC Science & Technology

1999-11-01

1833) published a memoir Nouvelles mithodes pour la determination des cometes in which he introduced and named the method of least squares. In 1809...251,1972. 2. Sprott, D. A. "Gauss’s Contributions to Statistics." Historia Mathematica, vol. 5, pp. 183-203,1978. 3. Stigler, S. M. "An Attack on Gauss...Published by Legendre in 1820." Historia Mathematica. vol. 4, pp. 31-35, 1977. 4. Stigler, S. M. "Gauss and the Invention of Least Squares." The
Statistical significance of the rich-club phenomenon in complex networks

NASA Astrophysics Data System (ADS)

Jiang, Zhi-Qiang; Zhou, Wei-Xing

2008-04-01

We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
Computer program for prediction of fuel consumption statistical data for an upper stage three-axes stabilized on-off control system

NASA Technical Reports Server (NTRS)

1982-01-01

A FORTRAN coded computer program and method to predict the reaction control fuel consumption statistics for a three axis stabilized rocket vehicle upper stage is described. A Monte Carlo approach is used which is more efficient by using closed form estimates of impulses. The effects of rocket motor thrust misalignment, static unbalance, aerodynamic disturbances, and deviations in trajectory, mass properties and control system characteristics are included. This routine can be applied to many types of on-off reaction controlled vehicles. The pseudorandom number generation and statistical analyses subroutines including the output histograms can be used for other Monte Carlo analyses problems.
Big Data and Neuroimaging.

PubMed

Webb-Vargas, Yenny; Chen, Shaojie; Fisher, Aaron; Mejia, Amanda; Xu, Yuting; Crainiceanu, Ciprian; Caffo, Brian; Lindquist, Martin A

2017-12-01

Big Data are of increasing importance in a variety of areas, especially in the biosciences. There is an emerging critical need for Big Data tools and methods, because of the potential impact of advancements in these areas. Importantly, statisticians and statistical thinking have a major role to play in creating meaningful progress in this arena. We would like to emphasize this point in this special issue, as it highlights both the dramatic need for statistical input for Big Data analysis and for a greater number of statisticians working on Big Data problems. We use the field of statistical neuroimaging to demonstrate these points. As such, this paper covers several applications and novel methodological developments of Big Data tools applied to neuroimaging data.
Statistical inference, the bootstrap, and neural-network modeling with application to foreign exchange rates.

PubMed

White, H; Racine, J

2001-01-01

We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Energy landscape analysis of neuroimaging data

NASA Astrophysics Data System (ADS)

Ezaki, Takahiro; Watanabe, Takamitsu; Ohzeki, Masayuki; Masuda, Naoki

2017-05-01

Computational neuroscience models have been used for understanding neural dynamics in the brain and how they may be altered when physiological or other conditions change. We review and develop a data-driven approach to neuroimaging data called the energy landscape analysis. The methods are rooted in statistical physics theory, in particular the Ising model, also known as the (pairwise) maximum entropy model and Boltzmann machine. The methods have been applied to fitting electrophysiological data in neuroscience for a decade, but their use in neuroimaging data is still in its infancy. We first review the methods and discuss some algorithms and technical aspects. Then, we apply the methods to functional magnetic resonance imaging data recorded from healthy individuals to inspect the relationship between the accuracy of fitting, the size of the brain system to be analysed and the data length. This article is part of the themed issue `Mathematical methods in medicine: neuroscience, cardiology and pathology'.
Determination of antioxidant power of red and white wines by a new electrochemical method and its correlation with polyphenolic content.

PubMed

Alonso, Angeles M; Domínguez, Cristina; Guillén, Dominico A; Barroso, Carmelo G

2002-05-22

A new method for measuring the antioxidant power of wine has been developed based on the accelerated electrochemical oxidation of 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (ABTS). The calibration (R = 0.9922) and repeatability study (RSD = 7%) have provided good statistical parameters. The method is easy and quick to apply and gives reliable results, requiring only the monitoring of time and absorbance. It has been applied to various red and white wines of different origins. The results have been compared with those obtained by the total antioxidant status (TAS) method. Both methods reveal that the more antioxidant wines are those with higher polyphenolic content. From the HPLC study of the polyphenolic content of the same samples, it is confirmed that there is a positive correlation between the resveratrol content of a wine and its antioxidant power.
Quantitative comparison of tympanic membrane displacements using two optical methods to recover the optical phase

NASA Astrophysics Data System (ADS)

Santiago-Lona, Cynthia V.; Hernández-Montes, María del Socorro; Mendoza-Santoyo, Fernando; Esquivel-Tejeda, Jesús

2018-02-01

The study and quantification of the tympanic membrane (TM) displacements add important information to advance the knowledge about the hearing process. A comparative statistical analysis between two commonly used demodulation methods employed to recover the optical phase in digital holographic interferometry, namely the fast Fourier transform and phase-shifting interferometry, is presented as applied to study thin tissues such as the TM. The resulting experimental TM surface displacement data are used to contrast both methods through the analysis of variance and F tests. Data are gathered when the TMs are excited with continuous sound stimuli at levels 86, 89 and 93 dB SPL for the frequencies of 800, 1300 and 2500 Hz under the same experimental conditions. The statistical analysis shows repeatability in z-direction displacements with a standard deviation of 0.086, 0.098 and 0.080 μm using the Fourier method, and 0.080, 0.104 and 0.055 μm with the phase-shifting method at a 95% confidence level for all frequencies. The precision and accuracy are evaluated by means of the coefficient of variation; the results with the Fourier method are 0.06143, 0.06125, 0.06154 and 0.06154, 0.06118, 0.06111 with phase-shifting. The relative error between both methods is 7.143, 6.250 and 30.769%. On comparing the measured displacements, the results indicate that there is no statistically significant difference between both methods for frequencies at 800 and 1300 Hz; however, errors and other statistics increase at 2500 Hz.
Quality evaluation of no-reference MR images using multidirectional filters and image statistics.

PubMed

Jang, Jinseong; Bang, Kihun; Jang, Hanbyol; Hwang, Dosik

2018-09-01

This study aimed to develop a fully automatic, no-reference image-quality assessment (IQA) method for MR images. New quality-aware features were obtained by applying multidirectional filters to MR images and examining the feature statistics. A histogram of these features was then fitted to a generalized Gaussian distribution function for which the shape parameters yielded different values depending on the type of distortion in the MR image. Standard feature statistics were established through a training process based on high-quality MR images without distortion. Subsequently, the feature statistics of a test MR image were calculated and compared with the standards. The quality score was calculated as the difference between the shape parameters of the test image and the undistorted standard images. The proposed IQA method showed a >0.99 correlation with the conventional full-reference assessment methods; accordingly, this proposed method yielded the best performance among no-reference IQA methods for images containing six types of synthetic, MR-specific distortions. In addition, for authentically distorted images, the proposed method yielded the highest correlation with subjective assessments by human observers, thus demonstrating its superior performance over other no-reference IQAs. Our proposed IQA was designed to consider MR-specific features and outperformed other no-reference IQAs designed mainly for photographic images. Magn Reson Med 80:914-924, 2018. © 2018 International Society for Magnetic Resonance in Medicine. © 2018 International Society for Magnetic Resonance in Medicine.
Multivariate assessment of event-related potentials with the t-CWT method.

PubMed

Bostanov, Vladimir

2015-11-05

Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.
Power flow as a complement to statistical energy analysis and finite element analysis

NASA Technical Reports Server (NTRS)

Cuschieri, J. M.

1987-01-01

Present methods of analysis of the structural response and the structure-borne transmission of vibrational energy use either finite element (FE) techniques or statistical energy analysis (SEA) methods. The FE methods are a very useful tool at low frequencies where the number of resonances involved in the analysis is rather small. On the other hand SEA methods can predict with acceptable accuracy the response and energy transmission between coupled structures at relatively high frequencies where the structural modal density is high and a statistical approach is the appropriate solution. In the mid-frequency range, a relatively large number of resonances exist which make finite element method too costly. On the other hand SEA methods can only predict an average level form. In this mid-frequency range a possible alternative is to use power flow techniques, where the input and flow of vibrational energy to excited and coupled structural components can be expressed in terms of input and transfer mobilities. This power flow technique can be extended from low to high frequencies and this can be integrated with established FE models at low frequencies and SEA models at high frequencies to form a verification of the method. This method of structural analysis using power flo and mobility methods, and its integration with SEA and FE analysis is applied to the case of two thin beams joined together at right angles.
A comparative study of progressive versus successive spectrophotometric resolution techniques applied for pharmaceutical ternary mixtures

NASA Astrophysics Data System (ADS)

Saleh, Sarah S.; Lotfy, Hayam M.; Hassan, Nagiba Y.; Salem, Hesham

2014-11-01

This work represents a comparative study of a novel progressive spectrophotometric resolution technique namely, amplitude center method (ACM), versus the well-established successive spectrophotometric resolution techniques namely; successive derivative subtraction (SDS); successive derivative of ratio spectra (SDR) and mean centering of ratio spectra (MCR). All the proposed spectrophotometric techniques consist of several consecutive steps utilizing ratio and/or derivative spectra. The novel amplitude center method (ACM) can be used for the determination of ternary mixtures using single divisor where the concentrations of the components are determined through progressive manipulation performed on the same ratio spectrum. Those methods were applied for the analysis of the ternary mixture of chloramphenicol (CHL), dexamethasone sodium phosphate (DXM) and tetryzoline hydrochloride (TZH) in eye drops in the presence of benzalkonium chloride as a preservative. The proposed methods were checked using laboratory-prepared mixtures and were successfully applied for the analysis of pharmaceutical formulation containing the cited drugs. The proposed methods were validated according to the ICH guidelines. A comparative study was conducted between those methods regarding simplicity, limitation and sensitivity. The obtained results were statistically compared with those obtained from the official BP methods, showing no significant difference with respect to accuracy and precision.

Distinguishing Positive Selection From Neutral Evolution: Boosting the Performance of Summary Statistics

PubMed Central

Lin, Kao; Li, Haipeng; Schlötterer, Christian; Futschik, Andreas

2011-01-01

Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists, which captures all information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has a high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison to other neutrality tests shows that our boosting implementation performs well compared to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection. PMID:21041556
Evaluating the efficiency of spectral resolution of univariate methods manipulating ratio spectra and comparing to multivariate methods: An application to ternary mixture in common cold preparation

NASA Astrophysics Data System (ADS)

Moustafa, Azza Aziz; Salem, Hesham; Hegazy, Maha; Ali, Omnia

2015-02-01

Simple, accurate, and selective methods have been developed and validated for simultaneous determination of a ternary mixture of Chlorpheniramine maleate (CPM), Pseudoephedrine HCl (PSE) and Ibuprofen (IBF), in tablet dosage form. Four univariate methods manipulating ratio spectra were applied, method A is the double divisor-ratio difference spectrophotometric method (DD-RD). Method B is double divisor-derivative ratio spectrophotometric method (DD-RD). Method C is derivative ratio spectrum-zero crossing method (DRZC), while method D is mean centering of ratio spectra (MCR). Two multivariate methods were also developed and validated, methods E and F are Principal Component Regression (PCR) and Partial Least Squares (PLSs). The proposed methods have the advantage of simultaneous determination of the mentioned drugs without prior separation steps. They were successfully applied to laboratory-prepared mixtures and to commercial pharmaceutical preparation without any interference from additives. The proposed methods were validated according to the ICH guidelines. The obtained results were statistically compared with the official methods where no significant difference was observed regarding both accuracy and precision.
SPATIO-TEMPORAL MODELING OF AGRICULTURAL YIELD DATA WITH AN APPLICATION TO PRICING CROP INSURANCE CONTRACTS

PubMed Central

Ozaki, Vitor A.; Ghosh, Sujit K.; Goodwin, Barry K.; Shirota, Ricardo

2009-01-01

This article presents a statistical model of agricultural yield data based on a set of hierarchical Bayesian models that allows joint modeling of temporal and spatial autocorrelation. This method captures a comprehensive range of the various uncertainties involved in predicting crop insurance premium rates as opposed to the more traditional ad hoc, two-stage methods that are typically based on independent estimation and prediction. A panel data set of county-average yield data was analyzed for 290 counties in the State of Paraná (Brazil) for the period of 1990 through 2002. Posterior predictive criteria are used to evaluate different model specifications. This article provides substantial improvements in the statistical and actuarial methods often applied to the calculation of insurance premium rates. These improvements are especially relevant to situations where data are limited. PMID:19890450
Multiple Phenotype Association Tests Using Summary Statistics in Genome-Wide Association Studies

PubMed Central

Liu, Zhonghua; Lin, Xihong

2017-01-01

Summary We study in this paper jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. PMID:28653391
Multiple phenotype association tests using summary statistics in genome-wide association studies.

PubMed

Liu, Zhonghua; Lin, Xihong

2018-03-01

We study in this article jointly testing the associations of a genetic variant with correlated multiple phenotypes using the summary statistics of individual phenotype analysis from Genome-Wide Association Studies (GWASs). We estimated the between-phenotype correlation matrix using the summary statistics of individual phenotype GWAS analyses, and developed genetic association tests for multiple phenotypes by accounting for between-phenotype correlation without the need to access individual-level data. Since genetic variants often affect multiple phenotypes differently across the genome and the between-phenotype correlation can be arbitrary, we proposed robust and powerful multiple phenotype testing procedures by jointly testing a common mean and a variance component in linear mixed models for summary statistics. We computed the p-values of the proposed tests analytically. This computational advantage makes our methods practically appealing in large-scale GWASs. We performed simulation studies to show that the proposed tests maintained correct type I error rates, and to compare their powers in various settings with the existing methods. We applied the proposed tests to a GWAS Global Lipids Genetics Consortium summary statistics data set and identified additional genetic variants that were missed by the original single-trait analysis. © 2017, The International Biometric Society.
Spectrophotometric methods for simultaneous determination of betamethasone valerate and fusidic acid in their binary mixture.

PubMed

Lotfy, Hayam Mahmoud; Salem, Hesham; Abdelkawy, Mohammad; Samir, Ahmed

2015-04-05

Five spectrophotometric methods were successfully developed and validated for the determination of betamethasone valerate and fusidic acid in their binary mixture. Those methods are isoabsorptive point method combined with the first derivative (ISO Point--D1) and the recently developed and well established methods namely ratio difference (RD) and constant center coupled with spectrum subtraction (CC) methods, in addition to derivative ratio (1DD) and mean centering of ratio spectra (MCR). New enrichment technique called spectrum addition technique was used instead of traditional spiking technique. The proposed spectrophotometric procedures do not require any separation steps. Accuracy, precision and linearity ranges of the proposed methods were determined and the specificity was assessed by analyzing synthetic mixtures of both drugs. They were applied to their pharmaceutical formulation and the results obtained were statistically compared to that of official methods. The statistical comparison showed that there is no significant difference between the proposed methods and the official ones regarding both accuracy and precision. Copyright © 2015 Elsevier B.V. All rights reserved.
Avoid lost discoveries, because of violations of standard assumptions, by using modern robust statistical methods.

PubMed

Wilcox, Rand; Carlson, Mike; Azen, Stan; Clark, Florence

2013-03-01

Recently, there have been major advances in statistical techniques for assessing central tendency and measures of association. The practical utility of modern methods has been documented extensively in the statistics literature, but they remain underused and relatively unknown in clinical trials. Our objective was to address this issue. STUDY DESIGN AND PURPOSE: The first purpose was to review common problems associated with standard methodologies (low power, lack of control over type I errors, and incorrect assessments of the strength of the association). The second purpose was to summarize some modern methods that can be used to circumvent such problems. The third purpose was to illustrate the practical utility of modern robust methods using data from the Well Elderly 2 randomized controlled trial. In multiple instances, robust methods uncovered differences among groups and associations among variables that were not detected by classic techniques. In particular, the results demonstrated that details of the nature and strength of the association were sometimes overlooked when using ordinary least squares regression and Pearson correlation. Modern robust methods can make a practical difference in detecting and describing differences between groups and associations between variables. Such procedures should be applied more frequently when analyzing trial-based data. Copyright © 2013 Elsevier Inc. All rights reserved.
Minimal spanning tree algorithm for γ-ray source detection in sparse photon images: cluster parameters and selection strategies

DOE PAGES

Campana, R.; Bernieri, E.; Massaro, E.; ...

2013-05-22

We present that the minimal spanning tree (MST) algorithm is a graph-theoretical cluster-finding method. We previously applied it to γ-ray bidimensional images, showing that it is quite sensitive in finding faint sources. Possible sources are associated with the regions where the photon arrival directions clusterize. MST selects clusters starting from a particular “tree” connecting all the point of the image and performing a cut based on the angular distance between photons, with a number of events higher than a given threshold. In this paper, we show how a further filtering, based on some parameters linked to the cluster properties, canmore » be applied to reduce spurious detections. We find that the most efficient parameter for this secondary selection is the magnitudeM of a cluster, defined as the product of its number of events by its clustering degree. We test the sensitivity of the method by means of simulated and real Fermi-Large Area Telescope (LAT) fields. Our results show that √M is strongly correlated with other statistical significance parameters, derived from a wavelet based algorithm and maximum likelihood (ML) analysis, and that it can be used as a good estimator of statistical significance of MST detections. Finally, we apply the method to a 2-year LAT image at energies higher than 3 GeV, and we show the presence of new clusters, likely associated with BL Lac objects.« less
Application of a deep-learning method to the forecast of daily solar flare occurrence using Convolution Neural Network

NASA Astrophysics Data System (ADS)

Shin, Seulki; Moon, Yong-Jae; Chu, Hyoungseok

2017-08-01

As the application of deep-learning methods has been succeeded in various fields, they have a high potential to be applied to space weather forecasting. Convolutional neural network, one of deep learning methods, is specialized in image recognition. In this study, we apply the AlexNet architecture, which is a winner of Imagenet Large Scale Virtual Recognition Challenge (ILSVRC) 2012, to the forecast of daily solar flare occurrence using the MatConvNet software of MATLAB. Our input images are SOHO/MDI, EIT 195Å, and 304Å from January 1996 to December 2010, and output ones are yes or no of flare occurrence. We select training dataset from Jan 1996 to Dec 2000 and from Jan 2003 to Dec 2008. Testing dataset is chosen from Jan 2001 to Dec 2002 and from Jan 2009 to Dec 2010 in order to consider the solar cycle effect. In training dataset, we randomly select one fifth of training data for validation dataset to avoid the overfitting problem. Our model successfully forecasts the flare occurrence with about 0.90 probability of detection (POD) for common flares (C-, M-, and X-class). While POD of major flares (M- and X-class) forecasting is 0.96, false alarm rate (FAR) also scores relatively high(0.60). We also present several statistical parameters such as critical success index (CSI) and true skill statistics (TSS). Our model can immediately be applied to automatic forecasting service when image data are available.
Magnification Bias in Gravitational Arc Statistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Caminha, G. B.; Estrada, J.; Makler, M.

2013-08-29

The statistics of gravitational arcs in galaxy clusters is a powerful probe of cluster structure and may provide complementary cosmological constraints. Despite recent progresses, discrepancies still remain among modelling and observations of arc abundance, specially regarding the redshift distribution of strong lensing clusters. Besides, fast "semi-analytic" methods still have to incorporate the success obtained with simulations. In this paper we discuss the contribution of the magnification in gravitational arc statistics. Although lensing conserves surface brightness, the magnification increases the signal-to-noise ratio of the arcs, enhancing their detectability. We present an approach to include this and other observational effects in semi-analyticmore » calculations for arc statistics. The cross section for arc formation ({\\sigma}) is computed through a semi-analytic method based on the ratio of the eigenvalues of the magnification tensor. Using this approach we obtained the scaling of {\\sigma} with respect to the magnification, and other parameters, allowing for a fast computation of the cross section. We apply this method to evaluate the expected number of arcs per cluster using an elliptical Navarro--Frenk--White matter distribution. Our results show that the magnification has a strong effect on the arc abundance, enhancing the fraction of arcs, moving the peak of the arc fraction to higher redshifts, and softening its decrease at high redshifts. We argue that the effect of magnification should be included in arc statistics modelling and that it could help to reconcile arcs statistics predictions with the observational data.« less
Statistical process control methods allow the analysis and improvement of anesthesia care.

PubMed

Fasting, Sigurd; Gisvold, Sven E

2003-10-01

Quality aspects of the anesthetic process are reflected in the rate of intraoperative adverse events. The purpose of this report is to illustrate how the quality of the anesthesia process can be analyzed using statistical process control methods, and exemplify how this analysis can be used for quality improvement. We prospectively recorded anesthesia-related data from all anesthetics for five years. The data included intraoperative adverse events, which were graded into four levels, according to severity. We selected four adverse events, representing important quality and safety aspects, for statistical process control analysis. These were: inadequate regional anesthesia, difficult emergence from general anesthesia, intubation difficulties and drug errors. We analyzed the underlying process using 'p-charts' for statistical process control. In 65,170 anesthetics we recorded adverse events in 18.3%; mostly of lesser severity. Control charts were used to define statistically the predictable normal variation in problem rate, and then used as a basis for analysis of the selected problems with the following results: Inadequate plexus anesthesia: stable process, but unacceptably high failure rate; Difficult emergence: unstable process, because of quality improvement efforts; Intubation difficulties: stable process, rate acceptable; Medication errors: methodology not suited because of low rate of errors. By applying statistical process control methods to the analysis of adverse events, we have exemplified how this allows us to determine if a process is stable, whether an intervention is required, and if quality improvement efforts have the desired effect.
Analysis of data collected from right and left limbs: Accounting for dependence and improving statistical efficiency in musculoskeletal research.

PubMed

Stewart, Sarah; Pearson, Janet; Rome, Keith; Dalbeth, Nicola; Vandal, Alain C

2018-01-01

Statistical techniques currently used in musculoskeletal research often inefficiently account for paired-limb measurements or the relationship between measurements taken from multiple regions within limbs. This study compared three commonly used analysis methods with a mixed-models approach that appropriately accounted for the association between limbs, regions, and trials and that utilised all information available from repeated trials. Four analysis were applied to an existing data set containing plantar pressure data, which was collected for seven masked regions on right and left feet, over three trials, across three participant groups. Methods 1-3 averaged data over trials and analysed right foot data (Method 1), data from a randomly selected foot (Method 2), and averaged right and left foot data (Method 3). Method 4 used all available data in a mixed-effects regression that accounted for repeated measures taken for each foot, foot region and trial. Confidence interval widths for the mean differences between groups for each foot region were used as a criterion for comparison of statistical efficiency. Mean differences in pressure between groups were similar across methods for each foot region, while the confidence interval widths were consistently smaller for Method 4. Method 4 also revealed significant between-group differences that were not detected by Methods 1-3. A mixed effects linear model approach generates improved efficiency and power by producing more precise estimates compared to alternative approaches that discard information in the process of accounting for paired-limb measurements. This approach is recommended in generating more clinically sound and statistically efficient research outputs. Copyright © 2017 Elsevier B.V. All rights reserved.
Automatic summarization of changes in biological image sequences using algorithmic information theory.

PubMed

Cohen, Andrew R; Bjornsson, Christopher S; Temple, Sally; Banker, Gary; Roysam, Badrinath

2009-08-01

An algorithmic information-theoretic method is presented for object-level summarization of meaningful changes in image sequences. Object extraction and tracking data are represented as an attributed tracking graph (ATG). Time courses of object states are compared using an adaptive information distance measure, aided by a closed-form multidimensional quantization. The notion of meaningful summarization is captured by using the gap statistic to estimate the randomness deficiency from algorithmic statistics. The summary is the clustering result and feature subset that maximize the gap statistic. This approach was validated on four bioimaging applications: 1) It was applied to a synthetic data set containing two populations of cells differing in the rate of growth, for which it correctly identified the two populations and the single feature out of 23 that separated them; 2) it was applied to 59 movies of three types of neuroprosthetic devices being inserted in the brain tissue at three speeds each, for which it correctly identified insertion speed as the primary factor affecting tissue strain; 3) when applied to movies of cultured neural progenitor cells, it correctly distinguished neurons from progenitors without requiring the use of a fixative stain; and 4) when analyzing intracellular molecular transport in cultured neurons undergoing axon specification, it automatically confirmed the role of kinesins in axon specification.
Wear behavior of AA 5083/SiC nano-particle metal matrix composite: Statistical analysis

NASA Astrophysics Data System (ADS)

Hussain Idrisi, Amir; Ismail Mourad, Abdel-Hamid; Thekkuden, Dinu Thomas; Christy, John Victor

2018-03-01

This paper reports study on statistical analysis of the wear characteristics of AA5083/SiC nanocomposite. The aluminum matrix composites with different wt % (0%, 1% and 2%) of SiC nanoparticles were fabricated by using stir casting route. The developed composites were used in the manufacturing of spur gears on which the study was conducted. A specially designed test rig was used in testing the wear performance of the gears. The wear was investigated under different conditions of applied load (10N, 20N, and 30N) and operation time (30 mins, 60 mins, 90 mins, and 120mins). The analysis carried out at room temperature under constant speed of 1450 rpm. The wear parameters were optimized by using Taguchi’s method. During this statistical approach, L27 Orthogonal array was selected for the analysis of output. Furthermore, analysis of variance (ANOVA) was used to investigate the influence of applied load, operation time and SiC wt. % on wear behaviour. The wear resistance was analyzed by selecting “smaller is better” characteristics as the objective of the model. From this research, it is observed that experiment time and SiC wt % have the most significant effect on the wear performance followed by the applied load.
Robot Trajectories Comparison: A Statistical Approach

PubMed Central

Ansuategui, A.; Arruti, A.; Susperregi, L.; Yurramendi, Y.; Jauregi, E.; Lazkano, E.; Sierra, B.

2014-01-01

The task of planning a collision-free trajectory from a start to a goal position is fundamental for an autonomous mobile robot. Although path planning has been extensively investigated since the beginning of robotics, there is no agreement on how to measure the performance of a motion algorithm. This paper presents a new approach to perform robot trajectories comparison that could be applied to any kind of trajectories and in both simulated and real environments. Given an initial set of features, it automatically selects the most significant ones and performs a statistical comparison using them. Additionally, a graphical data visualization named polygraph which helps to better understand the obtained results is provided. The proposed method has been applied, as an example, to compare two different motion planners, FM2 and WaveFront, using different environments, robots, and local planners. PMID:25525618
Statistical mechanics of competitive resource allocation using agent-based models

NASA Astrophysics Data System (ADS)

Chakraborti, Anirban; Challet, Damien; Chatterjee, Arnab; Marsili, Matteo; Zhang, Yi-Cheng; Chakrabarti, Bikas K.

2015-01-01

Demand outstrips available resources in most situations, which gives rise to competition, interaction and learning. In this article, we review a broad spectrum of multi-agent models of competition (El Farol Bar problem, Minority Game, Kolkata Paise Restaurant problem, Stable marriage problem, Parking space problem and others) and the methods used to understand them analytically. We emphasize the power of concepts and tools from statistical mechanics to understand and explain fully collective phenomena such as phase transitions and long memory, and the mapping between agent heterogeneity and physical disorder. As these methods can be applied to any large-scale model of competitive resource allocation made up of heterogeneous adaptive agent with non-linear interaction, they provide a prospective unifying paradigm for many scientific disciplines.
Global statistics of liquid water content and effective number concentration of water clouds over ocean derived from combined CALIPSO and MODIS measurements

NASA Astrophysics Data System (ADS)

Hu, Y.; Vaughan, M.; McClain, C.; Behrenfeld, M.; Maring, H.; Anderson, D.; Sun-Mack, S.; Flittner, D.; Huang, J.; Wielicki, B.; Minnis, P.; Weimer, C.; Trepte, C.; Kuehn, R.

2007-06-01

This study presents an empirical relation that links the volume extinction coefficients of water clouds, the layer integrated depolarization ratios measured by lidar, and the effective radii of water clouds derived from collocated passive sensor observations. Based on Monte Carlo simulations of CALIPSO lidar observations, this method combines the cloud effective radius reported by MODIS with the lidar depolarization ratios measured by CALIPSO to estimate both the liquid water content and the effective number concentration of water clouds. The method is applied to collocated CALIPSO and MODIS measurements obtained during July and October of 2006, and January 2007. Global statistics of the cloud liquid water content and effective number concentration are presented.
Tree Colors: Color Schemes for Tree-Structured Data.

PubMed

Tennekes, Martijn; de Jonge, Edwin

2014-12-01

We present a method to map tree structures to colors from the Hue-Chroma-Luminance color model, which is known for its well balanced perceptual properties. The Tree Colors method can be tuned with several parameters, whose effect on the resulting color schemes is discussed in detail. We provide a free and open source implementation with sensible parameter defaults. Categorical data are very common in statistical graphics, and often these categories form a classification tree. We evaluate applying Tree Colors to tree structured data with a survey on a large group of users from a national statistical institute. Our user study suggests that Tree Colors are useful, not only for improving node-link diagrams, but also for unveiling tree structure in non-hierarchical visualizations.
Introduction of statistical information in a syntactic analyzer for document image recognition

NASA Astrophysics Data System (ADS)

Maroneze, André O.; Coüasnon, Bertrand; Lemaitre, Aurélie

2011-01-01

This paper presents an improvement to document layout analysis systems, offering a possible solution to Sayre's paradox (which states that an element "must be recognized before it can be segmented; and it must be segmented before it can be recognized"). This improvement, based on stochastic parsing, allows integration of statistical information, obtained from recognizers, during syntactic layout analysis. We present how this fusion of numeric and symbolic information in a feedback loop can be applied to syntactic methods to improve document description expressiveness. To limit combinatorial explosion during exploration of solutions, we devised an operator that allows optional activation of the stochastic parsing mechanism. Our evaluation on 1250 handwritten business letters shows this method allows the improvement of global recognition scores.
Daily Reportable Disease Spatiotemporal Cluster Detection, New York City, New York, USA, 2014-2015.

PubMed

Greene, Sharon K; Peterson, Eric R; Kapell, Deborah; Fine, Annie D; Kulldorff, Martin

2016-10-01

Each day, the New York City Department of Health and Mental Hygiene uses the free SaTScan software to apply prospective space-time permutation scan statistics to strengthen early outbreak detection for 35 reportable diseases. This method prompted early detection of outbreaks of community-acquired legionellosis and shigellosis.

A Survey of Popular R Packages for Cluster Analysis

ERIC Educational Resources Information Center

Flynt, Abby; Dean, Nema

2016-01-01

Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring data sets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans, and hclust functions; the mclust library; the poLCA…
Return to Our Roots: Raising Radishes to Teach Experimental Design. Methods and Techniques.

ERIC Educational Resources Information Center

Stallings, William M.

1993-01-01

Reviews research in teaching applied statistics. Concludes that students should analyze data from studies they have designed and conducted. Describes an activity in which students study germination and growth of radish seeds. Includes a table providing student instructions for both the experimental procedure and data analysis. (CFR)
Reliability Estimation for Aggregated Data: Applications for Organizational Research.

ERIC Educational Resources Information Center

Hart, Roland J.; Bradshaw, Stephen C.

This report provides the statistical tools necessary to measure the extent of error that exists in organizational record data and group survey data. It is felt that traditional methods of measuring error are inappropriate or incomplete when applied to organizational groups, especially in studies of organizational change when the same variables are…
Principal Component Analysis: Resources for an Essential Application of Linear Algebra

ERIC Educational Resources Information Center

Pankavich, Stephen; Swanson, Rebecca

2015-01-01

Principal Component Analysis (PCA) is a highly useful topic within an introductory Linear Algebra course, especially since it can be used to incorporate a number of applied projects. This method represents an essential application and extension of the Spectral Theorem and is commonly used within a variety of fields, including statistics,…
10 CFR 431.445 - Determination of small electric motor efficiency.

Code of Federal Regulations, 2010 CFR

2010-01-01

... determined either by testing in accordance with § 431.444 of this subpart, or by application of an... method. An AEDM applied to a basic model must be: (i) Derived from a mathematical model that represents... statistical analysis, computer simulation or modeling, or other analytic evaluation of performance data. (3...
New Tools for "New" History: Computers and the Teaching of Quantitative Historical Methods.

ERIC Educational Resources Information Center

Burton, Orville Vernon; Finnegan, Terence

1989-01-01

Explains the development of an instructional software package and accompanying workbook which teaches students to apply computerized statistical analysis to historical data, improving the study of social history. Concludes that the use of microcomputers and supercomputers to manipulate historical data enhances critical thinking skills and the use…
Publication Bias in Meta-Analyses of the Efficacy of Psychotherapeutic Interventions for Depression

ERIC Educational Resources Information Center

Niemeyer, Helen; Musch, Jochen; Pietrowsky, Reinhard

2013-01-01

Objective: The aim of this study was to assess whether systematic reviews investigating psychotherapeutic interventions for depression are affected by publication bias. Only homogeneous data sets were included, as heterogeneous data sets can distort statistical tests of publication bias. Method: We applied Begg and Mazumdar's adjusted rank…
75 FR 47592 - Final Test Guideline; Product Performance of Skin-applied Insect Repellents of Insect and Other...

Federal Register 2010, 2011, 2012, 2013, 2014

2010-08-06

... considerations affecting the design and conduct of repellent studies when human subjects are involved. Any... recommendations for the design and execution of studies to evaluate the performance of pesticide products intended... recommends appropriate study designs and methods for selecting subjects, statistical analysis, and reporting...
A better way to evaluate remote monitoring programs in chronic disease care: receiver operating characteristic analysis.

PubMed

Brown Connolly, Nancy E

2014-12-01

This foundational study applies the process of receiver operating characteristic (ROC) analysis to evaluate utility and predictive value of a disease management (DM) model that uses RM devices for chronic obstructive pulmonary disease (COPD). The literature identifies a need for a more rigorous method to validate and quantify evidence-based value for remote monitoring (RM) systems being used to monitor persons with a chronic disease. ROC analysis is an engineering approach widely applied in medical testing, but that has not been evaluated for its utility in RM. Classifiers (saturated peripheral oxygen [SPO2], blood pressure [BP], and pulse), optimum threshold, and predictive accuracy are evaluated based on patient outcomes. Parametric and nonparametric methods were used. Event-based patient outcomes included inpatient hospitalization, accident and emergency, and home health visits. Statistical analysis tools included Microsoft (Redmond, WA) Excel(®) and MedCalc(®) (MedCalc Software, Ostend, Belgium) version 12 © 1993-2013 to generate ROC curves and statistics. Persons with COPD were monitored a minimum of 183 days, with at least one inpatient hospitalization within 12 months prior to monitoring. Retrospective, de-identified patient data from a United Kingdom National Health System COPD program were used. Datasets included biometric readings, alerts, and resource utilization. SPO2 was identified as a predictive classifier, with an optimal average threshold setting of 85-86%. BP and pulse were failed classifiers, and areas of design were identified that may improve utility and predictive capacity. Cost avoidance methodology was developed. RESULTS can be applied to health services planning decisions. Methods can be applied to system design and evaluation based on patient outcomes. This study validated the use of ROC in RM program evaluation.
Correlation and agreement: overview and clarification of competing concepts and measures.

PubMed

Liu, Jinyuan; Tang, Wan; Chen, Guanqin; Lu, Yin; Feng, Changyong; Tu, Xin M

2016-04-25

Agreement and correlation are widely-used concepts that assess the association between variables. Although similar and related, they represent completely different notions of association. Assessing agreement between variables assumes that the variables measure the same construct, while correlation of variables can be assessed for variables that measure completely different constructs. This conceptual difference requires the use of different statistical methods, and when assessing agreement or correlation, the statistical method may vary depending on the distribution of the data and the interest of the investigator. For example, the Pearson correlation, a popular measure of correlation between continuous variables, is only informative when applied to variables that have linear relationships; it may be non-informative or even misleading when applied to variables that are not linearly related. Likewise, the intraclass correlation, a popular measure of agreement between continuous variables, may not provide sufficient information for investigators if the nature of poor agreement is of interest. This report reviews the concepts of agreement and correlation and discusses differences in the application of several commonly used measures.
3D Texture Analysis in Renal Cell Carcinoma Tissue Image Grading

PubMed Central

Cho, Nam-Hoon; Choi, Heung-Kook

2014-01-01

One of the most significant processes in cancer cell and tissue image analysis is the efficient extraction of features for grading purposes. This research applied two types of three-dimensional texture analysis methods to the extraction of feature values from renal cell carcinoma tissue images, and then evaluated the validity of the methods statistically through grade classification. First, we used a confocal laser scanning microscope to obtain image slices of four grades of renal cell carcinoma, which were then reconstructed into 3D volumes. Next, we extracted quantitative values using a 3D gray level cooccurrence matrix (GLCM) and a 3D wavelet based on two types of basis functions. To evaluate their validity, we predefined 6 different statistical classifiers and applied these to the extracted feature sets. In the grade classification results, 3D Haar wavelet texture features combined with principal component analysis showed the best discrimination results. Classification using 3D wavelet texture features was significantly better than 3D GLCM, suggesting that the former has potential for use in a computer-based grading system. PMID:25371701
Analysis of swarm behaviors based on an inversion of the fluctuation theorem.

PubMed

Hamann, Heiko; Schmickl, Thomas; Crailsheim, Karl

2014-01-01

A grand challenge in the field of artificial life is to find a general theory of emergent self-organizing systems. In swarm systems most of the observed complexity is based on motion of simple entities. Similarly, statistical mechanics focuses on collective properties induced by the motion of many interacting particles. In this article we apply methods from statistical mechanics to swarm systems. We try to explain the emergent behavior of a simulated swarm by applying methods based on the fluctuation theorem. Empirical results indicate that swarms are able to produce negative entropy within an isolated subsystem due to frozen accidents. Individuals of a swarm are able to locally detect fluctuations of the global entropy measure and store them, if they are negative entropy productions. By accumulating these stored fluctuations over time the swarm as a whole is producing negative entropy and the system ends up in an ordered state. We claim that this indicates the existence of an inverted fluctuation theorem for emergent self-organizing dissipative systems. This approach bears the potential of general applicability.
Scaling of plane-wave functions in statistically optimized near-field acoustic holography.

PubMed

Hald, Jørgen

2014-11-01

Statistically Optimized Near-field Acoustic Holography (SONAH) is a Patch Holography method, meaning that it can be applied in cases where the measurement area covers only part of the source surface. The method performs projections directly in the spatial domain, avoiding the use of spatial discrete Fourier transforms and the associated errors. First, an inverse problem is solved using regularization. For each calculation point a multiplication must then be performed with two transfer vectors--one to get the sound pressure and the other to get the particle velocity. Considering SONAH based on sound pressure measurements, existing derivations consider only pressure reconstruction when setting up the inverse problem, so the evanescent wave amplification associated with the calculation of particle velocity is not taken into account in the regularized solution of the inverse problem. The present paper introduces a scaling of the applied plane wave functions that takes the amplification into account, and it is shown that the previously published virtual source-plane retraction has almost the same effect. The effectiveness of the different solutions is verified through a set of simulated measurements.
Spectroflurimetric estimation of the new antiviral agent ledipasvir in presence of sofosbuvir

NASA Astrophysics Data System (ADS)

Salama, Fathy M.; Attia, Khalid A.; Abouserie, Ahmed A.; El-Olemy, Ahmed; Abolmagd, Ebrahim

2018-02-01

A spectroflurimetric method has been developed and validated for the selective quantitative determination of ledipasvir in presence of sofosbuvir. In this method the native fluorescence of ledipasvir in ethanol at 405 nm was measured after excitation at 340 nm. The proposed method was validated according to ICH guidelines and show high sensitivity, accuracy and precision. Furthermore this method was successfully applied to the analysis of ledipasvir in pharmaceutical dosage form without interference from sofosbuvir and other additives and the results were statistically compared to a reported method and found no significant difference.
A method for determining electrophoretic and electroosmotic mobilities using AC and DC electric field particle displacements.

PubMed

Oddy, M H; Santiago, J G

2004-01-01

We have developed a method for measuring the electrophoretic mobility of submicrometer, fluorescently labeled particles and the electroosmotic mobility of a microchannel. We derive explicit expressions for the unknown electrophoretic and the electroosmotic mobilities as a function of particle displacements resulting from alternating current (AC) and direct current (DC) applied electric fields. Images of particle displacements are captured using an epifluorescent microscope and a CCD camera. A custom image-processing code was developed to determine image streak lengths associated with AC measurements, and a custom particle tracking velocimetry (PTV) code was devised to determine DC particle displacements. Statistical analysis was applied to relate mobility estimates to measured particle displacement distributions.
Application of Savitzky-Golay differentiation filters and Fourier functions to simultaneous determination of cefepime and the co-administered drug, levofloxacin, in spiked human plasma.

PubMed

Abdel-Aziz, Omar; Abdel-Ghany, Maha F; Nagi, Reham; Abdel-Fattah, Laila

2015-03-15

The present work is concerned with simultaneous determination of cefepime (CEF) and the co-administered drug, levofloxacin (LEV), in spiked human plasma by applying a new approach, Savitzky-Golay differentiation filters, and combined trigonometric Fourier functions to their ratio spectra. The different parameters associated with the calculation of Savitzky-Golay and Fourier coefficients were optimized. The proposed methods were validated and applied for determination of the two drugs in laboratory prepared mixtures and spiked human plasma. The results were statistically compared with reported HPLC methods and were found accurate and precise. Copyright © 2014 Elsevier B.V. All rights reserved.
Development of a Self-Rated Mixed Methods Skills Assessment: The NIH Mixed Methods Research Training Program for the Health Sciences

PubMed Central

Guetterman, Timothy C.; Creswell, John W.; Wittink, Marsha; Barg, Fran K.; Castro, Felipe G.; Dahlberg, Britt; Watkins, Daphne C.; Deutsch, Charles; Gallo, Joseph J.

2017-01-01

Introduction Demand for training in mixed methods is high, with little research on faculty development or assessment in mixed methods. We describe the development of a Self-Rated Mixed Methods Skills Assessment and provide validity evidence. The instrument taps six research domains: “Research question,” “Design/approach,” “Sampling,” “Data collection,” “Analysis,” and “Dissemination.” Respondents are asked to rate their ability to define or explain concepts of mixed methods under each domain, their ability to apply the concepts to problems, and the extent to which they need to improve. Methods We administered the questionnaire to 145 faculty and students using an internet survey. We analyzed descriptive statistics and performance characteristics of the questionnaire using Cronbach’s alpha to assess reliability and an ANOVA that compared a mixed methods experience index with assessment scores to assess criterion-relatedness. Results Internal consistency reliability was high for the total set of items (.95) and adequate (>=.71) for all but one subscale. Consistent with establishing criterion validity, respondents who had more professional experiences with mixed methods (e.g., published a mixed methods paper) rated themselves as more skilled, which was statistically significant across the research domains. Discussion This Self-Rated Mixed Methods Assessment instrument may be a useful tool to assess skills in mixed methods for training programs. It can be applied widely at the graduate and faculty level. For the learner, assessment may lead to enhanced motivation to learn and training focused on self-identified needs. For faculty, the assessment may improve curriculum and course content planning. PMID:28562495
Rtop - an R package for interpolation of data with a variable spatial support - examples from river networks

NASA Astrophysics Data System (ADS)

Olav Skøien, Jon; Laaha, Gregor; Koffler, Daniel; Blöschl, Günter; Pebesma, Edzer; Parajka, Juraj; Viglione, Alberto

2013-04-01

Geostatistical methods have been applied only to a limited extent for spatial interpolation in applications where the observations have an irregular support, such as runoff characteristics or population health data. Several studies have shown the potential of such methods (Gottschalk 1993, Sauquet et al. 2000, Gottschalk et al. 2006, Skøien et al. 2006, Goovaerts 2008), but these developments have so far not led to easily accessible, versatile, easy to apply and open source software. Based on the top-kriging approach suggested by Skøien et al. (2006), we will here present the package rtop, which has been implemented in the statistical environment R (R Core Team 2012). Taking advantage of the existing methods in R for analysis of spatial objects (Bivand et al. 2008), and the extensive possibilities for visualizing the results, rtop makes it easy to apply geostatistical interpolation methods when observations have a non-point spatial support. Although the package is flexible regarding data input, the main application so far has been for interpolation along river networks. We will present some examples showing how the package can easily be used for such interpolation. The model will soon be uploaded to CRAN, but is in the meantime also available from R-forge and can be installed by: > install.packages("rtop", repos="http://R-Forge.R-project.org") Bivand, R.S., Pebesma, E.J. & Gómez-Rubio, V., 2008. Applied spatial data analysis with r: Springer. Goovaerts, P., 2008. Kriging and semivariogram deconvolution in the presence of irregular geographical units. Mathematical Geosciences, 40 (1), 101-128. Gottschalk, L., 1993. Interpolation of runoff applying objective methods. Stochastic Hydrology and Hydraulics, 7, 269-281. Gottschalk, L., Krasovskaia, I., Leblois, E. & Sauquet, E., 2006. Mapping mean and variance of runoff in a river basin. Hydrology and Earth System Sciences, 10, 469-484. R Core Team, 2012. R: A language and environment for statistical computing. Vienna, Austria, ISBN 3-900051-07-0. Sauquet, E., Gottschalk, L. & Leblois, E., 2000. Mapping average annual runoff: A hierarchical approach applying a stochastic interpolation scheme. Hydrological Sciences Journal, 45 (6), 799-815. Skøien, J.O., Merz, R. & Blöschl, G., 2006. Top-kriging - geostatistics on stream networks. Hydrology and Earth System Sciences, 10, 277-287.
Constructing networks from a dynamical system perspective for multivariate nonlinear time series.

PubMed

Nakamura, Tomomichi; Tanizawa, Toshihiro; Small, Michael

2016-03-01

We describe a method for constructing networks for multivariate nonlinear time series. We approach the interaction between the various scalar time series from a deterministic dynamical system perspective and provide a generic and algorithmic test for whether the interaction between two measured time series is statistically significant. The method can be applied even when the data exhibit no obvious qualitative similarity: a situation in which the naive method utilizing the cross correlation function directly cannot correctly identify connectivity. To establish the connectivity between nodes we apply the previously proposed small-shuffle surrogate (SSS) method, which can investigate whether there are correlation structures in short-term variabilities (irregular fluctuations) between two data sets from the viewpoint of deterministic dynamical systems. The procedure to construct networks based on this idea is composed of three steps: (i) each time series is considered as a basic node of a network, (ii) the SSS method is applied to verify the connectivity between each pair of time series taken from the whole multivariate time series, and (iii) the pair of nodes is connected with an undirected edge when the null hypothesis cannot be rejected. The network constructed by the proposed method indicates the intrinsic (essential) connectivity of the elements included in the system or the underlying (assumed) system. The method is demonstrated for numerical data sets generated by known systems and applied to several experimental time series.
Improving binding mode and binding affinity predictions of docking by ligand-based search of protein conformations: evaluation in D3R grand challenge 2015

NASA Astrophysics Data System (ADS)

Xu, Xianjin; Yan, Chengfei; Zou, Xiaoqin

2017-08-01

The growing number of protein-ligand complex structures, particularly the structures of proteins co-bound with different ligands, in the Protein Data Bank helps us tackle two major challenges in molecular docking studies: the protein flexibility and the scoring function. Here, we introduced a systematic strategy by using the information embedded in the known protein-ligand complex structures to improve both binding mode and binding affinity predictions. Specifically, a ligand similarity calculation method was employed to search a receptor structure with a bound ligand sharing high similarity with the query ligand for the docking use. The strategy was applied to the two datasets (HSP90 and MAP4K4) in recent D3R Grand Challenge 2015. In addition, for the HSP90 dataset, a system-specific scoring function (ITScore2_hsp90) was generated by recalibrating our statistical potential-based scoring function (ITScore2) using the known protein-ligand complex structures and the statistical mechanics-based iterative method. For the HSP90 dataset, better performances were achieved for both binding mode and binding affinity predictions comparing with the original ITScore2 and with ensemble docking. For the MAP4K4 dataset, although there were only eight known protein-ligand complex structures, our docking strategy achieved a comparable performance with ensemble docking. Our method for receptor conformational selection and iterative method for the development of system-specific statistical potential-based scoring functions can be easily applied to other protein targets that have a number of protein-ligand complex structures available to improve predictions on binding.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.