statistical modeling tools: Topics by Science.gov

Sample records for statistical modeling tools

Peer Review of EPA's Draft BMDS Document: Exponential ...

EPA Pesticide Factsheets

BMDS is one of the Agency's premier tools for estimating risk assessments, therefore the validity and reliability of its statistical models are of paramount importance. This page provides links to peer review of the BMDS applications and its models as they were developed and eventually released documenting the rigorous review process taken to provide the best science tools available for statistical modeling. This page provides links to peer review of the BMDS applications and its models as they were developed and eventually released documenting the rigorous review process taken to provide the best science tools available for statistical modeling.
THE ATMOSPHERIC MODEL EVALUATION TOOL

EPA Science Inventory

This poster describes a model evaluation tool that is currently being developed and applied for meteorological and air quality model evaluation. The poster outlines the framework and provides examples of statistical evaluations that can be performed with the model evaluation tool...
Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II)

DTIC Science & Technology

2014-09-30

Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II) Len Thomas, John Harwood, Catriona Harris, and Robert S... mammals changes over time. This project will develop statistical tools to allow mathematical models of the population consequences of acoustic...disturbance to be fitted to data from marine mammal populations. We will work closely with Phase II of the ONR PCAD Working Group, and will provide
Personalizing oncology treatments by predicting drug efficacy, side-effects, and improved therapy: mathematics, statistics, and their integration.

PubMed

Agur, Zvia; Elishmereni, Moran; Kheifetz, Yuri

2014-01-01

Despite its great promise, personalized oncology still faces many hurdles, and it is increasingly clear that targeted drugs and molecular biomarkers alone yield only modest clinical benefit. One reason is the complex relationships between biomarkers and the patient's response to drugs, obscuring the true weight of the biomarkers in the overall patient's response. This complexity can be disentangled by computational models that integrate the effects of personal biomarkers into a simulator of drug-patient dynamic interactions, for predicting the clinical outcomes. Several computational tools have been developed for personalized oncology, notably evidence-based tools for simulating pharmacokinetics, Bayesian-estimated tools for predicting survival, etc. We describe representative statistical and mathematical tools, and discuss their merits, shortcomings and preliminary clinical validation attesting to their potential. Yet, the individualization power of mathematical models alone, or statistical models alone, is limited. More accurate and versatile personalization tools can be constructed by a new application of the statistical/mathematical nonlinear mixed effects modeling (NLMEM) approach, which until recently has been used only in drug development. Using these advanced tools, clinical data from patient populations can be integrated with mechanistic models of disease and physiology, for generating personal mathematical models. Upon a more substantial validation in the clinic, this approach will hopefully be applied in personalized clinical trials, P-trials, hence aiding the establishment of personalized medicine within the main stream of clinical oncology. © 2014 Wiley Periodicals, Inc.
An integrated user-friendly ArcMAP tool for bivariate statistical modeling in geoscience applications

NASA Astrophysics Data System (ADS)

Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusof, Z.; Tehrany, M. S.

2014-10-01

Modeling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modeling. Bivariate statistical analysis (BSA) assists in hazard modeling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, BSM (bivariate statistical modeler), for BSA technique is proposed. Three popular BSA techniques such as frequency ratio, weights-of-evidence, and evidential belief function models are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and is created by a simple graphical user interface, which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. Area under curve is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.
An integrated user-friendly ArcMAP tool for bivariate statistical modelling in geoscience applications

NASA Astrophysics Data System (ADS)

Jebur, M. N.; Pradhan, B.; Shafri, H. Z. M.; Yusoff, Z. M.; Tehrany, M. S.

2015-03-01

Modelling and classification difficulties are fundamental issues in natural hazard assessment. A geographic information system (GIS) is a domain that requires users to use various tools to perform different types of spatial modelling. Bivariate statistical analysis (BSA) assists in hazard modelling. To perform this analysis, several calculations are required and the user has to transfer data from one format to another. Most researchers perform these calculations manually by using Microsoft Excel or other programs. This process is time-consuming and carries a degree of uncertainty. The lack of proper tools to implement BSA in a GIS environment prompted this study. In this paper, a user-friendly tool, bivariate statistical modeler (BSM), for BSA technique is proposed. Three popular BSA techniques, such as frequency ratio, weight-of-evidence (WoE), and evidential belief function (EBF) models, are applied in the newly proposed ArcMAP tool. This tool is programmed in Python and created by a simple graphical user interface (GUI), which facilitates the improvement of model performance. The proposed tool implements BSA automatically, thus allowing numerous variables to be examined. To validate the capability and accuracy of this program, a pilot test area in Malaysia is selected and all three models are tested by using the proposed program. Area under curve (AUC) is used to measure the success rate and prediction rate. Results demonstrate that the proposed program executes BSA with reasonable accuracy. The proposed BSA tool can be used in numerous applications, such as natural hazard, mineral potential, hydrological, and other engineering and environmental applications.
Bayesian models based on test statistics for multiple hypothesis testing problems.

PubMed

Ji, Yuan; Lu, Yiling; Mills, Gordon B

2008-04-01

We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
Dose response explorer: an integrated open-source tool for exploring and modelling radiotherapy dose volume outcome relationships

NASA Astrophysics Data System (ADS)

El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.

2006-11-01

Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.
Statistical colour models: an automated digital image analysis method for quantification of histological biomarkers.

PubMed

Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad

2016-04-27

Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to detection of stain colour in digital IHC images. Model was first trained by massive colour pixels collected semi-automatically. To speed up the training and detection processes, we removed luminance channel, Y channel of YCbCr colour space and chose 128 histogram bins which is the optimal number. A maximum likelihood classifier is used to classify pixels in digital slides into positively or negatively stained pixels automatically. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results have demonstrated the model-based tool achieves more accurate results than colour deconvolution and CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demostrated the proposed model has little inter-dataset variations. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing to the tool by different users showed only minor inter-observer variations in results.
Modelling of peak temperature during friction stir processing of magnesium alloy AZ91

NASA Astrophysics Data System (ADS)

Vaira Vignesh, R.; Padmanaban, R.

2018-02-01

Friction stir processing (FSP) is a solid state processing technique with potential to modify the properties of the material through microstructural modification. The study of heat transfer in FSP aids in the identification of defects like flash, inadequate heat input, poor material flow and mixing etc. In this paper, transient temperature distribution during FSP of magnesium alloy AZ91 was simulated using finite element modelling. The numerical model results were validated using the experimental results from the published literature. The model was used to predict the peak temperature obtained during FSP for various process parameter combinations. The simulated peak temperature results were used to develop a statistical model. The effect of process parameters namely tool rotation speed, tool traverse speed and shoulder diameter of the tool on the peak temperature was investigated using the developed statistical model. It was found that peak temperature was directly proportional to tool rotation speed and shoulder diameter and inversely proportional to tool traverse speed.
Peer Review Documents Related to the Evaluation of ...

EPA Pesticide Factsheets

BMDS is one of the Agency's premier tools for estimating risk assessments, therefore the validity and reliability of its statistical models are of paramount importance. This page provides links to peer review and expert summaries of the BMDS application and its models as they were developed and eventually released documenting the rigorous review process taken to provide the best science tools available for statistical modeling. This page provides links to peer reviews and expert summaries of the BMDS applications and its models as they were developed and eventually released.
Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

USGS Publications Warehouse

Lee, L.; Helsel, D.

2005-01-01

Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.
Children's Services Statistical Neighbour Benchmarking Tool. Practitioner User Guide

ERIC Educational Resources Information Center

National Foundation for Educational Research, 2007

2007-01-01

Statistical neighbour models provide one method for benchmarking progress. For each local authority (LA), these models designate a number of other LAs deemed to have similar characteristics. These designated LAs are known as statistical neighbours. Any LA may compare its performance (as measured by various indicators) against its statistical…
SEPEM: A tool for statistical modeling the solar energetic particle environment

NASA Astrophysics Data System (ADS)

Crosby, Norma; Heynderickx, Daniel; Jiggens, Piers; Aran, Angels; Sanahuja, Blai; Truscott, Pete; Lei, Fan; Jacobs, Carla; Poedts, Stefaan; Gabriel, Stephen; Sandberg, Ingmar; Glover, Alexi; Hilgers, Alain

2015-07-01

Solar energetic particle (SEP) events are a serious radiation hazard for spacecraft as well as a severe health risk to humans traveling in space. Indeed, accurate modeling of the SEP environment constitutes a priority requirement for astrophysics and solar system missions and for human exploration in space. The European Space Agency's Solar Energetic Particle Environment Modelling (SEPEM) application server is a World Wide Web interface to a complete set of cross-calibrated data ranging from 1973 to 2013 as well as new SEP engineering models and tools. Both statistical and physical modeling techniques have been included, in order to cover the environment not only at 1 AU but also in the inner heliosphere ranging from 0.2 AU to 1.6 AU using a newly developed physics-based shock-and-particle model to simulate particle flux profiles of gradual SEP events. With SEPEM, SEP peak flux and integrated fluence statistics can be studied, as well as durations of high SEP flux periods. Furthermore, effects tools are also included to allow calculation of single event upset rate and radiation doses for a variety of engineering scenarios.
The Importance of Statistical Modeling in Data Analysis and Inference

ERIC Educational Resources Information Center

Rollins, Derrick, Sr.

2017-01-01

Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
Constructing and Modifying Sequence Statistics for relevent Using informR in 𝖱

PubMed Central

Marcum, Christopher Steven; Butts, Carter T.

2015-01-01

The informR package greatly simplifies the analysis of complex event histories in 𝖱 by providing user friendly tools to build sufficient statistics for the relevent package. Historically, building sufficient statistics to model event sequences (of the form a→b) using the egocentric generalization of Butts’ (2008) relational event framework for modeling social action has been cumbersome. The informR package simplifies the construction of the complex list of arrays needed by the rem() model fitting for a variety of cases involving egocentric event data, multiple event types, and/or support constraints. This paper introduces these tools using examples from real data extracted from the American Time Use Survey. PMID:26185488
Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

PubMed

Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

2016-01-22

Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if they occur on prognostic factors. Thus, the diagnosis of a such imbalance is essential to adjust statistical analysis if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n =500 per arm) and when the number of unbalanced covariates was not too small as compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for pre selection of the covariates needed to be included in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
PSSMSearch: a server for modeling, visualization, proteome-wide discovery and annotation of protein motif specificity determinants.

PubMed

Krystkowiak, Izabella; Manguy, Jean; Davey, Norman E

2018-06-05

There is a pressing need for in silico tools that can aid in the identification of the complete repertoire of protein binding (SLiMs, MoRFs, miniMotifs) and modification (moiety attachment/removal, isomerization, cleavage) motifs. We have created PSSMSearch, an interactive web-based tool for rapid statistical modeling, visualization, discovery and annotation of protein motif specificity determinants to discover novel motifs in a proteome-wide manner. PSSMSearch analyses proteomes for regions with significant similarity to a motif specificity determinant model built from a set of aligned motif-containing peptides. Multiple scoring methods are available to build a position-specific scoring matrix (PSSM) describing the motif specificity determinant model. This model can then be modified by a user to add prior knowledge of specificity determinants through an interactive PSSM heatmap. PSSMSearch includes a statistical framework to calculate the significance of specificity determinant model matches against a proteome of interest. PSSMSearch also includes the SLiMSearch framework's annotation, motif functional analysis and filtering tools to highlight relevant discriminatory information. Additional tools to annotate statistically significant shared keywords and GO terms, or experimental evidence of interaction with a motif-recognizing protein have been added. Finally, PSSM-based conservation metrics have been created for taxonomic range analyses. The PSSMSearch web server is available at http://slim.ucd.ie/pssmsearch/.
Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

NASA Astrophysics Data System (ADS)

Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

2010-10-01

This study presents a new method for detecting the cutting tool wear based on the measured cutting force signals. A statistical-based method called Integrated Kurtosis-based Algorithm for Z-Filter technique, called I-kaz was used for developing a regression model and 3D graphic presentation of I-kaz 3D coefficient during machining process. The machining tests were carried out using a CNC turning machine Colchester Master Tornado T4 in dry cutting condition. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear lands (VB) was determined. A regression model was developed due to this relationship, and results of the regression model shows that the I-kaz 3D coefficient value decreases as tool wear increases. The result then is used for real time tool wear monitoring.
Two Paradoxes in Linear Regression Analysis.

PubMed

Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong

2016-12-25

Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.

Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

ERIC Educational Resources Information Center

Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

2006-01-01

We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…
SCOUT: A Fast Monte-Carlo Modeling Tool of Scintillation Camera Output

PubMed Central

Hunter, William C. J.; Barrett, Harrison H.; Lewellen, Thomas K.; Miyaoka, Robert S.; Muzi, John P.; Li, Xiaoli; McDougald, Wendy; MacDonald, Lawrence R.

2011-01-01

We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:22072297
SCOUT: a fast Monte-Carlo modeling tool of scintillation camera output†

PubMed Central

Hunter, William C J; Barrett, Harrison H.; Muzi, John P.; McDougald, Wendy; MacDonald, Lawrence R.; Miyaoka, Robert S.; Lewellen, Thomas K.

2013-01-01

We have developed a Monte-Carlo photon-tracking and readout simulator called SCOUT to study the stochastic behavior of signals output from a simplified rectangular scintillation-camera design. SCOUT models the salient processes affecting signal generation, transport, and readout of a scintillation camera. Presently, we compare output signal statistics from SCOUT to experimental results for both a discrete and a monolithic camera. We also benchmark the speed of this simulation tool and compare it to existing simulation tools. We find this modeling tool to be relatively fast and predictive of experimental results. Depending on the modeled camera geometry, we found SCOUT to be 4 to 140 times faster than other modeling tools. PMID:23640136
Comparison between two statistically based methods, and two physically based models developed to compute daily mean streamflow at ungaged locations in the Cedar River Basin, Iowa

USGS Publications Warehouse

Linhart, S. Mike; Nania, Jon F.; Christiansen, Daniel E.; Hutchinson, Kasey J.; Sanders, Curtis L.; Archfield, Stacey A.

2013-01-01

A variety of individuals from water resource managers to recreational users need streamflow information for planning and decisionmaking at locations where there are no streamgages. To address this problem, two statistically based methods, the Flow Duration Curve Transfer method and the Flow Anywhere method, were developed for statewide application and the two physically based models, the Precipitation Runoff Modeling-System and the Soil and Water Assessment Tool, were only developed for application for the Cedar River Basin. Observed and estimated streamflows for the two methods and models were compared for goodness of fit at 13 streamgages modeled in the Cedar River Basin by using the Nash-Sutcliffe and the percent-bias efficiency values. Based on median and mean Nash-Sutcliffe values for the 13 streamgages the Precipitation Runoff Modeling-System and Soil and Water Assessment Tool models appear to have performed similarly and better than Flow Duration Curve Transfer and Flow Anywhere methods. Based on median and mean percent bias values, the Soil and Water Assessment Tool model appears to have generally overestimated daily mean streamflows, whereas the Precipitation Runoff Modeling-System model and statistical methods appear to have underestimated daily mean streamflows. The Flow Duration Curve Transfer method produced the lowest median and mean percent bias values and appears to perform better than the other models.
Cure Models as a Useful Statistical Tool for Analyzing Survival

PubMed Central

Othus, Megan; Barlogie, Bart; LeBlanc, Michael L.; Crowley, John J.

2013-01-01

Cure models are a popular topic within statistical literature but are not as widely known in the clinical literature. Many patients with cancer can be long-term survivors of their disease, and cure models can be a useful tool to analyze and describe cancer survival data. The goal of this article is to review what a cure model is, explain when cure models can be used, and use cure models to describe multiple myeloma survival trends. Multiple myeloma is generally considered an incurable disease, and this article shows that by using cure models, rather than the standard Cox proportional hazards model, we can evaluate whether there is evidence that therapies at the University of Arkansas for Medical Sciences induce a proportion of patients to be long-term survivors. PMID:22675175
Tree injury and mortality in fires: developing process-based models

Treesearch

Bret W. Butler; Matthew B. Dickinson

2010-01-01

Wildland fire managers are often required to predict tree injury and mortality when planning a prescribed burn or when considering wildfire management options; and, currently, statistical models based on post-fire observations are the only tools available for this purpose. Implicit in the derivation of statistical models is the assumption that they are strictly...
Two Paradoxes in Linear Regression Analysis

PubMed Central

FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong

2016-01-01

Summary Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
Bayesian Posterior Odds Ratios: Statistical Tools for Collaborative Evaluations

ERIC Educational Resources Information Center

Hicks, Tyler; Rodríguez-Campos, Liliana; Choi, Jeong Hoon

2018-01-01

To begin statistical analysis, Bayesians quantify their confidence in modeling hypotheses with priors. A prior describes the probability of a certain modeling hypothesis apart from the data. Bayesians should be able to defend their choice of prior to a skeptical audience. Collaboration between evaluators and stakeholders could make their choices…
Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation

NASA Technical Reports Server (NTRS)

DePriest, Douglas; Morgan, Carolyn

2003-01-01

The cost and safety goals for NASA s next generation of reusable launch vehicle (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models to accurately predict the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.
Current state of the art for statistical modeling of species distributions [Chapter 16

Treesearch

Troy M. Hegel; Samuel A. Cushman; Jeffrey Evans; Falk Huettmann

2010-01-01

Over the past decade the number of statistical modelling tools available to ecologists to model species' distributions has increased at a rapid pace (e.g. Elith et al. 2006; Austin 2007), as have the number of species distribution models (SDM) published in the literature (e.g. Scott et al. 2002). Ten years ago, basic logistic regression (Hosmer and Lemeshow 2000)...
Development of the AFRL Aircrew Perfomance and Protection Data Bank

DTIC Science & Technology

2007-12-01

Growth model and statistical model of hypobaric chamber simulations. It offers a quick and readily accessible online DCS risk assessment tool for...are used for the DCS prediction instead of the original model. ADRAC is based on more than 20 years of hypobaric chamber studies using human...prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It
Pre-Service Mathematics Teachers' Use of Probability Models in Making Informal Inferences about a Chance Game

ERIC Educational Resources Information Center

Kazak, Sibel; Pratt, Dave

2017-01-01

This study considers probability models as tools for both making informal statistical inferences and building stronger conceptual connections between data and chance topics in teaching statistics. In this paper, we aim to explore pre-service mathematics teachers' use of probability models for a chance game, where the sum of two dice matters in…
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

PubMed

Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

2017-01-01

Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
ADAPTATION OF THE ADVANCED STATISTICAL TRAJECTORY REGIONAL AIR POLLUTION (ASTRAP) MODEL TO THE EPA VAX COMPUTER - MODIFICATIONS AND TESTING

EPA Science Inventory

The Advanced Statistical Trajectory Regional Air Pollution (ASTRAP) model simulates long-term transport and deposition of oxides of and nitrogen. t is a potential screening tool for assessing long-term effects on regional visibility from sulfur emission sources. owever, a rigorou...
GPCC - A weather generator-based statistical downscaling tool for site-specific assessment of climate change impacts

USDA-ARS?s Scientific Manuscript database

Resolution of climate model outputs are too coarse to be used as direct inputs to impact models for assessing climate change impacts on agricultural production, water resources, and eco-system services at local or site-specific scales. Statistical downscaling approaches are usually used to bridge th...
Rasch Model Based Analysis of the Force Concept Inventory

ERIC Educational Resources Information Center

Planinic, Maja; Ivanjek, Lana; Susac, Ana

2010-01-01

The Force Concept Inventory (FCI) is an important diagnostic instrument which is widely used in the field of physics education research. It is therefore very important to evaluate and monitor its functioning using different tools for statistical analysis. One of such tools is the stochastic Rasch model, which enables construction of linear…
Open Source Tools for Seismicity Analysis

NASA Astrophysics Data System (ADS)

Powers, P.

2010-12-01

The spatio-temporal analysis of seismicity plays an important role in earthquake forecasting and is integral to research on earthquake interactions and triggering. For instance, the third version of the Uniform California Earthquake Rupture Forecast (UCERF), currently under development, will use Epidemic Type Aftershock Sequences (ETAS) as a model for earthquake triggering. UCERF will be a "living" model and therefore requires robust, tested, and well-documented ETAS algorithms to ensure transparency and reproducibility. Likewise, as earthquake aftershock sequences unfold, real-time access to high quality hypocenter data makes it possible to monitor the temporal variability of statistical properties such as the parameters of the Omori Law and the Gutenberg Richter b-value. Such statistical properties are valuable as they provide a measure of how much a particular sequence deviates from expected behavior and can be used when assigning probabilities of aftershock occurrence. To address these demands and provide public access to standard methods employed in statistical seismology, we present well-documented, open-source JavaScript and Java software libraries for the on- and off-line analysis of seismicity. The Javascript classes facilitate web-based asynchronous access to earthquake catalog data and provide a framework for in-browser display, analysis, and manipulation of catalog statistics; implementations of this framework will be made available on the USGS Earthquake Hazards website. The Java classes, in addition to providing tools for seismicity analysis, provide tools for modeling seismicity and generating synthetic catalogs. These tools are extensible and will be released as part of the open-source OpenSHA Commons library.
Integrated Wind Power Planning Tool

NASA Astrophysics Data System (ADS)

Rosgaard, M. H.; Giebel, G.; Nielsen, T. S.; Hahmann, A.; Sørensen, P.; Madsen, H.

2012-04-01

This poster presents the current state of the public service obligation (PSO) funded project PSO 10464, with the working title "Integrated Wind Power Planning Tool". The project commenced October 1, 2011, and the goal is to integrate a numerical weather prediction (NWP) model with purely statistical tools in order to assess wind power fluctuations, with focus on long term power system planning for future wind farms as well as short term forecasting for existing wind farms. Currently, wind power fluctuation models are either purely statistical or integrated with NWP models of limited resolution. With regard to the latter, one such simulation tool has been developed at the Wind Energy Division, Risø DTU, intended for long term power system planning. As part of the PSO project the inferior NWP model used at present will be replaced by the state-of-the-art Weather Research & Forecasting (WRF) model. Furthermore, the integrated simulation tool will be improved so it can handle simultaneously 10-50 times more turbines than the present ~ 300, as well as additional atmospheric parameters will be included in the model. The WRF data will also be input for a statistical short term prediction model to be developed in collaboration with ENFOR A/S; a danish company that specialises in forecasting and optimisation for the energy sector. This integrated prediction model will allow for the description of the expected variability in wind power production in the coming hours to days, accounting for its spatio-temporal dependencies, and depending on the prevailing weather conditions defined by the WRF output. The output from the integrated prediction tool constitute scenario forecasts for the coming period, which can then be fed into any type of system model or decision making problem to be solved. The high resolution of the WRF results loaded into the integrated prediction model will ensure a high accuracy data basis is available for use in the decision making process of the Danish transmission system operator, and the need for high accuracy predictions will only increase over the next decade as Denmark approaches the goal of 50% wind power based electricity in 2020, from the current 20%.
Probability of Detection (POD) as a statistical model for the validation of qualitative methods.

PubMed

Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T

2011-01-01

A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.
Statistical error model for a solar electric propulsion thrust subsystem

NASA Technical Reports Server (NTRS)

Bantell, M. H.

1973-01-01

The solar electric propulsion thrust subsystem statistical error model was developed as a tool for investigating the effects of thrust subsystem parameter uncertainties on navigation accuracy. The model is currently being used to evaluate the impact of electric engine parameter uncertainties on navigation system performance for a baseline mission to Encke's Comet in the 1980s. The data given represent the next generation in statistical error modeling for low-thrust applications. Principal improvements include the representation of thrust uncertainties and random process modeling in terms of random parametric variations in the thrust vector process for a multi-engine configuration.

Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

EPA Science Inventory

This tool consists of two parts: model performance evaluation and scenario analysis (MPESA). The model performance evaluation consists of two components: model performance evaluation metrics and model diagnostics. These metrics provides modelers with statistical goodness-of-fit m...
RooStatsCms: A tool for analysis modelling, combination and statistical studies

NASA Astrophysics Data System (ADS)

Piparo, D.; Schott, G.; Quast, G.

2010-04-01

RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.
Automation of Ocean Product Metrics

DTIC Science & Technology

2008-09-30

Presented in: Ocean Sciences 2008 Conf., 5 Mar 2008. Shriver, J., J. D. Dykes, and J. Fabre: Automation of Operational Ocean Product Metrics. Presented in 2008 EGU General Assembly , 14 April 2008. 9 ...processing (multiple data cuts per day) and multiple-nested models. Routines for generating automated evaluations of model forecast statistics will be...developed and pre-existing tools will be collected to create a generalized tool set, which will include user-interface tools to the metrics data
Integrated Wind Power Planning Tool

NASA Astrophysics Data System (ADS)

Rosgaard, Martin; Giebel, Gregor; Skov Nielsen, Torben; Hahmann, Andrea; Sørensen, Poul; Madsen, Henrik

2013-04-01

This poster presents the current state of the public service obligation (PSO) funded project PSO 10464, with the title "Integrated Wind Power Planning Tool". The goal is to integrate a mesoscale numerical weather prediction (NWP) model with purely statistical tools in order to assess wind power fluctuations, with focus on long term power system planning for future wind farms as well as short term forecasting for existing wind farms. Currently, wind power fluctuation models are either purely statistical or integrated with NWP models of limited resolution. Using the state-of-the-art mesoscale NWP model Weather Research & Forecasting model (WRF) the forecast error is sought quantified in dependence of the time scale involved. This task constitutes a preparative study for later implementation of features accounting for NWP forecast errors in the DTU Wind Energy maintained Corwind code - a long term wind power planning tool. Within the framework of PSO 10464 research related to operational short term wind power prediction will be carried out, including a comparison of forecast quality at different mesoscale NWP model resolutions and development of a statistical wind power prediction tool taking input from WRF. The short term prediction part of the project is carried out in collaboration with ENFOR A/S; a Danish company that specialises in forecasting and optimisation for the energy sector. The integrated prediction model will allow for the description of the expected variability in wind power production in the coming hours to days, accounting for its spatio-temporal dependencies, and depending on the prevailing weather conditions defined by the WRF output. The output from the integrated short term prediction tool constitutes scenario forecasts for the coming period, which can then be fed into any type of system model or decision making problem to be solved. The high resolution of the WRF results loaded into the integrated prediction model will ensure a high accuracy data basis is available for use in the decision making process of the Danish transmission system operator. The need for high accuracy predictions will only increase over the next decade as Denmark approaches the goal of 50% wind power based electricity in 2025 from the current 20%.
Translating statistical species-habitat models to interactive decision support tools

USGS Publications Warehouse

Wszola, Lyndsie S.; Simonsen, Victoria L.; Stuber, Erica F.; Gillespie, Caitlyn R.; Messinger, Lindsey N.; Decker, Karie L.; Lusk, Jeffrey J.; Jorgensen, Christopher F.; Bishop, Andrew A.; Fontaine, Joseph J.

2017-01-01

Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences.
Translating statistical species-habitat models to interactive decision support tools.

PubMed

Wszola, Lyndsie S; Simonsen, Victoria L; Stuber, Erica F; Gillespie, Caitlyn R; Messinger, Lindsey N; Decker, Karie L; Lusk, Jeffrey J; Jorgensen, Christopher F; Bishop, Andrew A; Fontaine, Joseph J

2017-01-01

Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences.
Translating statistical species-habitat models to interactive decision support tools

PubMed Central

Simonsen, Victoria L.; Stuber, Erica F.; Gillespie, Caitlyn R.; Messinger, Lindsey N.; Decker, Karie L.; Lusk, Jeffrey J.; Jorgensen, Christopher F.; Bishop, Andrew A.; Fontaine, Joseph J.

2017-01-01

Understanding species-habitat relationships is vital to successful conservation, but the tools used to communicate species-habitat relationships are often poorly suited to the information needs of conservation practitioners. Here we present a novel method for translating a statistical species-habitat model, a regression analysis relating ring-necked pheasant abundance to landcover, into an interactive online tool. The Pheasant Habitat Simulator combines the analytical power of the R programming environment with the user-friendly Shiny web interface to create an online platform in which wildlife professionals can explore the effects of variation in local landcover on relative pheasant habitat suitability within spatial scales relevant to individual wildlife managers. Our tool allows users to virtually manipulate the landcover composition of a simulated space to explore how changes in landcover may affect pheasant relative habitat suitability, and guides users through the economic tradeoffs of landscape changes. We offer suggestions for development of similar interactive applications and demonstrate their potential as innovative science delivery tools for diverse professional and public audiences. PMID:29236707
A Field Guide to Extra-Tropical Cyclones: Comparing Models to Observations

NASA Astrophysics Data System (ADS)

Bauer, M.

2008-12-01

Climate it is said is the accumulation of weather. And weather is not the concern of climate models. Justification for this latter sentiment has long hidden behind coarse model resolutions and blunt validation tools based on climatological maps and the like. The spatial-temporal resolutions of today's models and observations are converging onto meteorological scales however, which means that with the correct tools we can test the largely unproven assumption that climate model weather is correct enough, or at least lacks perverting biases, such that its accumulation does in fact result in a robust climate prediction. Towards this effort we introduce a new tool for extracting detailed cyclone statistics from climate model output. These include the usual cyclone distribution statistics (maps, histograms), but also adaptive cyclone- centric composites. We have also created a complementary dataset, The MAP Climatology of Mid-latitude Storminess (MCMS), which provides a detailed 6 hourly assessment of the areas under the influence of mid- latitude cyclones based on Reanalysis products. Using this we then extract complimentary composites from sources such as ISCCP and GPCP to create a large comparative dataset for climate model validation. A demonstration of the potential usefulness of these tools will be shown. dime.giss.nasa.gov/mcms/mcms.html
Multilevel Model Prediction

ERIC Educational Resources Information Center

Frees, Edward W.; Kim, Jee-Seon

2006-01-01

Multilevel models are proven tools in social research for modeling complex, hierarchical systems. In multilevel modeling, statistical inference is based largely on quantification of random variables. This paper distinguishes among three types of random variables in multilevel modeling--model disturbances, random coefficients, and future response…
The GenABEL Project for statistical genomics.

PubMed

Karssen, Lennart C; van Duijn, Cornelia M; Aulchenko, Yurii S

2016-01-01

Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own device. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed withing the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the "core team", facilitating agile statistical omics methodology development and fast dissemination.
A Quantitative Comparative Study of Blended and Traditional Models in the Secondary Advanced Placement Statistics Classroom

ERIC Educational Resources Information Center

Owens, Susan T.

2017-01-01

Technology is becoming an integral tool in the classroom and can make a positive impact on how the students learn. This quantitative comparative research study examined gender-based differences among secondary Advanced Placement (AP) Statistic students comparing Educational Testing Service (ETS) College Board AP Statistic examination scores…
A data-based conservation planning tool for Florida panthers

USGS Publications Warehouse

Murrow, Jennifer L.; Thatcher, Cindy A.; Van Manen, Frank T.; Clark, Joseph D.

2013-01-01

Habitat loss and fragmentation are the greatest threats to the endangered Florida panther (Puma concolor coryi). We developed a data-based habitat model and user-friendly interface so that land managers can objectively evaluate Florida panther habitat. We used a geographic information system (GIS) and the Mahalanobis distance statistic (D2) to develop a model based on broad-scale landscape characteristics associated with panther home ranges. Variables in our model were Euclidean distance to natural land cover, road density, distance to major roads, human density, amount of natural land cover, amount of semi-natural land cover, amount of permanent or semi-permanent flooded area–open water, and a cost–distance variable. We then developed a Florida Panther Habitat Estimator tool, which automates and replicates the GIS processes used to apply the statistical habitat model. The estimator can be used by persons with moderate GIS skills to quantify effects of land-use changes on panther habitat at local and landscape scales. Example applications of the tool are presented.
Neural Systems with Numerically Matched Input-Output Statistic: Isotonic Bivariate Statistical Modeling

PubMed Central

Fiori, Simone

2007-01-01

Bivariate statistical modeling from incomplete data is a useful statistical tool that allows to discover the model underlying two data sets when the data in the two sets do not correspond in size nor in ordering. Such situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641
Fall 2014 SEI Research Review Probabilistic Analysis of Time Sensitive Systems

DTIC Science & Technology

2014-10-28

Osmosis SMC Tool Osmosis is a tool for Statistical Model Checking (SMC) with Semantic Importance Sampling. • Input model is written in subset of C...ASSERT() statements in model indicate conditions that must hold. • Input probability distributions defined by the user. • Osmosis returns the...on: – Target relative error, or – Set number of simulations Osmosis Main Algorithm 1 http://dreal.cs.cmu.edu/ (?⃑?): Indicator
Predicting Operator Execution Times Using CogTool

NASA Technical Reports Server (NTRS)

Santiago-Espada, Yamira; Latorella, Kara A.

2013-01-01

Researchers and developers of NextGen systems can use predictive human performance modeling tools as an initial approach to obtain skilled user performance times analytically, before system testing with users. This paper describes the CogTool models for a two pilot crew executing two different types of a datalink clearance acceptance tasks, and on two different simulation platforms. The CogTool time estimates for accepting and executing Required Time of Arrival and Interval Management clearances were compared to empirical data observed in video tapes and registered in simulation files. Results indicate no statistically significant difference between empirical data and the CogTool predictions. A population comparison test found no significant differences between the CogTool estimates and the empirical execution times for any of the four test conditions. We discuss modeling caveats and considerations for applying CogTool to crew performance modeling in advanced cockpit environments.
Probability and Statistics in Sensor Performance Modeling

DTIC Science & Technology

2010-12-01

language software program is called Environmental Awareness for Sensor and Emitter Employment. Some important numerical issues in the implementation...3 Statistical analysis for measuring sensor performance...complementary cumulative distribution function cdf cumulative distribution function DST decision-support tool EASEE Environmental Awareness of
LANDSCAPE ASSESSMENT TOOLS FOR WATERSHED CHARACTERIZATION

EPA Science Inventory

A combination of process-based, empirical and statistical models has been developed to assist states in their efforts to assess water quality, locate impairments over large areas, and calculate TMDL allocations. By synthesizing outputs from a number of these tools, LIPS demonstr...
Using Computational Modeling to Assess the Impact of Clinical Decision Support on Cancer Screening within Community Health Centers

PubMed Central

Carney, Timothy Jay; Morgan, Geoffrey P.; Jones, Josette; McDaniel, Anna M.; Weaver, Michael; Weiner, Bryan; Haggstrom, David A.

2014-01-01

Our conceptual model demonstrates our goal to investigate the impact of clinical decision support (CDS) utilization on cancer screening improvement strategies in the community health care (CHC) setting. We employed a dual modeling technique using both statistical and computational modeling to evaluate impact. Our statistical model used the Spearman’s Rho test to evaluate the strength of relationship between our proximal outcome measures (CDS utilization) against our distal outcome measure (provider self-reported cancer screening improvement). Our computational model relied on network evolution theory and made use of a tool called Construct-TM to model the use of CDS measured by the rate of organizational learning. We employed the use of previously collected survey data from community health centers Cancer Health Disparities Collaborative (HDCC). Our intent is to demonstrate the added valued gained by using a computational modeling tool in conjunction with a statistical analysis when evaluating the impact a health information technology, in the form of CDS, on health care quality process outcomes such as facility-level screening improvement. Significant simulated disparities in organizational learning over time were observed between community health centers beginning the simulation with high and low clinical decision support capability. PMID:24953241
RADSS: an integration of GIS, spatial statistics, and network service for regional data mining

NASA Astrophysics Data System (ADS)

Hu, Haitang; Bao, Shuming; Lin, Hui; Zhu, Qing

2005-10-01

Regional data mining, which aims at the discovery of knowledge about spatial patterns, clusters or association between regions, has widely applications nowadays in social science, such as sociology, economics, epidemiology, crime, and so on. Many applications in the regional or other social sciences are more concerned with the spatial relationship, rather than the precise geographical location. Based on the spatial continuity rule derived from Tobler's first law of geography: observations at two sites tend to be more similar to each other if the sites are close together than if far apart, spatial statistics, as an important means for spatial data mining, allow the users to extract the interesting and useful information like spatial pattern, spatial structure, spatial association, spatial outlier and spatial interaction, from the vast amount of spatial data or non-spatial data. Therefore, by integrating with the spatial statistical methods, the geographical information systems will become more powerful in gaining further insights into the nature of spatial structure of regional system, and help the researchers to be more careful when selecting appropriate models. However, the lack of such tools holds back the application of spatial data analysis techniques and development of new methods and models (e.g., spatio-temporal models). Herein, we make an attempt to develop such an integrated software and apply it into the complex system analysis for the Poyang Lake Basin. This paper presents a framework for integrating GIS, spatial statistics and network service in regional data mining, as well as their implementation. After discussing the spatial statistics methods involved in regional complex system analysis, we introduce RADSS (Regional Analysis and Decision Support System), our new regional data mining tool, by integrating GIS, spatial statistics and network service. RADSS includes the functions of spatial data visualization, exploratory spatial data analysis, and spatial statistics. The tool also includes some fundamental spatial and non-spatial database in regional population and environment, which can be updated by external database via CD or network. Utilizing this data mining and exploratory analytical tool, the users can easily and quickly analyse the huge mount of the interrelated regional data, and better understand the spatial patterns and trends of the regional development, so as to make a credible and scientific decision. Moreover, it can be used as an educational tool for spatial data analysis and environmental studies. In this paper, we also present a case study on Poyang Lake Basin as an application of the tool and spatial data mining in complex environmental studies. At last, several concluding remarks are discussed.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

PubMed

Chu, Annie; Cui, Jenny; Dinov, Ivo D

2009-03-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.

A new approach to fracture modelling in reservoirs using deterministic, genetic and statistical models of fracture growth

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rawnsley, K.; Swaby, P.

1996-08-01

It is increasingly acknowledged that in order to understand and forecast the behavior of fracture influenced reservoirs we must attempt to reproduce the fracture system geometry and use this as a basis for fluid flow calculation. This article aims to present a recently developed fracture modelling prototype designed specifically for use in hydrocarbon reservoir environments. The prototype {open_quotes}FRAME{close_quotes} (FRActure Modelling Environment) aims to provide a tool which will allow the generation of realistic 3D fracture systems within a reservoir model, constrained to the known geology of the reservoir by both mechanical and statistical considerations, and which can be used asmore » a basis for fluid flow calculation. Two newly developed modelling techniques are used. The first is an interactive tool which allows complex fault surfaces and their associated deformations to be reproduced. The second is a {open_quotes}genetic{close_quotes} model which grows fracture patterns from seeds using conceptual models of fracture development. The user defines the mechanical input and can retrieve all the statistics of the growing fractures to allow comparison to assumed statistical distributions for the reservoir fractures. Input parameters include growth rate, fracture interaction characteristics, orientation maps and density maps. More traditional statistical stochastic fracture models are also incorporated. FRAME is designed to allow the geologist to input hard or soft data including seismically defined surfaces, well fractures, outcrop models, analogue or numerical mechanical models or geological {open_quotes}feeling{close_quotes}. The geologist is not restricted to {open_quotes}a priori{close_quotes} models of fracture patterns that may not correspond to the data.« less
Vortex dynamics and Lagrangian statistics in a model for active turbulence.

PubMed

James, Martin; Wilczek, Michael

2018-02-14

Cellular suspensions such as dense bacterial flows exhibit a turbulence-like phase under certain conditions. We study this phenomenon of "active turbulence" statistically by using numerical tools. Following Wensink et al. (Proc. Natl. Acad. Sci. U.S.A. 109, 14308 (2012)), we model active turbulence by means of a generalized Navier-Stokes equation. Two-point velocity statistics of active turbulence, both in the Eulerian and the Lagrangian frame, is explored. We characterize the scale-dependent features of two-point statistics in this system. Furthermore, we extend this statistical study with measurements of vortex dynamics in this system. Our observations suggest that the large-scale statistics of active turbulence is close to Gaussian with sub-Gaussian tails.
MyPMFs: a simple tool for creating statistical potentials to assess protein structural models.

PubMed

Postic, Guillaume; Hamelryck, Thomas; Chomilier, Jacques; Stratmann, Dirk

2018-05-29

Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs. Copyright © 2018. Published by Elsevier B.V.
The GenABEL Project for statistical genomics

PubMed Central

Karssen, Lennart C.; van Duijn, Cornelia M.; Aulchenko, Yurii S.

2016-01-01

Development of free/libre open source software is usually done by a community of people with an interest in the tool. For scientific software, however, this is less often the case. Most scientific software is written by only a few authors, often a student working on a thesis. Once the paper describing the tool has been published, the tool is no longer developed further and is left to its own device. Here we describe the broad, multidisciplinary community we formed around a set of tools for statistical genomics. The GenABEL project for statistical omics actively promotes open interdisciplinary development of statistical methodology and its implementation in efficient and user-friendly software under an open source licence. The software tools developed withing the project collectively make up the GenABEL suite, which currently consists of eleven tools. The open framework of the project actively encourages involvement of the community in all stages, from formulation of methodological ideas to application of software to specific data sets. A web forum is used to channel user questions and discussions, further promoting the use of the GenABEL suite. Developer discussions take place on a dedicated mailing list, and development is further supported by robust development practices including use of public version control, code review and continuous integration. Use of this open science model attracts contributions from users and developers outside the “core team”, facilitating agile statistical omics methodology development and fast dissemination. PMID:27347381
ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data

PubMed Central

Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J.; Intarapanich, Apichart; Tongsima, Sissades

2017-01-01

Background Biochemical methods are available for enriching 5′ ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5′ ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. Results We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5′ ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5′ ends than TSSAR. In general, the transcript 5′ ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. Conclusion ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5′ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER). PMID:28542466
Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools 2)

DTIC Science & Technology

2013-09-30

proceedings of a recent conference on The Effects of Noise on Aquatic Life (Schick et al. 2014). In addition to this work, Schick has been working...Lisbon, Portugal (April), the UK National Centre for Statistical Ecology annual workshop (June), and the Effects of Aquatic Noise conference (August). We...A. N. Popper and A. Hawkins, editors. Effects of Noise on Aquatic Life II. Springer. [in press] Fleishman, E., M. Burgman, M. C. Runge, R. S
Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

NASA Astrophysics Data System (ADS)

Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

2017-12-01

Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.
ModelTest Server: a web-based tool for the statistical selection of models of nucleotide substitution online

PubMed Central

Posada, David

2006-01-01

ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at . PMID:16845102
Simple Statistics: - Summarized!

ERIC Educational Resources Information Center

Blai, Boris, Jr.

Statistics are an essential tool for making proper judgement decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…
Comparative study of coated and uncoated tool inserts with dry machining of EN47 steel using Taguchi L9 optimization technique

NASA Astrophysics Data System (ADS)

Vasu, M.; Shivananda, Nayaka H.

2018-04-01

EN47 steel samples are machined on a self-centered lathe using Chemical Vapor Deposition of coated TiCN/Al2O3/TiN and uncoated tungsten carbide tool inserts, with nose radius 0.8mm. Results are compared with each other and optimized using statistical tool. Input (cutting) parameters that are considered in this work are feed rate (f), cutting speed (Vc), and depth of cut (ap), the optimization criteria are based on the Taguchi (L9) orthogonal array. ANOVA method is adopted to evaluate the statistical significance and also percentage contribution for each model. Multiple response characteristics namely cutting force (Fz), tool tip temperature (T) and surface roughness (Ra) are evaluated. The results discovered that coated tool insert (TiCN/Al2O3/TiN) exhibits 1.27 and 1.29 times better than the uncoated tool insert for tool tip temperature and surface roughness respectively. A slight increase in cutting force was observed for coated tools.
Consequences of Base Time for Redundant Signals Experiments

PubMed Central

Townsend, James T.; Honey, Christopher

2007-01-01

We report analytical and computational investigations into the effects of base time on the diagnosticity of two popular theoretical tools in the redundant signals literature: (1) the race model inequality and (2) the capacity coefficient. We show analytically and without distributional assumptions that the presence of base time decreases the sensitivity of both of these measures to model violations. We further use simulations to investigate the statistical power model selection tools based on the race model inequality, both with and without base time. Base time decreases statistical power, and biases the race model test toward conservatism. The magnitude of this biasing effect increases as we increase the proportion of total reaction time variance contributed by base time. We marshal empirical evidence to suggest that the proportion of reaction time variance contributed by base time is relatively small, and that the effects of base time on the diagnosticity of our model-selection tools are therefore likely to be minor. However, uncertainty remains concerning the magnitude and even the definition of base time. Experimentalists should continue to be alert to situations in which base time may contribute a large proportion of the total reaction time variance. PMID:18670591
Integrated data management for clinical studies: automatic transformation of data models with semantic annotations for principal investigators, data managers and statisticians.

PubMed

Dugas, Martin; Dugas-Breit, Susanne

2014-01-01

Design, execution and analysis of clinical studies involves several stakeholders with different professional backgrounds. Typically, principle investigators are familiar with standard office tools, data managers apply electronic data capture (EDC) systems and statisticians work with statistics software. Case report forms (CRFs) specify the data model of study subjects, evolve over time and consist of hundreds to thousands of data items per study. To avoid erroneous manual transformation work, a converting tool for different representations of study data models was designed. It can convert between office format, EDC and statistics format. In addition, it supports semantic annotations, which enable precise definitions for data items. A reference implementation is available as open source package ODMconverter at http://cran.r-project.org.
Getting the big picture in community science: methods that capture context.

PubMed

Luke, Douglas A

2005-06-01

Community science has a rich tradition of using theories and research designs that are consistent with its core value of contextualism. However, a survey of empirical articles published in the American Journal of Community Psychology shows that community scientists utilize a narrow range of statistical tools that are not well suited to assess contextual data. Multilevel modeling, geographic information systems (GIS), social network analysis, and cluster analysis are recommended as useful tools to address contextual questions in community science. An argument for increased methodological consilience is presented, where community scientists are encouraged to adopt statistical methodology that is capable of modeling a greater proportion of the data than is typical with traditional methods.
Heterogeneous Structure of Stem Cells Dynamics: Statistical Models and Quantitative Predictions

PubMed Central

Bogdan, Paul; Deasy, Bridget M.; Gharaibeh, Burhan; Roehrs, Timo; Marculescu, Radu

2014-01-01

Understanding stem cell (SC) population dynamics is essential for developing models that can be used in basic science and medicine, to aid in predicting cells fate. These models can be used as tools e.g. in studying patho-physiological events at the cellular and tissue level, predicting (mal)functions along the developmental course, and personalized regenerative medicine. Using time-lapsed imaging and statistical tools, we show that the dynamics of SC populations involve a heterogeneous structure consisting of multiple sub-population behaviors. Using non-Gaussian statistical approaches, we identify the co-existence of fast and slow dividing subpopulations, and quiescent cells, in stem cells from three species. The mathematical analysis also shows that, instead of developing independently, SCs exhibit a time-dependent fractal behavior as they interact with each other through molecular and tactile signals. These findings suggest that more sophisticated models of SC dynamics should view SC populations as a collective and avoid the simplifying homogeneity assumption by accounting for the presence of more than one dividing sub-population, and their multi-fractal characteristics. PMID:24769917
A phylogenetic transform enhances analysis of compositional microbiota data.

PubMed

Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

2017-02-15

Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
Regression Models for Identifying Noise Sources in Magnetic Resonance Images

PubMed Central

Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

2009-01-01

Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
An optimization model to agroindustrial sector in antioquia (Colombia, South America)

NASA Astrophysics Data System (ADS)

Fernandez, J.

2015-06-01

This paper develops a proposal of a general optimization model for the flower industry, which is defined by using discrete simulation and nonlinear optimization, whose mathematical models have been solved by using ProModel simulation tools and Gams optimization. It defines the operations that constitute the production and marketing of the sector, statistically validated data taken directly from each operation through field work, the discrete simulation model of the operations and the linear optimization model of the entire industry chain are raised. The model is solved with the tools described above and presents the results validated in a case study.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Jarocki, John Charles; Zage, David John; Fisher, Andrew N.

LinkShop is a software tool for applying the method of Linkography to the analysis time-sequence data. LinkShop provides command line, web, and application programming interfaces (API) for input and processing of time-sequence data, abstraction models, and ontologies. The software creates graph representations of the abstraction model, ontology, and derived linkograph. Finally, the tool allows the user to perform statistical measurements of the linkograph and refine the ontology through direct manipulation of the linkograph.
Class Evolution Tree: A Graphical Tool to Support Decisions on the Number of Classes in Exploratory Categorical Latent Variable Modeling for Rehabilitation Research

ERIC Educational Resources Information Center

Kriston, Levente; Melchior, Hanne; Hergert, Anika; Bergelt, Corinna; Watzke, Birgit; Schulz, Holger; von Wolff, Alessa

2011-01-01

The aim of our study was to develop a graphical tool that can be used in addition to standard statistical criteria to support decisions on the number of classes in explorative categorical latent variable modeling for rehabilitation research. Data from two rehabilitation research projects were used. In the first study, a latent profile analysis was…
A Diagnostics Tool to detect ensemble forecast system anomaly and guide operational decisions

NASA Astrophysics Data System (ADS)

Park, G. H.; Srivastava, A.; Shrestha, E.; Thiemann, M.; Day, G. N.; Draijer, S.

2017-12-01

The hydrologic community is moving toward using ensemble forecasts to take uncertainty into account during the decision-making process. The New York City Department of Environmental Protection (DEP) implements several types of ensemble forecasts in their decision-making process: ensemble products for a statistical model (Hirsch and enhanced Hirsch); the National Weather Service (NWS) Advanced Hydrologic Prediction Service (AHPS) forecasts based on the classical Ensemble Streamflow Prediction (ESP) technique; and the new NWS Hydrologic Ensemble Forecasting Service (HEFS) forecasts. To remove structural error and apply the forecasts to additional forecast points, the DEP post processes both the AHPS and the HEFS forecasts. These ensemble forecasts provide mass quantities of complex data, and drawing conclusions from these forecasts is time-consuming and difficult. The complexity of these forecasts also makes it difficult to identify system failures resulting from poor data, missing forecasts, and server breakdowns. To address these issues, we developed a diagnostic tool that summarizes ensemble forecasts and provides additional information such as historical forecast statistics, forecast skill, and model forcing statistics. This additional information highlights the key information that enables operators to evaluate the forecast in real-time, dynamically interact with the data, and review additional statistics, if needed, to make better decisions. We used Bokeh, a Python interactive visualization library, and a multi-database management system to create this interactive tool. This tool compiles and stores data into HTML pages that allows operators to readily analyze the data with built-in user interaction features. This paper will present a brief description of the ensemble forecasts, forecast verification results, and the intended applications for the diagnostic tool.

Trends in modeling Biomedical Complex Systems

PubMed Central

Milanesi, Luciano; Romano, Paolo; Castellani, Gastone; Remondini, Daniel; Liò, Petro

2009-01-01

In this paper we provide an introduction to the techniques for multi-scale complex biological systems, from the single bio-molecule to the cell, combining theoretical modeling, experiments, informatics tools and technologies suitable for biological and biomedical research, which are becoming increasingly multidisciplinary, multidimensional and information-driven. The most important concepts on mathematical modeling methodologies and statistical inference, bioinformatics and standards tools to investigate complex biomedical systems are discussed and the prominent literature useful to both the practitioner and the theoretician are presented. PMID:19828068
In silico environmental chemical science: properties and processes from statistical and computational modelling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tratnyek, Paul G.; Bylaska, Eric J.; Weber, Eric J.

2017-01-01

Quantitative structure–activity relationships (QSARs) have long been used in the environmental sciences. More recently, molecular modeling and chemoinformatic methods have become widespread. These methods have the potential to expand and accelerate advances in environmental chemistry because they complement observational and experimental data with “in silico” results and analysis. The opportunities and challenges that arise at the intersection between statistical and theoretical in silico methods are most apparent in the context of properties that determine the environmental fate and effects of chemical contaminants (degradation rate constants, partition coefficients, toxicities, etc.). The main example of this is the calibration of QSARs usingmore » descriptor variable data calculated from molecular modeling, which can make QSARs more useful for predicting property data that are unavailable, but also can make them more powerful tools for diagnosis of fate determining pathways and mechanisms. Emerging opportunities for “in silico environmental chemical science” are to move beyond the calculation of specific chemical properties using statistical models and toward more fully in silico models, prediction of transformation pathways and products, incorporation of environmental factors into model predictions, integration of databases and predictive models into more comprehensive and efficient tools for exposure assessment, and extending the applicability of all the above from chemicals to biologicals and materials.« less
Automated clustering-based workload characterization

NASA Technical Reports Server (NTRS)

Pentakalos, Odysseas I.; Menasce, Daniel A.; Yesha, Yelena

1996-01-01

The demands placed on the mass storage systems at various federal agencies and national laboratories are continuously increasing in intensity. This forces system managers to constantly monitor the system, evaluate the demand placed on it, and tune it appropriately using either heuristics based on experience or analytic models. Performance models require an accurate workload characterization. This can be a laborious and time consuming process. It became evident from our experience that a tool is necessary to automate the workload characterization process. This paper presents the design and discusses the implementation of a tool for workload characterization of mass storage systems. The main features of the tool discussed here are: (1)Automatic support for peak-period determination. Histograms of system activity are generated and presented to the user for peak-period determination; (2) Automatic clustering analysis. The data collected from the mass storage system logs is clustered using clustering algorithms and tightness measures to limit the number of generated clusters; (3) Reporting of varied file statistics. The tool computes several statistics on file sizes such as average, standard deviation, minimum, maximum, frequency, as well as average transfer time. These statistics are given on a per cluster basis; (4) Portability. The tool can easily be used to characterize the workload in mass storage systems of different vendors. The user needs to specify through a simple log description language how the a specific log should be interpreted. The rest of this paper is organized as follows. Section two presents basic concepts in workload characterization as they apply to mass storage systems. Section three describes clustering algorithms and tightness measures. The following section presents the architecture of the tool. Section five presents some results of workload characterization using the tool.Finally, section six presents some concluding remarks.
Multivariate Strategies in Functional Magnetic Resonance Imaging

ERIC Educational Resources Information Center

Hansen, Lars Kai

2007-01-01

We discuss aspects of multivariate fMRI modeling, including the statistical evaluation of multivariate models and means for dimensional reduction. In a case study we analyze linear and non-linear dimensional reduction tools in the context of a "mind reading" predictive multivariate fMRI model.
The magnitude and effects of extreme solar particle events

NASA Astrophysics Data System (ADS)

Jiggens, Piers; Chavy-Macdonald, Marc-Andre; Santin, Giovanni; Menicucci, Alessandra; Evans, Hugh; Hilgers, Alain

2014-06-01

The solar energetic particle (SEP) radiation environment is an important consideration for spacecraft design, spacecraft mission planning and human spaceflight. Herein is presented an investigation into the likely severity of effects of a very large Solar Particle Event (SPE) on technology and humans in space. Fluences for SPEs derived using statistical models are compared to historical SPEs to verify their appropriateness for use in the analysis which follows. By combining environment tools with tools to model effects behind varying layers of spacecraft shielding it is possible to predict what impact a large SPE would be likely to have on a spacecraft in Near-Earth interplanetary space or geostationary Earth orbit. Also presented is a comparison of results generated using the traditional method of inputting the environment spectra, determined using a statistical model, into effects tools and a new method developed as part of the ESA SEPEM Project allowing for the creation of an effect time series on which statistics, previously applied to the flux data, can be run directly. The SPE environment spectra is determined and presented as energy integrated proton fluence (cm-2) as a function of particle energy (in MeV). This is input into the SHIELDOSE-2, MULASSIS, NIEL, GRAS and SEU effects tools to provide the output results. In the case of the new method for analysis, the flux time series is fed directly into the MULASSIS and GEMAT tools integrated into the SEPEM system. The output effect quantities include total ionising dose (in rads), non-ionising energy loss (MeV g-1), single event upsets (upsets/bit) and the dose in humans compared to established limits for stochastic (or cancer-causing) effects and tissue reactions (such as acute radiation sickness) in humans given in grey-equivalent and sieverts respectively.
An Interactive Tool For Semi-automated Statistical Prediction Using Earth Observations and Models

NASA Astrophysics Data System (ADS)

Zaitchik, B. F.; Berhane, F.; Tadesse, T.

2015-12-01

We developed a semi-automated statistical prediction tool applicable to concurrent analysis or seasonal prediction of any time series variable in any geographic location. The tool was developed using Shiny, JavaScript, HTML and CSS. A user can extract a predictand by drawing a polygon over a region of interest on the provided user interface (global map). The user can select the Climatic Research Unit (CRU) precipitation or Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) as predictand. They can also upload their own predictand time series. Predictors can be extracted from sea surface temperature, sea level pressure, winds at different pressure levels, air temperature at various pressure levels, and geopotential height at different pressure levels. By default, reanalysis fields are applied as predictors, but the user can also upload their own predictors, including a wide range of compatible satellite-derived datasets. The package generates correlations of the variables selected with the predictand. The user also has the option to generate composites of the variables based on the predictand. Next, the user can extract predictors by drawing polygons over the regions that show strong correlations (composites). Then, the user can select some or all of the statistical prediction models provided. Provided models include Linear Regression models (GLM, SGLM), Tree-based models (bagging, random forest, boosting), Artificial Neural Network, and other non-linear models such as Generalized Additive Model (GAM) and Multivariate Adaptive Regression Splines (MARS). Finally, the user can download the analysis steps they used, such as the region they selected, the time period they specified, the predictand and predictors they chose and preprocessing options they used, and the model results in PDF or HTML format. Key words: Semi-automated prediction, Shiny, R, GLM, ANN, RF, GAM, MARS
Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

PubMed

Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean

2010-11-01

Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predicted value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
Progress with modeling activity landscapes in drug discovery.

PubMed

Vogt, Martin

2018-04-19

Activity landscapes (ALs) are representations and models of compound data sets annotated with a target-specific activity. In contrast to quantitative structure-activity relationship (QSAR) models, ALs aim at characterizing structure-activity relationships (SARs) on a large-scale level encompassing all active compounds for specific targets. The popularity of AL modeling has grown substantially with the public availability of large activity-annotated compound data sets. AL modeling crucially depends on molecular representations and similarity metrics used to assess structural similarity. Areas covered: The concepts of AL modeling are introduced and its basis in quantitatively assessing molecular similarity is discussed. The different types of AL modeling approaches are introduced. AL designs can broadly be divided into three categories: compound-pair based, dimensionality reduction, and network approaches. Recent developments for each of these categories are discussed focusing on the application of mathematical, statistical, and machine learning tools for AL modeling. AL modeling using chemical space networks is covered in more detail. Expert opinion: AL modeling has remained a largely descriptive approach for the analysis of SARs. Beyond mere visualization, the application of analytical tools from statistics, machine learning and network theory has aided in the sophistication of AL designs and provides a step forward in transforming ALs from descriptive to predictive tools. To this end, optimizing representations that encode activity relevant features of molecules might prove to be a crucial step.
Tropical geometry of statistical models.

PubMed

Pachter, Lior; Sturmfels, Bernd

2004-11-16

This article presents a unified mathematical framework for inference in graphical models, building on the observation that graphical models are algebraic varieties. From this geometric viewpoint, observations generated from a model are coordinates of a point in the variety, and the sum-product algorithm is an efficient tool for evaluating specific coordinates. Here, we address the question of how the solutions to various inference problems depend on the model parameters. The proposed answer is expressed in terms of tropical algebraic geometry. The Newton polytope of a statistical model plays a key role. Our results are applied to the hidden Markov model and the general Markov model on a binary tree.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit

PubMed Central

Chu, Annie; Cui, Jenny; Dinov, Ivo D.

2011-01-01

The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments

PubMed Central

2010-01-01

Background The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Results Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Conclusions Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/. PMID:20482791
vFitness: a web-based computing tool for improving estimation of in vitro HIV-1 fitness experiments.

PubMed

Ma, Jingming; Dykes, Carrie; Wu, Tao; Huang, Yangxin; Demeter, Lisa; Wu, Hulin

2010-05-18

The replication rate (or fitness) between viral variants has been investigated in vivo and in vitro for human immunodeficiency virus (HIV). HIV fitness plays an important role in the development and persistence of drug resistance. The accurate estimation of viral fitness relies on complicated computations based on statistical methods. This calls for tools that are easy to access and intuitive to use for various experiments of viral fitness. Based on a mathematical model and several statistical methods (least-squares approach and measurement error models), a Web-based computing tool has been developed for improving estimation of virus fitness in growth competition assays of human immunodeficiency virus type 1 (HIV-1). Unlike the two-point calculation used in previous studies, the estimation here uses linear regression methods with all observed data in the competition experiment to more accurately estimate relative viral fitness parameters. The dilution factor is introduced for making the computational tool more flexible to accommodate various experimental conditions. This Web-based tool is implemented in C# language with Microsoft ASP.NET, and is publicly available on the Web at http://bis.urmc.rochester.edu/vFitness/.
Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

USGS Publications Warehouse

Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

2016-01-01

Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.
Comparisons of non-Gaussian statistical models in DNA methylation analysis.

PubMed

Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-06-16

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

PubMed Central

Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

2014-01-01

As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
Relevance of the c-statistic when evaluating risk-adjustment models in surgery.

PubMed

Merkow, Ryan P; Hall, Bruce L; Cohen, Mark E; Dimick, Justin B; Wang, Edward; Chow, Warren B; Ko, Clifford Y; Bilimoria, Karl Y

2012-05-01

The measurement of hospital quality based on outcomes requires risk adjustment. The c-statistic is a popular tool used to judge model performance, but can be limited, particularly when evaluating specific operations in focused populations. Our objectives were to examine the interpretation and relevance of the c-statistic when used in models with increasingly similar case mix and to consider an alternative perspective on model calibration based on a graphical depiction of model fit. From the American College of Surgeons National Surgical Quality Improvement Program (2008-2009), patients were identified who underwent a general surgery procedure, and procedure groups were increasingly restricted: colorectal-all, colorectal-elective cases only, and colorectal-elective cancer cases only. Mortality and serious morbidity outcomes were evaluated using logistic regression-based risk adjustment, and model c-statistics and calibration curves were used to compare model performance. During the study period, 323,427 general, 47,605 colorectal-all, 39,860 colorectal-elective, and 21,680 colorectal cancer patients were studied. Mortality ranged from 1.0% in general surgery to 4.1% in the colorectal-all group, and serious morbidity ranged from 3.9% in general surgery to 12.4% in the colorectal-all procedural group. As case mix was restricted, c-statistics progressively declined from the general to the colorectal cancer surgery cohorts for both mortality and serious morbidity (mortality: 0.949 to 0.866; serious morbidity: 0.861 to 0.668). Calibration was evaluated graphically by examining predicted vs observed number of events over risk deciles. For both mortality and serious morbidity, there was no qualitative difference in calibration identified between the procedure groups. In the present study, we demonstrate how the c-statistic can become less informative and, in certain circumstances, can lead to incorrect model-based conclusions, as case mix is restricted and patients become more homogenous. Although it remains an important tool, caution is advised when the c-statistic is advanced as the sole measure of a model performance. Copyright © 2012 American College of Surgeons. All rights reserved.
Active Subspace Methods for Data-Intensive Inverse Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Qiqi

2017-04-27

The project has developed theory and computational tools to exploit active subspaces to reduce the dimension in statistical calibration problems. This dimension reduction enables MCMC methods to calibrate otherwise intractable models. The same theoretical and computational tools can also reduce the measurement dimension for calibration problems that use large stores of data.
Predicting Knowledge Workers' Participation in Voluntary Learning with Employee Characteristics and Online Learning Tools

ERIC Educational Resources Information Center

Hicks, Catherine

2018-01-01

Purpose: This paper aims to explore predicting employee learning activity via employee characteristics and usage for two online learning tools. Design/methodology/approach: Statistical analysis focused on observational data collected from user logs. Data are analyzed via regression models. Findings: Findings are presented for over 40,000…
PCA as a practical indicator of OPLS-DA model reliability.

PubMed

Worley, Bradley; Powers, Robert

Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures Discriminant Analysis (OPLS-DA) are powerful statistical modeling tools that provide insights into separations between experimental groups based on high-dimensional spectral measurements from NMR, MS or other analytical instrumentation. However, when used without validation, these tools may lead investigators to statistically unreliable conclusions. This danger is especially real for Partial Least Squares (PLS) and OPLS, which aggressively force separations between experimental groups. As a result, OPLS-DA is often used as an alternative method when PCA fails to expose group separation, but this practice is highly dangerous. Without rigorous validation, OPLS-DA can easily yield statistically unreliable group separation. A Monte Carlo analysis of PCA group separations and OPLS-DA cross-validation metrics was performed on NMR datasets with statistically significant separations in scores-space. A linearly increasing amount of Gaussian noise was added to each data matrix followed by the construction and validation of PCA and OPLS-DA models. With increasing added noise, the PCA scores-space distance between groups rapidly decreased and the OPLS-DA cross-validation statistics simultaneously deteriorated. A decrease in correlation between the estimated loadings (added noise) and the true (original) loadings was also observed. While the validity of the OPLS-DA model diminished with increasing added noise, the group separation in scores-space remained basically unaffected. Supported by the results of Monte Carlo analyses of PCA group separations and OPLS-DA cross-validation metrics, we provide practical guidelines and cross-validatory recommendations for reliable inference from PCA and OPLS-DA models.
A Climate Statistics Tool and Data Repository

NASA Astrophysics Data System (ADS)

Wang, J.; Kotamarthi, V. R.; Kuiper, J. A.; Orr, A.

2017-12-01

Researchers at Argonne National Laboratory and collaborating organizations have generated regional scale, dynamically downscaled climate model output using Weather Research and Forecasting (WRF) version 3.3.1 at a 12km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and the data volume is nearly 600Tb. A condensed 800Gb set of NetCDF files were made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094), and two future scenarios RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.

A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

PubMed

Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

2013-01-01

As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome is developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar with Kulldorff's methods, we adopt Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. Through a simulation to independent benchmark data, it is indicated that the test statistic based on the Hypergeometric model outweighs Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
A Statistical Bias Correction Tool for Generating Climate Change Scenarios in Indonesia based on CMIP5 Datasets

NASA Astrophysics Data System (ADS)

Faqih, A.

2017-03-01

Providing information regarding future climate scenarios is very important in climate change study. The climate scenario can be used as basic information to support adaptation and mitigation studies. In order to deliver future climate scenarios over specific region, baseline and projection data from the outputs of global climate models (GCM) is needed. However, due to its coarse resolution, the data have to be downscaled and bias corrected in order to get scenario data with better spatial resolution that match the characteristics of the observed data. Generating this downscaled data is mostly difficult for scientist who do not have specific background, experience and skill in dealing with the complex data from the GCM outputs. In this regards, it is necessary to develop a tool that can be used to simplify the downscaling processes in order to help scientist, especially in Indonesia, for generating future climate scenario data that can be used for their climate change-related studies. In this paper, we introduce a tool called as “Statistical Bias Correction for Climate Scenarios (SiBiaS)”. The tool is specially designed to facilitate the use of CMIP5 GCM data outputs and process their statistical bias corrections relative to the reference data from observations. It is prepared for supporting capacity building in climate modeling in Indonesia as part of the Indonesia 3rd National Communication (TNC) project activities.
Efficient exploration of pan-cancer networks by generalized covariance selection and interactive web content

PubMed Central

Kling, Teresia; Johansson, Patrik; Sanchez, José; Marinescu, Voichita D.; Jörnsten, Rebecka; Nelander, Sven

2015-01-01

Statistical network modeling techniques are increasingly important tools to analyze cancer genomics data. However, current tools and resources are not designed to work across multiple diagnoses and technical platforms, thus limiting their applicability to comprehensive pan-cancer datasets such as The Cancer Genome Atlas (TCGA). To address this, we describe a new data driven modeling method, based on generalized Sparse Inverse Covariance Selection (SICS). The method integrates genetic, epigenetic and transcriptional data from multiple cancers, to define links that are present in multiple cancers, a subset of cancers, or a single cancer. It is shown to be statistically robust and effective at detecting direct pathway links in data from TCGA. To facilitate interpretation of the results, we introduce a publicly accessible tool (cancerlandscapes.org), in which the derived networks are explored as interactive web content, linked to several pathway and pharmacological databases. To evaluate the performance of the method, we constructed a model for eight TCGA cancers, using data from 3900 patients. The model rediscovered known mechanisms and contained interesting predictions. Possible applications include prediction of regulatory relationships, comparison of network modules across multiple forms of cancer and identification of drug targets. PMID:25953855
The Use of Modelling for Theory Building in Qualitative Analysis

ERIC Educational Resources Information Center

Briggs, Ann R. J.

2007-01-01

The purpose of this article is to exemplify and enhance the place of modelling as a qualitative process in educational research. Modelling is widely used in quantitative research as a tool for analysis, theory building and prediction. Statistical data lend themselves to graphical representation of values, interrelationships and operational…
Averaging Models: Parameters Estimation with the R-Average Procedure

ERIC Educational Resources Information Center

Vidotto, G.; Massidda, D.; Noventa, S.

2010-01-01

The Functional Measurement approach, proposed within the theoretical framework of Information Integration Theory (Anderson, 1981, 1982), can be a useful multi-attribute analysis tool. Compared to the majority of statistical models, the averaging model can account for interaction effects without adding complexity. The R-Average method (Vidotto &…
Application of spatial technology in malaria research & control: some new insights.

PubMed

Saxena, Rekha; Nagpal, B N; Srivastava, Aruna; Gupta, S K; Dash, A P

2009-08-01

Geographical information System (GIS) has emerged as the core of the spatial technology which integrates wide range of dataset available from different sources including Remote Sensing (RS) and Global Positioning System (GPS). Literature published during the decade (1998-2007) has been compiled and grouped into six categories according to the usage of the technology in malaria epidemiology. Different GIS modules like spatial data sources, mapping and geo-processing tools, distance calculation, digital elevation model (DEM), buffer zone and geo-statistical analysis have been investigated in detail, illustrated with examples as per the derived results. These GIS tools have contributed immensely in understanding the epidemiological processes of malaria and examples drawn have shown that GIS is now widely used for research and decision making in malaria control. Statistical data analysis currently is the most consistent and established set of tools to analyze spatial datasets. The desired future development of GIS is in line with the utilization of geo-statistical tools which combined with high quality data has capability to provide new insight into malaria epidemiology and the complexity of its transmission potential in endemic areas.
Comparison of in silico models for prediction of mutagenicity.

PubMed

Bakhtyari, Nazanin G; Raitano, Giuseppa; Benfenati, Emilio; Martin, Todd; Young, Douglas

2013-01-01

Using a dataset with more than 6000 compounds, the performance of eight quantitative structure activity relationships (QSAR) models was evaluated: ACD/Tox Suite, Absorption, Distribution, Metabolism, Elimination, and Toxicity of chemical substances (ADMET) predictor, Derek, Toxicity Estimation Software Tool (T.E.S.T.), TOxicity Prediction by Komputer Assisted Technology (TOPKAT), Toxtree, CEASAR, and SARpy (SAR in python). In general, the results showed a high level of performance. To have a realistic estimate of the predictive ability, the results for chemicals inside and outside the training set for each model were considered. The effect of applicability domain tools (when available) on the prediction accuracy was also evaluated. The predictive tools included QSAR models, knowledge-based systems, and a combination of both methods. Models based on statistical QSAR methods gave better results.
GAMBIT: the global and modular beyond-the-standard-model inference tool

NASA Astrophysics Data System (ADS)

Athron, Peter; Balazs, Csaba; Bringmann, Torsten; Buckley, Andy; Chrząszcz, Marcin; Conrad, Jan; Cornell, Jonathan M.; Dal, Lars A.; Dickinson, Hugh; Edsjö, Joakim; Farmer, Ben; Gonzalo, Tomás E.; Jackson, Paul; Krislock, Abram; Kvellestad, Anders; Lundberg, Johan; McKay, James; Mahmoudi, Farvah; Martinez, Gregory D.; Putze, Antje; Raklev, Are; Ripken, Joachim; Rogan, Christopher; Saavedra, Aldo; Savage, Christopher; Scott, Pat; Seo, Seon-Hee; Serra, Nicola; Weniger, Christoph; White, Martin; Wild, Sebastian

2017-11-01

We describe the open-source global fitting package GAMBIT: the Global And Modular Beyond-the-Standard-Model Inference Tool. GAMBIT combines extensive calculations of observables and likelihoods in particle and astroparticle physics with a hierarchical model database, advanced tools for automatically building analyses of essentially any model, a flexible and powerful system for interfacing to external codes, a suite of different statistical methods and parameter scanning algorithms, and a host of other utilities designed to make scans faster, safer and more easily-extendible than in the past. Here we give a detailed description of the framework, its design and motivation, and the current models and other specific components presently implemented in GAMBIT. Accompanying papers deal with individual modules and present first GAMBIT results. GAMBIT can be downloaded from gambit.hepforge.org.
Bayesian models: A statistical primer for ecologists

USGS Publications Warehouse

Hobbs, N. Thompson; Hooten, Mevin B.

2015-01-01

Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.

Code of Federal Regulations, 2010 CFR

2010-10-01

... Statistical Tool Reporting. 1852.223-76 Section 1852.223-76 Federal Acquisition Regulations System NATIONAL... Provisions and Clauses 1852.223-76 Federal Automotive Statistical Tool Reporting. As prescribed at 1823.271 and 1851.205, insert the following clause: Federal Automotive Statistical Tool Reporting (JUL 2003) If...
A Study on Predictive Analytics Application to Ship Machinery Maintenance

DTIC Science & Technology

2013-09-01

Looking at the nature of the time series forecasting method , it would be better applied to offline analysis . The application for real- time online...other system attributes in future. Two techniques of statistical analysis , mainly time series models and cumulative sum control charts, are discussed in...statistical tool employed for the two techniques of statistical analysis . Both time series forecasting as well as CUSUM control charts are shown to be
An Algebraic Implicitization and Specialization of Minimum KL-Divergence Models

NASA Astrophysics Data System (ADS)

Dukkipati, Ambedkar; Manathara, Joel George

In this paper we study representation of KL-divergence minimization, in the cases where integer sufficient statistics exists, using tools from polynomial algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. In particular, we also study the case of Kullback-Csisźar iteration scheme. We present implicit descriptions of these models and show that implicitization preserves specialization of prior distribution. This result leads us to a Gröbner bases method to compute an implicit representation of minimum KL-divergence models.
The Runaway Crisis: Is Family Therapy the Answer?

ERIC Educational Resources Information Center

Ostensen, Kay Wickett

1981-01-01

Presents research on the relationship of two family counseling models (one with temporary foster placement, one without) to the recidivism of runaway teenagers. Research shows the Brief Family Intervention counseling model to be a statistically viable tool in deterring repeated runaway episodes. (Author)
Quantifying falsifiability of scientific theories

NASA Astrophysics Data System (ADS)

Nemenman, Ilya

I argue that the notion of falsifiability, a key concept in defining a valid scientific theory, can be quantified using Bayesian Model Selection, which is a standard tool in modern statistics. This relates falsifiability to the quantitative version of the statistical Occam's razor, and allows transforming some long-running arguments about validity of scientific theories from philosophical discussions to rigorous mathematical calculations.
A phylogenetic transform enhances analysis of compositional microbiota data

PubMed Central

Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

2017-01-01

Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
Statistical Tools for Fitting Models of the Population Consequences of Acoustic Disturbance to Data from Marine Mammal Populations (PCAD Tools II)

DTIC Science & Technology

2015-09-30

Interim PCOD approach. In both of these case studies we relied on expert knowledge to link disturbance to vital rates. In the right whale case study...the Interim Population Consequences of Disturbance ( PCoD ) Approach: Quantifying and Assessing the Effects of UK Offshore Renewable Energy
A review of statistical updating methods for clinical prediction models.

PubMed

Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

2018-01-01

A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall in into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom: We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, and these should be implemented rather than developing a new clinical prediction model from scratch, using a breadth of complementary statistical methods.
Final Report, DOE Early Career Award: Predictive modeling of complex physical systems: new tools for statistical inference, uncertainty quantification, and experimental design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marzouk, Youssef

Predictive simulation of complex physical systems increasingly rests on the interplay of experimental observations with computational models. Key inputs, parameters, or structural aspects of models may be incomplete or unknown, and must be developed from indirect and limited observations. At the same time, quantified uncertainties are needed to qualify computational predictions in the support of design and decision-making. In this context, Bayesian statistics provides a foundation for inference from noisy and limited data, but at prohibitive computional expense. This project intends to make rigorous predictive modeling *feasible* in complex physical systems, via accelerated and scalable tools for uncertainty quantification, Bayesianmore » inference, and experimental design. Specific objectives are as follows: 1. Develop adaptive posterior approximations and dimensionality reduction approaches for Bayesian inference in high-dimensional nonlinear systems. 2. Extend accelerated Bayesian methodologies to large-scale {\\em sequential} data assimilation, fully treating nonlinear models and non-Gaussian state and parameter distributions. 3. Devise efficient surrogate-based methods for Bayesian model selection and the learning of model structure. 4. Develop scalable simulation/optimization approaches to nonlinear Bayesian experimental design, for both parameter inference and model selection. 5. Demonstrate these inferential tools on chemical kinetic models in reacting flow, constructing and refining thermochemical and electrochemical models from limited data. Demonstrate Bayesian filtering on canonical stochastic PDEs and in the dynamic estimation of inhomogeneous subsurface properties and flow fields.« less
Modeling the milling tool wear by using an evolutionary SVM-based model from milling runs experimental data

NASA Astrophysics Data System (ADS)

Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade

2015-12-01

The main aim of this research work is to build a new practical hybrid regression model to predict the milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on the statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset represents experiments from runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing milling machine's improvements. Firstly, this hybrid PSO-SVM-based regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the flank wear (output variable) and input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are exposed.
Exploring complex dynamics in multi agent-based intelligent systems: Theoretical and experimental approaches using the Multi Agent-based Behavioral Economic Landscape (MABEL) model

NASA Astrophysics Data System (ADS)

Alexandridis, Konstantinos T.

This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions that addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. Describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. Establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilogic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. Develops and describes a Monte Carlo methodology for experimental replications of agent's decisions regarding complex spatial parcel acquisition and learning. Recognizes the gap on spatially-explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the Receiver-Operator Characteristic curve, the impurity entropy and Gini functions, and the Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially-explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools build for the MABEL model are provided. Emphasizes the modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy-decisions related to land use change. Finally, the major contributions to the science are presented along with valuable directions for future research.

A new statistical model for subgrid dispersion in large eddy simulations of particle-laden flows

NASA Astrophysics Data System (ADS)

Muela, Jordi; Lehmkuhl, Oriol; Pérez-Segarra, Carles David; Oliva, Asensi

2016-09-01

Dispersed multiphase turbulent flows are present in many industrial and commercial applications like internal combustion engines, turbofans, dispersion of contaminants, steam turbines, etc. Therefore, there is a clear interest in the development of models and numerical tools capable of performing detailed and reliable simulations about these kind of flows. Large Eddy Simulations offer good accuracy and reliable results together with reasonable computational requirements, making it a really interesting method to develop numerical tools for particle-laden turbulent flows. Nonetheless, in multiphase dispersed flows additional difficulties arises in LES, since the effect of the unresolved scales of the continuous phase over the dispersed phase is lost due to the filtering procedure. In order to solve this issue a model able to reconstruct the subgrid velocity seen by the particles is required. In this work a new model for the reconstruction of the subgrid scale effects over the dispersed phase is presented and assessed. This innovative methodology is based in the reconstruction of statistics via Probability Density Functions (PDFs).
The application of data mining techniques to oral cancer prognosis.

PubMed

Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

2015-05-01

This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both decision tree and artificial neural network models showed superiority to the traditional statistical model. However, as to clinician, the trees created by the decision tree models are relatively easier to interpret compared to that of the artificial neural network models. Cluster analysis also discovers that those stage 4 patients whose also possess the following four characteristics are having an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and cells mutate situation (G) is moderate.
Modelling for Prediction vs. Modelling for Understanding: Commentary on Musso et al. (2013)

ERIC Educational Resources Information Center

Edelsbrunner, Peter; Schneider, Michael

2013-01-01

Musso et al. (2013) predict students' academic achievement with high accuracy one year in advance from cognitive and demographic variables, using artificial neural networks (ANNs). They conclude that ANNs have high potential for theoretical and practical improvements in learning sciences. ANNs are powerful statistical modelling tools but they can…
PRANAS: A New Platform for Retinal Analysis and Simulation.

PubMed

Cessac, Bruno; Kornprobst, Pierre; Kraria, Selim; Nasser, Hassan; Pamplona, Daniela; Portelli, Geoffrey; Viéville, Thierry

2017-01-01

The retina encodes visual scenes by trains of action potentials that are sent to the brain via the optic nerve. In this paper, we describe a new free access user-end software allowing to better understand this coding. It is called PRANAS (https://pranas.inria.fr), standing for Platform for Retinal ANalysis And Simulation. PRANAS targets neuroscientists and modelers by providing a unique set of retina-related tools. PRANAS integrates a retina simulator allowing large scale simulations while keeping a strong biological plausibility and a toolbox for the analysis of spike train population statistics. The statistical method (entropy maximization under constraints) takes into account both spatial and temporal correlations as constraints, allowing to analyze the effects of memory on statistics. PRANAS also integrates a tool computing and representing in 3D (time-space) receptive fields. All these tools are accessible through a friendly graphical user interface. The most CPU-costly of them have been implemented to run in parallel.
Evaluation of in silico tools to predict the skin sensitization potential of chemicals.

PubMed

Verheyen, G R; Braeken, E; Van Deun, K; Van Miert, S

2017-01-01

Public domain and commercial in silico tools were compared for their performance in predicting the skin sensitization potential of chemicals. The packages were either statistical based (Vega, CASE Ultra) or rule based (OECD Toolbox, Toxtree, Derek Nexus). In practice, several of these in silico tools are used in gap filling and read-across, but here their use was limited to make predictions based on presence/absence of structural features associated to sensitization. The top 400 ranking substances of the ATSDR 2011 Priority List of Hazardous Substances were selected as a starting point. Experimental information was identified for 160 chemically diverse substances (82 positive and 78 negative). The prediction for skin sensitization potential was compared with the experimental data. Rule-based tools perform slightly better, with accuracies ranging from 0.6 (OECD Toolbox) to 0.78 (Derek Nexus), compared with statistical tools that had accuracies ranging from 0.48 (Vega) to 0.73 (CASE Ultra - LLNA weak model). Combining models increased the performance, with positive and negative predictive values up to 80% and 84%, respectively. However, the number of substances that were predicted positive or negative for skin sensitization in both models was low. Adding more substances to the dataset will increase the confidence in the conclusions reached. The insights obtained in this evaluation are incorporated in a web database www.asopus.weebly.com that provides a potential end user context for the scope and performance of different in silico tools with respect to a common dataset of curated skin sensitization data.
Estimating the Regional Economic Significance of Airports

DTIC Science & Technology

1992-09-01

following three options for estimating induced impacts: the economic base model , an econometric model , and a regional input-output model . One approach to...limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of...analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single
Model identification using stochastic differential equation grey-box models in diabetes.

PubMed

Duun-Henriksen, Anne Katrine; Schmidt, Signe; Røge, Rikke Meldgaard; Møller, Jonas Bech; Nørgaard, Kirsten; Jørgensen, John Bagterp; Madsen, Henrik

2013-03-01

The acceptance of virtual preclinical testing of control algorithms is growing and thus also the need for robust and reliable models. Models based on ordinary differential equations (ODEs) can rarely be validated with standard statistical tools. Stochastic differential equations (SDEs) offer the possibility of building models that can be validated statistically and that are capable of predicting not only a realistic trajectory, but also the uncertainty of the prediction. In an SDE, the prediction error is split into two noise terms. This separation ensures that the errors are uncorrelated and provides the possibility to pinpoint model deficiencies. An identifiable model of the glucoregulatory system in a type 1 diabetes mellitus (T1DM) patient is used as the basis for development of a stochastic-differential-equation-based grey-box model (SDE-GB). The parameters are estimated on clinical data from four T1DM patients. The optimal SDE-GB is determined from likelihood-ratio tests. Finally, parameter tracking is used to track the variation in the "time to peak of meal response" parameter. We found that the transformation of the ODE model into an SDE-GB resulted in a significant improvement in the prediction and uncorrelated errors. Tracking of the "peak time of meal absorption" parameter showed that the absorption rate varied according to meal type. This study shows the potential of using SDE-GBs in diabetes modeling. Improved model predictions were obtained due to the separation of the prediction error. SDE-GBs offer a solid framework for using statistical tools for model validation and model development. © 2013 Diabetes Technology Society.
A Tutorial for Analyzing Human Reaction Times: How to Filter Data, Manage Missing Values, and Choose a Statistical Model

ERIC Educational Resources Information Center

Lachaud, Christian Michel; Renaud, Olivier

2011-01-01

This tutorial for the statistical processing of reaction times collected through a repeated-measure design is addressed to researchers in psychology. It aims at making explicit some important methodological issues, at orienting researchers to the existing solutions, and at providing them some evaluation tools for choosing the most robust and…
Camera-Model Identification Using Markovian Transition Probability Matrix

NASA Astrophysics Data System (ADS)

Xu, Guanshuo; Gao, Shang; Shi, Yun Qing; Hu, Ruimin; Su, Wei

Detecting the (brands and) models of digital cameras from given digital images has become a popular research topic in the field of digital forensics. As most of images are JPEG compressed before they are output from cameras, we propose to use an effective image statistical model to characterize the difference JPEG 2-D arrays of Y and Cb components from the JPEG images taken by various camera models. Specifically, the transition probability matrices derived from four different directional Markov processes applied to the image difference JPEG 2-D arrays are used to identify statistical difference caused by image formation pipelines inside different camera models. All elements of the transition probability matrices, after a thresholding technique, are directly used as features for classification purpose. Multi-class support vector machines (SVM) are used as the classification tool. The effectiveness of our proposed statistical model is demonstrated by large-scale experimental results.
Summary of hydrologic modeling for the Delaware River Basin using the Water Availability Tool for Environmental Resources (WATER)

USGS Publications Warehouse

Williamson, Tanja N.; Lant, Jeremiah G.; Claggett, Peter; Nystrom, Elizabeth A.; Milly, Paul C.D.; Nelson, Hugh L.; Hoffman, Scott A.; Colarullo, Susan J.; Fischer, Jeffrey M.

2015-11-18

The Water Availability Tool for Environmental Resources (WATER) is a decision support system for the nontidal part of the Delaware River Basin that provides a consistent and objective method of simulating streamflow under historical, forecasted, and managed conditions. In order to quantify the uncertainty associated with these simulations, however, streamflow and the associated hydroclimatic variables of potential evapotranspiration, actual evapotranspiration, and snow accumulation and snowmelt must be simulated and compared to long-term, daily observations from sites. This report details model development and optimization, statistical evaluation of simulations for 57 basins ranging from 2 to 930 km2 and 11.0 to 99.5 percent forested cover, and how this statistical evaluation of daily streamflow relates to simulating environmental changes and management decisions that are best examined at monthly time steps normalized over multiple decades. The decision support system provides a database of historical spatial and climatic data for simulating streamflow for 2001–11, in addition to land-cover and general circulation model forecasts that focus on 2030 and 2060. WATER integrates geospatial sampling of landscape characteristics, including topographic and soil properties, with a regionally calibrated hillslope-hydrology model, an impervious-surface model, and hydroclimatic models that were parameterized by using three hydrologic response units: forested, agricultural, and developed land cover. This integration enables the regional hydrologic modeling approach used in WATER without requiring site-specific optimization or those stationary conditions inferred when using a statistical model.
Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

PubMed

LaBudde, Robert A; Harnly, James M

2012-01-01

A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
A comparison of methods of fitting several models to nutritional response data.

PubMed

Vedenov, D; Pesti, G M

2008-02-01

A variety of models have been proposed to fit nutritional input-output response data. The models are typically nonlinear; therefore, fitting the models usually requires sophisticated statistical software and training to use it. An alternative tool for fitting nutritional response models was developed by using widely available and easier-to-use Microsoft Excel software. The tool, implemented as an Excel workbook (NRM.xls), allows simultaneous fitting and side-by-side comparisons of several popular models. This study compared the results produced by the tool we developed and PROC NLIN of SAS. The models compared were the broken line (ascending linear and quadratic segments), saturation kinetics, 4-parameter logistics, sigmoidal, and exponential models. The NRM.xls workbook provided results nearly identical to those of PROC NLIN. Furthermore, the workbook successfully fit several models that failed to converge in PROC NLIN. Two data sets were used as examples to compare fits by the different models. The results suggest that no particular nonlinear model is necessarily best for all nutritional response data.
EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks.

PubMed

Jenness, Samuel M; Goodreau, Steven M; Morris, Martina

2018-04-01

Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel , designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel , designed to facilitate the exploration of novel research questions for advanced modelers.
EpiModel: An R Package for Mathematical Modeling of Infectious Disease over Networks

PubMed Central

Jenness, Samuel M.; Goodreau, Steven M.; Morris, Martina

2018-01-01

Package EpiModel provides tools for building, simulating, and analyzing mathematical models for the population dynamics of infectious disease transmission in R. Several classes of models are included, but the unique contribution of this software package is a general stochastic framework for modeling the spread of epidemics on networks. EpiModel integrates recent advances in statistical methods for network analysis (temporal exponential random graph models) that allow the epidemic modeling to be grounded in empirical data on contacts that can spread infection. This article provides an overview of both the modeling tools built into EpiModel, designed to facilitate learning for students new to modeling, and the application programming interface for extending package EpiModel, designed to facilitate the exploration of novel research questions for advanced modelers. PMID:29731699
Integrating satellite imagery with simulation modeling to improve burn severity mapping

Treesearch

Eva C. Karau; Pamela G. Sikkink; Robert E. Keane; Gregory K. Dillon

2014-01-01

Both satellite imagery and spatial fire effects models are valuable tools for generating burn severity maps that are useful to fire scientists and resource managers. The purpose of this study was to test a new mapping approach that integrates imagery and modeling to create more accurate burn severity maps. We developed and assessed a statistical model that combines the...
On the Benefits of Latent Variable Modeling for Norming Scales: The Case of the "Supports Intensity Scale-Children's Version"

ERIC Educational Resources Information Center

Seo, Hyojeong; Little, Todd D.; Shogren, Karrie A.; Lang, Kyle M.

2016-01-01

Structural equation modeling (SEM) is a powerful and flexible analytic tool to model latent constructs and their relations with observed variables and other constructs. SEM applications offer advantages over classical models in dealing with statistical assumptions and in adjusting for measurement error. So far, however, SEM has not been fully used…
Fire and Smoke Model Evaluation Experiment (FASMEE): Modeling gaps and data needs

Treesearch

Yongqiang Liu; Adam Kochanski; Kirk Baker; Ruddy Mell; Rodman Linn; Ronan Paugam; Jan Mandel; Aime Fournier; Mary Ann Jenkins; Scott Goodrick; Gary Achtemeier; Andrew Hudak; Matthew Dickson; Brian Potter; Craig Clements; Shawn Urbanski; Roger Ottmar; Narasimhan Larkin; Timothy Brown; Nancy French; Susan Prichard; Adam Watts; Derek McNamara

2017-01-01

Fire and smoke models are numerical tools for simulating fire behavior, smoke dynamics, and air quality impacts of wildland fires. Fire models are developed based on the fundamental chemistry and physics of combustion and fire spread or statistical analysis of experimental data (Sullivan 2009). They provide information on fire spread and fuel consumption for safe and...
[Is there life beyond SPSS? Discover R].

PubMed

Elosua Oliden, Paula

2009-11-01

R is a GNU statistical and programming environment with very high graphical capabilities. It is very powerful for research purposes, but it is also an exceptional tool for teaching. R is composed of more than 1400 packages that allow using it for simple statistics and applying the most complex and most recent formal models. Using graphical interfaces like the Rcommander package, permits working in user-friendly environments which are similar to the graphical environment used by SPSS. This last characteristic allows non-statisticians to overcome the obstacle of accessibility, and it makes R the best tool for teaching. Is there anything better? Open, free, affordable, accessible and always on the cutting edge.
ConvAn: a convergence analyzing tool for optimization of biochemical networks.

PubMed

Kostromins, Andrejs; Mozga, Ivars; Stalidzans, Egils

2012-01-01

Dynamic models of biochemical networks usually are described as a system of nonlinear differential equations. In case of optimization of models for purpose of parameter estimation or design of new properties mainly numerical methods are used. That causes problems of optimization predictability as most of numerical optimization methods have stochastic properties and the convergence of the objective function to the global optimum is hardly predictable. Determination of suitable optimization method and necessary duration of optimization becomes critical in case of evaluation of high number of combinations of adjustable parameters or in case of large dynamic models. This task is complex due to variety of optimization methods, software tools and nonlinearity features of models in different parameter spaces. A software tool ConvAn is developed to analyze statistical properties of convergence dynamics for optimization runs with particular optimization method, model, software tool, set of optimization method parameters and number of adjustable parameters of the model. The convergence curves can be normalized automatically to enable comparison of different methods and models in the same scale. By the help of the biochemistry adapted graphical user interface of ConvAn it is possible to compare different optimization methods in terms of ability to find the global optima or values close to that as well as the necessary computational time to reach them. It is possible to estimate the optimization performance for different number of adjustable parameters. The functionality of ConvAn enables statistical assessment of necessary optimization time depending on the necessary optimization accuracy. Optimization methods, which are not suitable for a particular optimization task, can be rejected if they have poor repeatability or convergence properties. The software ConvAn is freely available on www.biosystems.lv/convan. Copyright Â© 2011 Elsevier Ireland Ltd. All rights reserved.
A new computer code for discrete fracture network modelling

NASA Astrophysics Data System (ADS)

Xu, Chaoshui; Dowd, Peter

2010-03-01

The authors describe a comprehensive software package for two- and three-dimensional stochastic rock fracture simulation using marked point processes. Fracture locations can be modelled by a Poisson, a non-homogeneous, a cluster or a Cox point process; fracture geometries and properties are modelled by their respective probability distributions. Virtual sampling tools such as plane, window and scanline sampling are included in the software together with a comprehensive set of statistical tools including histogram analysis, probability plots, rose diagrams and hemispherical projections. The paper describes in detail the theoretical basis of the implementation and provides a case study in rock fracture modelling to demonstrate the application of the software.

The Earthquake‐Source Inversion Validation (SIV) Project

USGS Publications Warehouse

Mai, P. Martin; Schorlemmer, Danijel; Page, Morgan T.; Ampuero, Jean-Paul; Asano, Kimiyuki; Causse, Mathieu; Custodio, Susana; Fan, Wenyuan; Festa, Gaetano; Galis, Martin; Gallovic, Frantisek; Imperatori, Walter; Käser, Martin; Malytskyy, Dmytro; Okuwaki, Ryo; Pollitz, Fred; Passone, Luca; Razafindrakoto, Hoby N. T.; Sekiguchi, Haruko; Song, Seok Goo; Somala, Surendra N.; Thingbaijam, Kiran K. S.; Twardzik, Cedric; van Driel, Martin; Vyas, Jagdish C.; Wang, Rongjiang; Yagi, Yuji; Zielke, Olaf

2016-01-01

Finite‐fault earthquake source inversions infer the (time‐dependent) displacement on the rupture surface from geophysical data. The resulting earthquake source models document the complexity of the rupture process. However, multiple source models for the same earthquake, obtained by different research teams, often exhibit remarkable dissimilarities. To address the uncertainties in earthquake‐source inversion methods and to understand strengths and weaknesses of the various approaches used, the Source Inversion Validation (SIV) project conducts a set of forward‐modeling exercises and inversion benchmarks. In this article, we describe the SIV strategy, the initial benchmarks, and current SIV results. Furthermore, we apply statistical tools for quantitative waveform comparison and for investigating source‐model (dis)similarities that enable us to rank the solutions, and to identify particularly promising source inversion approaches. All SIV exercises (with related data and descriptions) and statistical comparison tools are available via an online collaboration platform, and we encourage source modelers to use the SIV benchmarks for developing and testing new methods. We envision that the SIV efforts will lead to new developments for tackling the earthquake‐source imaging problem.
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis

ERIC Educational Resources Information Center

Camilleri, Liberato; Cefai, Carmel

2013-01-01

Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Designing a Qualitative Data Collection Strategy (QDCS) for Africa - Phase 1: A Gap Analysis of Existing Models, Simulations, and Tools Relating to Africa

DTIC Science & Technology

2012-06-01

generalized behavioral model characterized after the fictional Seldon equations (the one elaborated upon by Isaac Asimov in the 1951 novel, The...Foundation). Asimov described the Seldon equations as essentially statistical models with historical data of a sufficient size and variability that they
Network Meta-Analysis Using R: A Review of Currently Available Automated Packages

PubMed Central

Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

2014-01-01

Network meta-analysis (NMA) – a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously – has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA. PMID:25541687
Network meta-analysis using R: a review of currently available automated packages.

PubMed

Neupane, Binod; Richer, Danielle; Bonner, Ashley Joel; Kibret, Taddele; Beyene, Joseph

2014-01-01

Network meta-analysis (NMA)--a statistical technique that allows comparison of multiple treatments in the same meta-analysis simultaneously--has become increasingly popular in the medical literature in recent years. The statistical methodology underpinning this technique and software tools for implementing the methods are evolving. Both commercial and freely available statistical software packages have been developed to facilitate the statistical computations using NMA with varying degrees of functionality and ease of use. This paper aims to introduce the reader to three R packages, namely, gemtc, pcnetmeta, and netmeta, which are freely available software tools implemented in R. Each automates the process of performing NMA so that users can perform the analysis with minimal computational effort. We present, compare and contrast the availability and functionality of different important features of NMA in these three packages so that clinical investigators and researchers can determine which R packages to implement depending on their analysis needs. Four summary tables detailing (i) data input and network plotting, (ii) modeling options, (iii) assumption checking and diagnostic testing, and (iv) inference and reporting tools, are provided, along with an analysis of a previously published dataset to illustrate the outputs available from each package. We demonstrate that each of the three packages provides a useful set of tools, and combined provide users with nearly all functionality that might be desired when conducting a NMA.
An R package for analyzing and modeling ranking data

PubMed Central

2013-01-01

Background In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Results Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and significance difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as “internal/external”), and the second dimension can be interpreted as their overall variance of (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance. Conclusions In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying a thought multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations. PMID:23672645
An R package for analyzing and modeling ranking data.

PubMed

Lee, Paul H; Yu, Philip L H

2013-05-14

In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and significance difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference of the seven items (labeled as "internal/external"), and the second dimension can be interpreted as their overall variance of (labeled as "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying a thought multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose that which is most suitable to their specific situations.
A Bayesian statistical analysis of mouse dermal tumor promotion assay data for evaluating cigarette smoke condensate.

PubMed

Kathman, Steven J; Potts, Ryan J; Ayres, Paul H; Harp, Paul R; Wilson, Cody L; Garner, Charles D

2010-10-01

The mouse dermal assay has long been used to assess the dermal tumorigenicity of cigarette smoke condensate (CSC). This mouse skin model has been developed for use in carcinogenicity testing utilizing the SENCAR mouse as the standard strain. Though the model has limitations, it remains as the most relevant method available to study the dermal tumor promoting potential of mainstream cigarette smoke. In the typical SENCAR mouse CSC bioassay, CSC is applied for 29 weeks following the application of a tumor initiator such as 7,12-dimethylbenz[a]anthracene (DMBA). Several endpoints are considered for analysis including: the percentage of animals with at least one mass, latency, and number of masses per animal. In this paper, a relatively straightforward analytic model and procedure is presented for analyzing the time course of the incidence of masses. The procedure considered here takes advantage of Bayesian statistical techniques, which provide powerful methods for model fitting and simulation. Two datasets are analyzed to illustrate how the model fits the data, how well the model may perform in predicting data from such trials, and how the model may be used as a decision tool when comparing the dermal tumorigenicity of cigarette smoke condensate from multiple cigarette types. The analysis presented here was developed as a statistical decision tool for differentiating between two or more prototype products based on the dermal tumorigenicity. Copyright (c) 2010 Elsevier Inc. All rights reserved.
The Problem of Auto-Correlation in Parasitology

PubMed Central

Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick

2012-01-01

Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
Uncertainty quantification for nuclear density functional theory and information content of new measurements.

PubMed

McDonnell, J D; Schunck, N; Higdon, D; Sarich, J; Wild, S M; Nazarewicz, W

2015-03-27

Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. The example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.
Predicting High Health Care Resource Utilization in a Single-payer Public Health Care System: Development and Validation of the High Resource User Population Risk Tool (HRUPoRT).

PubMed

Rosella, Laura C; Kornas, Kathy; Yao, Zhan; Manuel, Douglas G; Bornbaum, Catherine; Fransoo, Randall; Stukel, Therese

2017-11-17

A large proportion of health care spending is incurred by a small proportion of the population. Population-based health planning tools that consider both the clinical and upstream determinants of high resource users (HRU) of the health system are lacking. To develop and validate the High Resource User Population Risk Tool (HRUPoRT), a predictive model of adults that will become the top 5% of health care users over a 5-year period, based on self-reported clinical, sociodemographic, and health behavioral predictors in population survey data. The HRUPoRT model was developed in a prospective cohort design using the combined 2005 and 2007/2008 Canadian Community Health Surveys (CCHS) (N=58,617), and validated using the external 2009/2010 CCHS cohort (N=28,721). Health care utilization for each of the 5 years following CCHS interview date were determined by applying a person-centered costing algorithm to the linked health administrative databases. Discrimination and calibration of the model were assessed using c-statistic and Hosmer-Lemeshow (HL) χ statistic. The best prediction model for 5-year transition to HRU status included 12 predictors and had good discrimination (c-statistic=0.8213) and calibration (HL χ=18.71) in the development cohort. The model performed similarly in the validation cohort (c-statistic=0.8171; HL χ=19.95). The strongest predictors in the HRUPoRT model were age, perceived general health, and body mass index. HRUPoRT can accurately project the proportion of individuals in the population that will become a HRU over 5 years. HRUPoRT can be applied to inform health resource planning and prevention strategies at the community level.This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (CCBY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal. http://creativecommons.org/licenses/by-nc-nd/4.0/.
EEGLAB, SIFT, NFT, BCILAB, and ERICA: new tools for advanced EEG processing.

PubMed

Delorme, Arnaud; Mullen, Tim; Kothe, Christian; Akalin Acar, Zeynep; Bigdely-Shamlo, Nima; Vankov, Andrey; Makeig, Scott

2011-01-01

We describe a set of complementary EEG data collection and processing tools recently developed at the Swartz Center for Computational Neuroscience (SCCN) that connect to and extend the EEGLAB software environment, a freely available and readily extensible processing environment running under Matlab. The new tools include (1) a new and flexible EEGLAB STUDY design facility for framing and performing statistical analyses on data from multiple subjects; (2) a neuroelectromagnetic forward head modeling toolbox (NFT) for building realistic electrical head models from available data; (3) a source information flow toolbox (SIFT) for modeling ongoing or event-related effective connectivity between cortical areas; (4) a BCILAB toolbox for building online brain-computer interface (BCI) models from available data, and (5) an experimental real-time interactive control and analysis (ERICA) environment for real-time production and coordination of interactive, multimodal experiments.
New machine learning tools for predictive vegetation mapping after climate change: Bagging and Random Forest perform better than Regression Tree Analysis

Treesearch

L.R. Iverson; A.M. Prasad; A. Liaw

2004-01-01

More and better machine learning tools are becoming available for landscape ecologists to aid in understanding species-environment relationships and to map probable species occurrence now and potentially into the future. To thal end, we evaluated three statistical models: Regression Tree Analybib (RTA), Bagging Trees (BT) and Random Forest (RF) for their utility in...
Validity of plant fiber length measurement : a review of fiber length measurement based on kenaf as a model

Treesearch

James S. Han; Theodore Mianowski; Yi-yu Lin

1999-01-01

The efficacy of fiber length measurement techniques such as digitizing, the Kajaani procedure, and NIH Image are compared in order to determine the optimal tool. Kenaf bast fibers, aspen, and red pine fibers were collected from different anatomical parts, and the fiber lengths were compared using various analytical tools. A statistical analysis on the validity of the...
Semantic Importance Sampling for Statistical Model Checking

DTIC Science & Technology

2014-10-18

we implement SIS in a tool called osmosis and use it to verify a number of stochastic systems with rare events. Our results indicate that SIS reduces...background definitions and concepts. Section 4 presents SIS, and Section 5 presents our tool osmosis . In Section 6, we present our experiments and results...Syntactic Extraction ∗( ) dReal + Refinement ∗ |∗| , Monte-Carlo , Fig. 5. Architecture of osmosis
Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis

PubMed Central

McDermott, Josh H.; Simoncelli, Eero P.

2014-01-01

Rainstorms, insect swarms, and galloping horses produce “sound textures” – the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures. However, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation. PMID:21903084
Performance of Reclassification Statistics in Comparing Risk Prediction Models

PubMed Central

Paynter, Nina P.

2012-01-01

Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information. PMID:21294152
Implementing multiresolution models and families of models: from entity-level simulation to desktop stochastic models and "repro" models

NASA Astrophysics Data System (ADS)

McEver, Jimmie; Davis, Paul K.; Bigelow, James H.

2000-06-01

We have developed and used families of multiresolution and multiple-perspective models (MRM and MRMPM), both in our substantive analytic work for the Department of Defense and to learn more about how such models can be designed and implemented. This paper is a brief case history of our experience with a particular family of models addressing the use of precision fires in interdicting and halting an invading army. Our models were implemented as closed-form analytic solutions, in spreadsheets, and in the more sophisticated AnalyticaTM environment. We also drew on an entity-level simulation for data. The paper reviews the importance of certain key attributes of development environments (visual modeling, interactive languages, friendly use of array mathematics, facilities for experimental design and configuration control, statistical analysis tools, graphical visualization tools, interactive post-processing, and relational database tools). These can go a long way towards facilitating MRMPM work, but many of these attributes are not yet widely available (or available at all) in commercial model-development tools--especially for use with personal computers. We conclude with some lessons learned from our experience.
On the Benefits of Latent Variable Modeling for Norming Scales: The Case of the "Supports Intensity Scale--Children's Version"

ERIC Educational Resources Information Center

Seo, Hyojeong; Little, Todd D.; Shogren, Karrie A.; Lang, Kyle M.

2016-01-01

Structural equation modeling (SEM) is a powerful and flexible analytic tool to model latent constructs and their relations with observed variables and other constructs. SEM applications offer advantages over classical models in dealing with statistical assumptions and in adjusting for measurement error. So far, however, SEM has not been fully used…
Software tool for data mining and its applications

NASA Astrophysics Data System (ADS)

Yang, Jie; Ye, Chenzhou; Chen, Nianyi

2002-03-01

A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, visualization. The principle and knowledge representation of some function models of data mining are described. The software tool of data mining is realized by Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool of data mining has satisfactorily applied in the prediction of regularities of the formation of ternary intermetallic compounds in alloy systems, and diagnosis of brain glioma.

Geospatial Analysis Tool Kit for Regional Climate Datasets (GATOR) : An Open-source Tool to Compute Climate Statistic GIS Layers from Argonne Climate Modeling Results

DTIC Science & Technology

2017-08-01

This large repository of climate model results for North America (Wang and Kotamarthi 2013, 2014, 2015) is stored in Network Common Data Form (NetCDF...Network Common Data Form (NetCDF). UCAR/Unidata Program Center, Boulder, CO. Available at: http://www.unidata.ucar.edu/software/netcdf. Accessed on 6/20...emissions diverge from each other regarding fossil fuel use, technology, and other socioeconomic factors. As a result, the estimated emissions for each of
Computer aided drug design

NASA Astrophysics Data System (ADS)

Jain, A.

2017-08-01

Computer based method can help in discovery of leads and can potentially eliminate chemical synthesis and screening of many irrelevant compounds, and in this way, it save time as well as cost. Molecular modeling systems are powerful tools for building, visualizing, analyzing and storing models of complex molecular structure that can help to interpretate structure activity relationship. The use of various techniques of molecular mechanics and dynamics and software in Computer aided drug design along with statistics analysis is powerful tool for the medicinal chemistry to synthesis therapeutic and effective drugs with minimum side effect.
Generating community-built tools for data sharing and analysis in environmental networks

USGS Publications Warehouse

Read, Jordan S.; Gries, Corinna; Read, Emily K.; Klug, Jennifer; Hanson, Paul C.; Hipsey, Matthew R.; Jennings, Eleanor; O'Reilley, Catherine; Winslow, Luke A.; Pierson, Don; McBride, Christopher G.; Hamilton, David

2016-01-01

Rapid data growth in many environmental sectors has necessitated tools to manage and analyze these data. The development of tools often lags behind the proliferation of data, however, which may slow exploratory opportunities and scientific progress. The Global Lake Ecological Observatory Network (GLEON) collaborative model supports an efficient and comprehensive data–analysis–insight life cycle, including implementations of data quality control checks, statistical calculations/derivations, models, and data visualizations. These tools are community-built and openly shared. We discuss the network structure that enables tool development and a culture of sharing, leading to optimized output from limited resources. Specifically, data sharing and a flat collaborative structure encourage the development of tools that enable scientific insights from these data. Here we provide a cross-section of scientific advances derived from global-scale analyses in GLEON. We document enhancements to science capabilities made possible by the development of analytical tools and highlight opportunities to expand this framework to benefit other environmental networks.
Development and validation of a risk assessment tool for gastric cancer in a general Japanese population.

PubMed

Iida, Masahiro; Ikeda, Fumie; Hata, Jun; Hirakawa, Yoichiro; Ohara, Tomoyuki; Mukai, Naoko; Yoshida, Daigo; Yonemoto, Koji; Esaki, Motohiro; Kitazono, Takanari; Kiyohara, Yutaka; Ninomiya, Toshiharu

2018-05-01

There have been very few reports of risk score models for the development of gastric cancer. The aim of this study was to develop and validate a risk assessment tool for discerning future gastric cancer risk in Japanese. A total of 2444 subjects aged 40 years or over were followed up for 14 years from 1988 (derivation cohort), and 3204 subjects of the same age group were followed up for 5 years from 2002 (validation cohort). The weighting (risk score) of each risk factor for predicting future gastric cancer in the risk assessment tool was determined based on the coefficients of a Cox proportional hazards model in the derivation cohort. The goodness of fit of the established risk assessment tool was assessed using the c-statistic and the Hosmer-Lemeshow test in the validation cohort. During the follow-up, gastric cancer developed in 90 subjects in the derivation cohort and 35 subjects in the validation cohort. In the derivation cohort, the risk prediction model for gastric cancer was established using significant risk factors: age, sex, the combination of Helicobacter pylori antibody and pepsinogen status, hemoglobin A1c level, and smoking status. The incidence of gastric cancer increased significantly as the sum of risk scores increased (P trend < 0.001). The risk assessment tool was validated internally and showed good discrimination (c-statistic = 0.76) and calibration (Hosmer-Lemeshow test P = 0.43) in the validation cohort. We developed a risk assessment tool for gastric cancer that provides a useful guide for stratifying an individual's risk of future gastric cancer.
Streamflow Simulations and Percolation Estimates Using the Soil and Water Assessment Tool for Selected Basins in North-Central Nebraska, 1940-2005

USGS Publications Warehouse

Strauch, Kellan R.; Linard, Joshua I.

2009-01-01

The U.S. Geological Survey, in cooperation with the Upper Elkhorn, Lower Elkhorn, Upper Loup, Lower Loup, Middle Niobrara, Lower Niobrara, Lewis and Clark, and Lower Platte North Natural Resources Districts, used the Soil and Water Assessment Tool to simulate streamflow and estimate percolation in north-central Nebraska to aid development of long-term strategies for management of hydrologically connected ground and surface water. Although groundwater models adequately simulate subsurface hydrologic processes, they often are not designed to simulate the hydrologically complex processes occurring at or near the land surface. The use of watershed models such as the Soil and Water Assessment Tool, which are designed specifically to simulate surface and near-subsurface processes, can provide helpful insight into the effects of surface-water hydrology on the groundwater system. The Soil and Water Assessment Tool was calibrated for five stream basins in the Elkhorn-Loup Groundwater Model study area in north-central Nebraska to obtain spatially variable estimates of percolation. Six watershed models were calibrated to recorded streamflow in each subbasin by modifying the adjustment parameters. The calibrated parameter sets were then used to simulate a validation period; the validation period was half of the total streamflow period of record with a minimum requirement of 10 years. If the statistical and water-balance results for the validation period were similar to those for the calibration period, a model was considered satisfactory. Statistical measures of each watershed model's performance were variable. These objective measures included the Nash-Sutcliffe measure of efficiency, the ratio of the root-mean-square error to the standard deviation of the measured data, and an estimate of bias. The model met performance criteria for the bias statistic, but failed to meet statistical adequacy criteria for the other two performance measures when evaluated at a monthly time step. A primary cause of the poor model validation results was the inability of the model to reproduce the sustained base flow and streamflow response to precipitation that was observed in the Sand Hills region. The watershed models also were evaluated based on how well they conformed to the annual mass balance (precipitation equals the sum of evapotranspiration, streamflow/runoff, and deep percolation). The model was able to adequately simulate annual values of evapotranspiration, runoff, and precipitation in comparison to reported values, which indicates the model may provide reasonable estimates of annual percolation. Mean annual percolation estimated by the model as basin averages varied within the study area from a maximum of 12.9 inches in the Loup River Basin to a minimum of 1.5 inches in the Shell Creek Basin. Percolation also varied within the studied basins; basin headwaters tended to have greater percolation rates than downstream areas. This variance in percolation rates was mainly was because of the predominance of sandy, highly permeable soils in the upstream areas of the modeled basins.
Cost Modeling for Space Telescope

NASA Technical Reports Server (NTRS)

Stahl, H. Philip

2011-01-01

Parametric cost models are an important tool for planning missions, compare concepts and justify technology investments. This paper presents on-going efforts to develop single variable and multi-variable cost models for space telescope optical telescope assembly (OTA). These models are based on data collected from historical space telescope missions. Standard statistical methods are used to derive CERs for OTA cost versus aperture diameter and mass. The results are compared with previously published models.
Mainstreaming Modeling and Simulation to Accelerate Public Health Innovation

PubMed Central

Sepulveda, Martin-J.; Mabry, Patricia L.

2014-01-01

Dynamic modeling and simulation are systems science tools that examine behaviors and outcomes resulting from interactions among multiple system components over time. Although there are excellent examples of their application, they have not been adopted as mainstream tools in population health planning and policymaking. Impediments to their use include the legacy and ease of use of statistical approaches that produce estimates with confidence intervals, the difficulty of multidisciplinary collaboration for modeling and simulation, systems scientists’ inability to communicate effectively the added value of the tools, and low funding for population health systems science. Proposed remedies include aggregation of diverse data sets, systems science training for public health and other health professionals, changing research incentives toward collaboration, and increased funding for population health systems science projects. PMID:24832426
Bio-jETI: a service integration, design, and provisioning platform for orchestrated bioinformatics processes.

PubMed

Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard

2008-04-25

With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way.
Non-stationary background intensity and Caribbean seismic events

NASA Astrophysics Data System (ADS)

Valmy, Larissa; Vaillant, Jean

2014-05-01

We consider seismic risk calculation based on models with non-stationary background intensity. The aim is to improve predictive strategies in the framework of seismic risk assessment from models describing at best the seismic activity in the Caribbean arc. Appropriate statistical methods are required for analyzing the volumes of data collected. The focus is on calculating earthquakes occurrences probability and analyzing spatiotemporal evolution of these probabilities. The main modeling tool is the point process theory in order to take into account past history prior to a given date. Thus, the seismic event conditional intensity is expressed by means of the background intensity and the self exciting component. This intensity can be interpreted as the expected event rate per time and / or surface unit. The most popular intensity model in seismology is the ETAS (Epidemic Type Aftershock Sequence) model introduced and then generalized by Ogata [2, 3]. We extended this model and performed a comparison of different probability density functions for the triggered event times [4]. We illustrate our model by considering the CDSA (Centre de Données Sismiques des Antilles) catalog [1] which contains more than 7000 seismic events occurred in the Lesser Antilles arc. Statistical tools for testing the background intensity stationarity and for dynamical segmentation are presented. [1] Bengoubou-Valérius M., Bazin S., Bertil D., Beauducel F. and Bosson A. (2008). CDSA: a new seismological data center for the French Lesser Antilles, Seismol. Res. Lett., 79 (1), 90-102. [2] Ogata Y. (1998). Space-time point-process models for earthquake occurrences, Annals of the Institute of Statistical Mathematics, 50 (2), 379-402. [3] Ogata, Y. (2011). Significant improvements of the space-time ETAS model for forecasting of accurate baseline seismicity, Earth, Planets and Space, 63 (3), 217-229. [4] Valmy L. and Vaillant J. (2013). Statistical models in seismology: Lesser Antilles arc case, Bull. Soc. géol. France, 2013, 184 (1), 61-67.
Operation Reliability Assessment for Cutting Tools by Applying a Proportional Covariate Model to Condition Monitoring Information

PubMed Central

Cai, Gaigai; Chen, Xuefeng; Li, Bing; Chen, Baojia; He, Zhengjia

2012-01-01

The reliability of cutting tools is critical to machining precision and production efficiency. The conventional statistic-based reliability assessment method aims at providing a general and overall estimation of reliability for a large population of identical units under given and fixed conditions. However, it has limited effectiveness in depicting the operational characteristics of a cutting tool. To overcome this limitation, this paper proposes an approach to assess the operation reliability of cutting tools. A proportional covariate model is introduced to construct the relationship between operation reliability and condition monitoring information. The wavelet packet transform and an improved distance evaluation technique are used to extract sensitive features from vibration signals, and a covariate function is constructed based on the proportional covariate model. Ultimately, the failure rate function of the cutting tool being assessed is calculated using the baseline covariate function obtained from a small sample of historical data. Experimental results and a comparative study show that the proposed method is effective for assessing the operation reliability of cutting tools. PMID:23201980
Probabilistic inversion with graph cuts: Application to the Boise Hydrogeophysical Research Site

NASA Astrophysics Data System (ADS)

Pirot, Guillaume; Linde, Niklas; Mariethoz, Grégoire; Bradford, John H.

2017-02-01

Inversion methods that build on multiple-point statistics tools offer the possibility to obtain model realizations that are not only in agreement with field data, but also with conceptual geological models that are represented by training images. A recent inversion approach based on patch-based geostatistical resimulation using graph cuts outperforms state-of-the-art multiple-point statistics methods when applied to synthetic inversion examples featuring continuous and discontinuous property fields. Applications of multiple-point statistics tools to field data are challenging due to inevitable discrepancies between actual subsurface structure and the assumptions made in deriving the training image. We introduce several amendments to the original graph cut inversion algorithm and present a first-ever field application by addressing porosity estimation at the Boise Hydrogeophysical Research Site, Boise, Idaho. We consider both a classical multi-Gaussian and an outcrop-based prior model (training image) that are in agreement with available porosity data. When conditioning to available crosshole ground-penetrating radar data using Markov chain Monte Carlo, we find that the posterior realizations honor overall both the characteristics of the prior models and the geophysical data. The porosity field is inverted jointly with the measurement error and the petrophysical parameters that link dielectric permittivity to porosity. Even though the multi-Gaussian prior model leads to posterior realizations with higher likelihoods, the outcrop-based prior model shows better convergence. In addition, it offers geologically more realistic posterior realizations and it better preserves the full porosity range of the prior.
Modeling forest biomass and growth: Coupling long-term inventory and LiDAR data

Treesearch

Chad Babcock; Andrew O. Finley; Bruce D. Cook; Aaron Weiskittel; Christopher W. Woodall

2016-01-01

Combining spatially-explicit long-term forest inventory and remotely sensed information from Light Detection and Ranging (LiDAR) datasets through statistical models can be a powerful tool for predicting and mapping above-ground biomass (AGB) at a range of geographic scales. We present and examine a novel modeling approach to improve prediction of AGB and estimate AGB...
A Primer on Value-Added Models: Towards a Better Understanding of the Quantitative Analysis of Student Achievement

ERIC Educational Resources Information Center

Nakamura, Yugo

2013-01-01

Value-added models (VAMs) have received considerable attention as a tool to transform our public education system. However, as VAMs are studied by researchers from a broad range of academic disciplines who remain divided over the best methods in analyzing the models and stakeholders without the extensive statistical background have been excluded…
Snug as a Bug: Goodness of Fit and Quality of Models.

PubMed

Jupiter, Daniel C

In elucidating risk factors, or attempting to make predictions about the behavior of subjects in our biomedical studies, we often build statistical models. These models are meant to capture some aspect of reality, or some real-world process underlying the phenomena we are examining. However, no model is perfect, and it is thus important to have tools to assess how accurate models are. In this commentary, we delve into the various roles that our models can play. Then we introduce the notion of the goodness of fit of models and lay the ground work for further study of diagnostic tests for assessing both the fidelity of our models and the statistical assumptions underlying them. Copyright © 2017 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Estimating the Diets of Animals Using Stable Isotopes and a Comprehensive Bayesian Mixing Model

PubMed Central

Hopkins, John B.; Ferguson, Jake M.

2012-01-01

Using stable isotope mixing models (SIMMs) as a tool to investigate the foraging ecology of animals is gaining popularity among researchers. As a result, statistical methods are rapidly evolving and numerous models have been produced to estimate the diets of animals—each with their benefits and their limitations. Deciding which SIMM to use is contingent on factors such as the consumer of interest, its food sources, sample size, the familiarity a user has with a particular framework for statistical analysis, or the level of inference the researcher desires to make (e.g., population- or individual-level). In this paper, we provide a review of commonly used SIMM models and describe a comprehensive SIMM that includes all features commonly used in SIMM analysis and two new features. We used data collected in Yosemite National Park to demonstrate IsotopeR's ability to estimate dietary parameters. We then examined the importance of each feature in the model and compared our results to inferences from commonly used SIMMs. IsotopeR's user interface (in R) will provide researchers a user-friendly tool for SIMM analysis. The model is also applicable for use in paleontology, archaeology, and forensic studies as well as estimating pollution inputs. PMID:22235246
Virtual Model Validation of Complex Multiscale Systems: Applications to Nonlinear Elastostatics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oden, John Tinsley; Prudencio, Ernest E.; Bauman, Paul T.

We propose a virtual statistical validation process as an aid to the design of experiments for the validation of phenomenological models of the behavior of material bodies, with focus on those cases in which knowledge of the fabrication process used to manufacture the body can provide information on the micro-molecular-scale properties underlying macroscale behavior. One example is given by models of elastomeric solids fabricated using polymerization processes. We describe a framework for model validation that involves Bayesian updates of parameters in statistical calibration and validation phases. The process enables the quanti cation of uncertainty in quantities of interest (QoIs) andmore » the determination of model consistency using tools of statistical information theory. We assert that microscale information drawn from molecular models of the fabrication of the body provides a valuable source of prior information on parameters as well as a means for estimating model bias and designing virtual validation experiments to provide information gain over calibration posteriors.« less
Automated finite element modeling of the lumbar spine: Using a statistical shape model to generate a virtual population of models.

PubMed

Campbell, J Q; Petrella, A J

2016-09-06

Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.
Comparative evaluation of spectroscopic models using different multivariate statistical tools in a multicancer scenario

NASA Astrophysics Data System (ADS)

Ghanate, A. D.; Kothiwale, S.; Singh, S. P.; Bertrand, Dominique; Krishna, C. Murali

2011-02-01

Cancer is now recognized as one of the major causes of morbidity and mortality. Histopathological diagnosis, the gold standard, is shown to be subjective, time consuming, prone to interobserver disagreement, and often fails to predict prognosis. Optical spectroscopic methods are being contemplated as adjuncts or alternatives to conventional cancer diagnostics. The most important aspect of these approaches is their objectivity, and multivariate statistical tools play a major role in realizing it. However, rigorous evaluation of the robustness of spectral models is a prerequisite. The utility of Raman spectroscopy in the diagnosis of cancers has been well established. Until now, the specificity and applicability of spectral models have been evaluated for specific cancer types. In this study, we have evaluated the utility of spectroscopic models representing normal and malignant tissues of the breast, cervix, colon, larynx, and oral cavity in a broader perspective, using different multivariate tests. The limit test, which was used in our earlier study, gave high sensitivity but suffered from poor specificity. The performance of other methods such as factorial discriminant analysis and partial least square discriminant analysis are at par with more complex nonlinear methods such as decision trees, but they provide very little information about the classification model. This comparative study thus demonstrates not just the efficacy of Raman spectroscopic models but also the applicability and limitations of different multivariate tools for discrimination under complex conditions such as the multicancer scenario.
Statistical analysis and ANN modeling for predicting hydrological extremes under climate change scenarios: the example of a small Mediterranean agro-watershed.

PubMed

Kourgialas, Nektarios N; Dokou, Zoi; Karatzas, George P

2015-05-01

The purpose of this study was to create a modeling management tool for the simulation of extreme flow events under current and future climatic conditions. This tool is a combination of different components and can be applied in complex hydrogeological river basins, where frequent flood and drought phenomena occur. The first component is the statistical analysis of the available hydro-meteorological data. Specifically, principal components analysis was performed in order to quantify the importance of the hydro-meteorological parameters that affect the generation of extreme events. The second component is a prediction-forecasting artificial neural network (ANN) model that simulates, accurately and efficiently, river flow on an hourly basis. This model is based on a methodology that attempts to resolve a very difficult problem related to the accurate estimation of extreme flows. For this purpose, the available measurements (5 years of hourly data) were divided in two subsets: one for the dry and one for the wet periods of the hydrological year. This way, two ANNs were created, trained, tested and validated for a complex Mediterranean river basin in Crete, Greece. As part of the second management component a statistical downscaling tool was used for the creation of meteorological data according to the higher and lower emission climate change scenarios A2 and B1. These data are used as input in the ANN for the forecasting of river flow for the next two decades. The final component is the application of a meteorological index on the measured and forecasted precipitation and flow data, in order to assess the severity and duration of extreme events. Copyright © 2015 Elsevier Ltd. All rights reserved.
Evolution of Natural Attenuation Evaluation Protocols

EPA Science Inventory

Traditionally the evaluation of the efficacy of natural attenuation was based on changes in contaminant concentrations and mass reduction. Statistical tools and models such as Bioscreen provided evaluation protocols which now are being approached via other vehicles including m...

Habitat classification modelling with incomplete data: Pushing the habitat envelope

Treesearch

Phoebe L. Zarnetske; Thomas C. Edwards; Gretchen G. Moisen

2007-01-01

Habitat classification models (HCMs) are invaluable tools for species conservation, land-use planning, reserve design, and metapopulation assessments, particularly at broad spatial scales. However, species occurrence data are often lacking and typically limited to presence points at broad scales. This lack of absence data precludes the use of many statistical...
Graphical tools for network meta-analysis in STATA.

PubMed

Chaimani, Anna; Higgins, Julian P T; Mavridis, Dimitris; Spyridonos, Panagiota; Salanti, Georgia

2013-01-01

Network meta-analysis synthesizes direct and indirect evidence in a network of trials that compare multiple interventions and has the potential to rank the competing treatments according to the studied outcome. Despite its usefulness network meta-analysis is often criticized for its complexity and for being accessible only to researchers with strong statistical and computational skills. The evaluation of the underlying model assumptions, the statistical technicalities and presentation of the results in a concise and understandable way are all challenging aspects in the network meta-analysis methodology. In this paper we aim to make the methodology accessible to non-statisticians by presenting and explaining a series of graphical tools via worked examples. To this end, we provide a set of STATA routines that can be easily employed to present the evidence base, evaluate the assumptions, fit the network meta-analysis model and interpret its results.
Graphical Tools for Network Meta-Analysis in STATA

PubMed Central

Chaimani, Anna; Higgins, Julian P. T.; Mavridis, Dimitris; Spyridonos, Panagiota; Salanti, Georgia

2013-01-01

Network meta-analysis synthesizes direct and indirect evidence in a network of trials that compare multiple interventions and has the potential to rank the competing treatments according to the studied outcome. Despite its usefulness network meta-analysis is often criticized for its complexity and for being accessible only to researchers with strong statistical and computational skills. The evaluation of the underlying model assumptions, the statistical technicalities and presentation of the results in a concise and understandable way are all challenging aspects in the network meta-analysis methodology. In this paper we aim to make the methodology accessible to non-statisticians by presenting and explaining a series of graphical tools via worked examples. To this end, we provide a set of STATA routines that can be easily employed to present the evidence base, evaluate the assumptions, fit the network meta-analysis model and interpret its results. PMID:24098547
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.

PubMed

Mi, Gu; Di, Yanming; Schafer, Daniel W

2015-01-01

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
GAPIT: genome association and prediction integrated tool.

PubMed

Lipka, Alexander E; Tian, Feng; Wang, Qishan; Peiffer, Jason; Li, Meng; Bradbury, Peter J; Gore, Michael A; Buckler, Edward S; Zhang, Zhiwu

2012-09-15

Software programs that conduct genome-wide association studies and genomic prediction and selection need to use methodologies that maximize statistical power, provide high prediction accuracy and run in a computationally efficient manner. We developed an R package called Genome Association and Prediction Integrated Tool (GAPIT) that implements advanced statistical methods including the compressed mixed linear model (CMLM) and CMLM-based genomic prediction and selection. The GAPIT package can handle large datasets in excess of 10 000 individuals and 1 million single-nucleotide polymorphisms with minimal computational time, while providing user-friendly access and concise tables and graphs to interpret results. http://www.maizegenetics.net/GAPIT. zhiwu.zhang@cornell.edu Supplementary data are available at Bioinformatics online.
An Evaluation Tool for CONUS-Scale Estimates of Components of the Water Balance

NASA Astrophysics Data System (ADS)

Saxe, S.; Hay, L.; Farmer, W. H.; Markstrom, S. L.; Kiang, J. E.

2016-12-01

Numerous research groups are independently developing data products to represent various components of the water balance (e.g. runoff, evapotranspiration, recharge, snow water equivalent, soil moisture, and climate) at the scale of the conterminous United States. These data products are derived from a range of sources, including direct measurement, remotely-sensed measurement, and statistical and deterministic model simulations. An evaluation tool is needed to compare these data products and the components of the water balance they contain in order to identify the gaps in the understanding and representation of continental-scale hydrologic processes. An ideal tool will be an objective, universally agreed upon, framework to address questions related to closing the water balance. This type of generic, model agnostic evaluation tool would facilitate collaboration amongst different hydrologic research groups and improve modeling capabilities with respect to continental-scale water resources. By adopting a comprehensive framework to consider hydrologic modeling in the context of a complete water balance, it is possible to identify weaknesses in process modeling, data product representation and regional hydrologic variation. As part of its National Water Census initiative, the U.S. Geological survey is facilitating this dialogue to developing prototype evaluation tools.
ANEMOS: Development of a next generation wind power forecasting system for the large-scale integration of onshore and offshore wind farms.

NASA Astrophysics Data System (ADS)

Kariniotakis, G.; Anemos Team

2003-04-01

Objectives: Accurate forecasting of the wind energy production up to two days ahead is recognized as a major contribution for reliable large-scale wind power integration. Especially, in a liberalized electricity market, prediction tools enhance the position of wind energy compared to other forms of dispatchable generation. ANEMOS, is a new 3.5 years R&D project supported by the European Commission, that resembles research organizations and end-users with an important experience on the domain. The project aims to develop advanced forecasting models that will substantially outperform current methods. Emphasis is given to situations like complex terrain, extreme weather conditions, as well as to offshore prediction for which no specific tools currently exist. The prediction models will be implemented in a software platform and installed for online operation at onshore and offshore wind farms by the end-users participating in the project. Approach: The paper presents the methodology of the project. Initially, the prediction requirements are identified according to the profiles of the end-users. The project develops prediction models based on both a physical and an alternative statistical approach. Research on physical models gives emphasis to techniques for use in complex terrain and the development of prediction tools based on CFD techniques, advanced model output statistics or high-resolution meteorological information. Statistical models (i.e. based on artificial intelligence) are developed for downscaling, power curve representation, upscaling for prediction at regional or national level, etc. A benchmarking process is set-up to evaluate the performance of the developed models and to compare them with existing ones using a number of case studies. The synergy between statistical and physical approaches is examined to identify promising areas for further improvement of forecasting accuracy. Appropriate physical and statistical prediction models are also developed for offshore wind farms taking into account advances in marine meteorology (interaction between wind and waves, coastal effects). The benefits from the use of satellite radar images for modeling local weather patterns are investigated. A next generation forecasting software, ANEMOS, will be developed to integrate the various models. The tool is enhanced by advanced Information Communication Technology (ICT) functionality and can operate both in stand alone, or remote mode, or be interfaced with standard Energy or Distribution Management Systems (EMS/DMS) systems. Contribution: The project provides an advanced technology for wind resource forecasting applicable in a large scale: at a single wind farm, regional or national level and for both interconnected and island systems. A major milestone is the on-line operation of the developed software by the participating utilities for onshore and offshore wind farms and the demonstration of the economic benefits. The outcome of the ANEMOS project will help consistently the increase of wind integration in two levels; in an operational level due to better management of wind farms, but also, it will contribute to increasing the installed capacity of wind farms. This is because accurate prediction of the resource reduces the risk of wind farm developers, who are then more willing to undertake new wind farm installations especially in a liberalized electricity market environment.
Statistical learning and adaptive decision-making underlie human response time variability in inhibitory control.

PubMed

Ma, Ning; Yu, Angela J

2015-01-01

Response time (RT) is an oft-reported behavioral measure in psychological and neurocognitive experiments, but the high level of observed trial-to-trial variability in this measure has often limited its usefulness. Here, we combine computational modeling and psychophysics to examine the hypothesis that fluctuations in this noisy measure reflect dynamic computations in human statistical learning and corresponding cognitive adjustments. We present data from the stop-signal task (SST), in which subjects respond to a go stimulus on each trial, unless instructed not to by a subsequent, infrequently presented stop signal. We model across-trial learning of stop signal frequency, P(stop), and stop-signal onset time, SSD (stop-signal delay), with a Bayesian hidden Markov model, and within-trial decision-making with an optimal stochastic control model. The combined model predicts that RT should increase with both expected P(stop) and SSD. The human behavioral data (n = 20) bear out this prediction, showing P(stop) and SSD both to be significant, independent predictors of RT, with P(stop) being a more prominent predictor in 75% of the subjects, and SSD being more prominent in the remaining 25%. The results demonstrate that humans indeed readily internalize environmental statistics and adjust their cognitive/behavioral strategy accordingly, and that subtle patterns in RT variability can serve as a valuable tool for validating models of statistical learning and decision-making. More broadly, the modeling tools presented in this work can be generalized to a large body of behavioral paradigms, in order to extract insights about cognitive and neural processing from apparently quite noisy behavioral measures. We also discuss how this behaviorally validated model can then be used to conduct model-based analysis of neural data, in order to help identify specific brain areas for representing and encoding key computational quantities in learning and decision-making.
Statistical Issues for Uncontrolled Reentry Hazards

NASA Technical Reports Server (NTRS)

Matney, Mark

2008-01-01

A number of statistical tools have been developed over the years for assessing the risk of reentering objects to human populations. These tools make use of the characteristics (e.g., mass, shape, size) of debris that are predicted by aerothermal models to survive reentry. The statistical tools use this information to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of the analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. This inevitably involves assumptions that simplify the problem and make it tractable, but it is often difficult to test the accuracy and applicability of these assumptions. This paper looks at a number of these theoretical assumptions, examining the mathematical basis for the hazard calculations, and outlining the conditions under which the simplifying assumptions hold. In addition, this paper will also outline some new tools for assessing ground hazard risk in useful ways. Also, this study is able to make use of a database of known uncontrolled reentry locations measured by the United States Department of Defense. By using data from objects that were in orbit more than 30 days before reentry, sufficient time is allowed for the orbital parameters to be randomized in the way the models are designed to compute. The predicted ground footprint distributions of these objects are based on the theory that their orbits behave basically like simple Kepler orbits. However, there are a number of factors - including the effects of gravitational harmonics, the effects of the Earth's equatorial bulge on the atmosphere, and the rotation of the Earth and atmosphere - that could cause them to diverge from simple Kepler orbit behavior and change the ground footprints. The measured latitude and longitude distributions of these objects provide data that can be directly compared with the predicted distributions, providing a fundamental empirical test of the model assumptions.
TEAMS Model Analyzer

NASA Technical Reports Server (NTRS)

Tijidjian, Raffi P.

2010-01-01

The TEAMS model analyzer is a supporting tool developed to work with models created with TEAMS (Testability, Engineering, and Maintenance System), which was developed by QSI. In an effort to reduce the time spent in the manual process that each TEAMS modeler must perform in the preparation of reporting for model reviews, a new tool has been developed as an aid to models developed in TEAMS. The software allows for the viewing, reporting, and checking of TEAMS models that are checked into the TEAMS model database. The software allows the user to selectively model in a hierarchical tree outline view that displays the components, failure modes, and ports. The reporting features allow the user to quickly gather statistics about the model, and generate an input/output report pertaining to all of the components. Rules can be automatically validated against the model, with a report generated containing resulting inconsistencies. In addition to reducing manual effort, this software also provides an automated process framework for the Verification and Validation (V&V) effort that will follow development of these models. The aid of such an automated tool would have a significant impact on the V&V process.
Dose-Response Calculator for ArcGIS

USGS Publications Warehouse

Hanser, Steven E.; Aldridge, Cameron L.; Leu, Matthias; Nielsen, Scott E.

2011-01-01

The Dose-Response Calculator for ArcGIS is a tool that extends the Environmental Systems Research Institute (ESRI) ArcGIS 10 Desktop application to aid with the visualization of relationships between two raster GIS datasets. A dose-response curve is a line graph commonly used in medical research to examine the effects of different dosage rates of a drug or chemical (for example, carcinogen) on an outcome of interest (for example, cell mutations) (Russell and others, 1982). Dose-response curves have recently been used in ecological studies to examine the influence of an explanatory dose variable (for example, percentage of habitat cover, distance to disturbance) on a predicted response (for example, survival, probability of occurrence, abundance) (Aldridge and others, 2008). These dose curves have been created by calculating the predicted response value from a statistical model at different levels of the explanatory dose variable while holding values of other explanatory variables constant. Curves (plots) developed using the Dose-Response Calculator overcome the need to hold variables constant by using values extracted from the predicted response surface of a spatially explicit statistical model fit in a GIS, which include the variation of all explanatory variables, to visualize the univariate response to the dose variable. Application of the Dose-Response Calculator can be extended beyond the assessment of statistical model predictions and may be used to visualize the relationship between any two raster GIS datasets (see example in tool instructions). This tool generates tabular data for use in further exploration of dose-response relationships and a graph of the dose-response curve.
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.

PubMed

Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A

2010-11-01

The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
EEGLAB, SIFT, NFT, BCILAB, and ERICA: New Tools for Advanced EEG Processing

PubMed Central

Delorme, Arnaud; Mullen, Tim; Kothe, Christian; Akalin Acar, Zeynep; Bigdely-Shamlo, Nima; Vankov, Andrey; Makeig, Scott

2011-01-01

We describe a set of complementary EEG data collection and processing tools recently developed at the Swartz Center for Computational Neuroscience (SCCN) that connect to and extend the EEGLAB software environment, a freely available and readily extensible processing environment running under Matlab. The new tools include (1) a new and flexible EEGLAB STUDY design facility for framing and performing statistical analyses on data from multiple subjects; (2) a neuroelectromagnetic forward head modeling toolbox (NFT) for building realistic electrical head models from available data; (3) a source information flow toolbox (SIFT) for modeling ongoing or event-related effective connectivity between cortical areas; (4) a BCILAB toolbox for building online brain-computer interface (BCI) models from available data, and (5) an experimental real-time interactive control and analysis (ERICA) environment for real-time production and coordination of interactive, multimodal experiments. PMID:21687590
Investigating market efficiency through a forecasting model based on differential equations

NASA Astrophysics Data System (ADS)

de Resende, Charlene C.; Pereira, Adriano C. M.; Cardoso, Rodrigo T. N.; de Magalhães, A. R. Bosco

2017-05-01

A new differential equation based model for stock price trend forecast is proposed as a tool to investigate efficiency in an emerging market. Its predictive power showed statistically to be higher than the one of a completely random model, signaling towards the presence of arbitrage opportunities. Conditions for accuracy to be enhanced are investigated, and application of the model as part of a trading strategy is discussed.
Uncovering Local Trends in Genetic Effects of Multiple Phenotypes via Functional Linear Models.

PubMed

Vsevolozhskaya, Olga A; Zaykin, Dmitri V; Barondess, David A; Tong, Xiaoren; Jadhav, Sneha; Lu, Qing

2016-04-01

Recent technological advances equipped researchers with capabilities that go beyond traditional genotyping of loci known to be polymorphic in a general population. Genetic sequences of study participants can now be assessed directly. This capability removed technology-driven bias toward scoring predominantly common polymorphisms and let researchers reveal a wealth of rare and sample-specific variants. Although the relative contributions of rare and common polymorphisms to trait variation are being debated, researchers are faced with the need for new statistical tools for simultaneous evaluation of all variants within a region. Several research groups demonstrated flexibility and good statistical power of the functional linear model approach. In this work we extend previous developments to allow inclusion of multiple traits and adjustment for additional covariates. Our functional approach is unique in that it provides a nuanced depiction of effects and interactions for the variables in the model by representing them as curves varying over a genetic region. We demonstrate flexibility and competitive power of our approach by contrasting its performance with commonly used statistical tools and illustrate its potential for discovery and characterization of genetic architecture of complex traits using sequencing data from the Dallas Heart Study. Published 2016. This article is a U.S. Government work and is in the public domain in the USA.
Virtual Beach 3: User's Guide

EPA Science Inventory

Virtual Beach version 3 (VB3) is a decision support tool that constructs site-specific statistical models to predict fecal indicator bacteria (FIB) concentrations at recreational beaches. VB3 is primarily designed for beach managers responsible for making decisions regarding beac...
Mediator effect of statistical process control between Total Quality Management (TQM) and business performance in Malaysian Automotive Industry

NASA Astrophysics Data System (ADS)

Ahmad, M. F.; Rasi, R. Z.; Zakuan, N.; Hisyamudin, M. N. N.

2015-12-01

In today's highly competitive market, Total Quality Management (TQM) is vital management tool in ensuring a company can success in their business. In order to survive in the global market with intense competition amongst regions and enterprises, the adoption of tools and techniques are essential in improving business performance. There are consistent results between TQM and business performance. However, only few previous studies have examined the mediator effect namely statistical process control (SPC) between TQM and business performance. A mediator is a third variable that changes the association between an independent variable and an outcome variable. This study present research proposed a TQM performance model with mediator effect of SPC with structural equation modelling, which is a more comprehensive model for developing countries, specifically for Malaysia. A questionnaire was prepared and sent to 1500 companies from automotive industry and the related vendors in Malaysia, giving a 21.8 per cent rate. Attempts were made at findings significant impact of mediator between TQM practices and business performance showed that SPC is important tools and techniques in TQM implementation. The result concludes that SPC is partial correlation between and TQM and BP with indirect effect (IE) is 0.25 which can be categorised as high moderator effect.
Causal modelling applied to the risk assessment of a wastewater discharge.

PubMed

Paul, Warren L; Rokahr, Pat A; Webb, Jeff M; Rees, Gavin N; Clune, Tim S

2016-03-01

Bayesian networks (BNs), or causal Bayesian networks, have become quite popular in ecological risk assessment and natural resource management because of their utility as a communication and decision-support tool. Since their development in the field of artificial intelligence in the 1980s, however, Bayesian networks have evolved and merged with structural equation modelling (SEM). Unlike BNs, which are constrained to encode causal knowledge in conditional probability tables, SEMs encode this knowledge in structural equations, which is thought to be a more natural language for expressing causal information. This merger has clarified the causal content of SEMs and generalised the method such that it can now be performed using standard statistical techniques. As it was with BNs, the utility of this new generation of SEM in ecological risk assessment will need to be demonstrated with examples to foster an understanding and acceptance of the method. Here, we applied SEM to the risk assessment of a wastewater discharge to a stream, with a particular focus on the process of translating a causal diagram (conceptual model) into a statistical model which might then be used in the decision-making and evaluation stages of the risk assessment. The process of building and testing a spatial causal model is demonstrated using data from a spatial sampling design, and the implications of the resulting model are discussed in terms of the risk assessment. It is argued that a spatiotemporal causal model would have greater external validity than the spatial model, enabling broader generalisations to be made regarding the impact of a discharge, and greater value as a tool for evaluating the effects of potential treatment plant upgrades. Suggestions are made on how the causal model could be augmented to include temporal as well as spatial information, including suggestions for appropriate statistical models and analyses.
Comparative Investigation on Tool Wear during End Milling of AISI H13 Steel with Different Tool Path Strategies

NASA Astrophysics Data System (ADS)

Adesta, Erry Yulian T.; Riza, Muhammad; Avicena

2018-03-01

Tool wear prediction plays a significant role in machining industry for proper planning and control machining parameters and optimization of cutting conditions. This paper aims to investigate the effect of tool path strategies that are contour-in and zigzag tool path strategies applied on tool wear during pocket milling process. The experiments were carried out on CNC vertical machining centre by involving PVD coated carbide inserts. Cutting speed, feed rate and depth of cut were set to vary. In an experiment with three factors at three levels, Response Surface Method (RSM) design of experiment with a standard called Central Composite Design (CCD) was employed. Results obtained indicate that tool wear increases significantly at higher range of feed per tooth compared to cutting speed and depth of cut. This result of this experimental work is then proven statistically by developing empirical model. The prediction model for the response variable of tool wear for contour-in strategy developed in this research shows a good agreement with experimental work.
Linking Plasma Conditions in the Magnetosphere with Ionospheric Signatures

NASA Technical Reports Server (NTRS)

Rastaetter, Lutz; Kozyra, Janet; Kuznetsova, Maria M.; Berrios, David H.

2012-01-01

Modeling of the full magnetosphere, ring current and ionosphere system has become an indispensable tool in analyzing the series of events that occur during geomagnetic storms. The CCMC has a full model suite available for the magnetosphere, together with visualization tools that allow a user to perform a large variety of analyses. The January, 21, 2005 storm was a moderate-size storm that has been found to feature a large penetration electric field and unusually large polar caps (low-latitude precipitation patterns) that are otherwise found in super storms. Based on simulations runs at CCMC we can outline the likely causes of this behavior. Using visualization tools available to the online user we compare results from different magnetosphere models and present connections found between features in the magnetosphere and the ionosphere that are connected magnetically. The range of magnetic mappings found with different models can be compared with statistical models (Tsyganenko) and the model's fidelity can be verified with observations from low earth orbiting satellites such as DMSP and TIMED.

New generation of exploration tools: interactive modeling software and microcomputers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krajewski, S.A.

1986-08-01

Software packages offering interactive modeling techniques are now available for use on microcomputer hardware systems. These packages are reasonably priced for both company and independent explorationists; they do not require users to have high levels of computer literacy; they are capable of rapidly completing complex ranges of sophisticated geologic and geophysical modeling tasks; and they can produce presentation-quality output for comparison with real-world data. For example, interactive packages are available for mapping, log analysis, seismic modeling, reservoir studies, and financial projects as well as for applying a variety of statistical and geostatistical techniques to analysis of exploration data. More importantly,more » these packages enable explorationists to directly apply their geologic expertise when developing and fine-tuning models for identifying new prospects and for extending producing fields. As a result of these features, microcomputers and interactive modeling software are becoming common tools in many exploration offices. Gravity and magnetics software programs illustrate some of the capabilities of such exploration tools.« less
Web-based tools for modelling and analysis of multivariate data: California ozone pollution activity

PubMed Central

Dinov, Ivo D.; Christou, Nicolas

2014-01-01

This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analyses tools are demonstrated. Two specific human health related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges. PMID:24465054
Web-based tools for modelling and analysis of multivariate data: California ozone pollution activity.

PubMed

Dinov, Ivo D; Christou, Nicolas

2011-09-01

This article presents a hands-on web-based activity motivated by the relation between human health and ozone pollution in California. This case study is based on multivariate data collected monthly at 20 locations in California between 1980 and 2006. Several strategies and tools for data interrogation and exploratory data analysis, model fitting and statistical inference on these data are presented. All components of this case study (data, tools, activity) are freely available online at: http://wiki.stat.ucla.edu/socr/index.php/SOCR_MotionCharts_CAOzoneData. Several types of exploratory (motion charts, box-and-whisker plots, spider charts) and quantitative (inference, regression, analysis of variance (ANOVA)) data analyses tools are demonstrated. Two specific human health related questions (temporal and geographic effects of ozone pollution) are discussed as motivational challenges.
A biological compression model and its applications.

PubMed

Cao, Minh Duc; Dix, Trevor I; Allison, Lloyd

2011-01-01

A biological compression model, expert model, is presented which is superior to existing compression algorithms in both compression performance and speed. The model is able to compress whole eukaryotic genomes. Most importantly, the model provides a framework for knowledge discovery from biological data. It can be used for repeat element discovery, sequence alignment and phylogenetic analysis. We demonstrate that the model can handle statistically biased sequences and distantly related sequences where conventional knowledge discovery tools often fail.
Uncertainty quantification for nuclear density functional theory and information content of new measurements

DOE PAGES

McDonnell, J. D.; Schunck, N.; Higdon, D.; ...

2015-03-24

Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squaresmore » optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. In addition, the example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.« less
Improved nucleic acid descriptors for siRNA efficacy prediction.

PubMed

Sciabola, Simone; Cao, Qing; Orozco, Modesto; Faustino, Ignacio; Stanton, Robert V

2013-02-01

Although considerable progress has been made recently in understanding how gene silencing is mediated by the RNAi pathway, the rational design of effective sequences is still a challenging task. In this article, we demonstrate that including three-dimensional descriptors improved the discrimination between active and inactive small interfering RNAs (siRNAs) in a statistical model. Five descriptor types were used: (i) nucleotide position along the siRNA sequence, (ii) nucleotide composition in terms of presence/absence of specific combinations of di- and trinucleotides, (iii) nucleotide interactions by means of a modified auto- and cross-covariance function, (iv) nucleotide thermodynamic stability derived by the nearest neighbor model representation and (v) nucleic acid structure flexibility. The duplex flexibility descriptors are derived from extended molecular dynamics simulations, which are able to describe the sequence-dependent elastic properties of RNA duplexes, even for non-standard oligonucleotides. The matrix of descriptors was analysed using three statistical packages in R (partial least squares, random forest, and support vector machine), and the most predictive model was implemented in a modeling tool we have made publicly available through SourceForge. Our implementation of new RNA descriptors coupled with appropriate statistical algorithms resulted in improved model performance for the selection of siRNA candidates when compared with publicly available siRNA prediction tools and previously published test sets. Additional validation studies based on in-house RNA interference projects confirmed the robustness of the scoring procedure in prospective studies.
Uncertainty quantification for nuclear density functional theory and information content of new measurements

DOE Office of Scientific and Technical Information (OSTI.GOV)

McDonnell, J. D.; Schunck, N.; Higdon, D.

2015-03-24

Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squaresmore » optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. As a result, the example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.« less
A Parametric Model of Shoulder Articulation for Virtual Assessment of Space Suit Fit

NASA Technical Reports Server (NTRS)

Kim, K. Han; Young, Karen S.; Bernal, Yaritza; Boppana, Abhishektha; Vu, Linh Q.; Benson, Elizabeth A.; Jarvis, Sarah; Rajulu, Sudhakar L.

2016-01-01

Shoulder injury is one of the most severe risks that have the potential to impair crewmembers' performance and health in long duration space flight. Overall, 64% of crewmembers experience shoulder pain after extra-vehicular training in a space suit, and 14% of symptomatic crewmembers require surgical repair (Williams & Johnson, 2003). Suboptimal suit fit, in particular at the shoulder region, has been identified as one of the predominant risk factors. However, traditional suit fit assessments and laser scans represent only a single person's data, and thus may not be generalized across wide variations of body shapes and poses. The aim of this work is to develop a software tool based on a statistical analysis of a large dataset of crewmember body shapes. This tool can accurately predict the skin deformation and shape variations for any body size and shoulder pose for a target population, from which the geometry can be exported and evaluated against suit models in commercial CAD software. A preliminary software tool was developed by statistically analyzing 150 body shapes matched with body dimension ranges specified in the Human-Systems Integration Requirements of NASA ("baseline model"). Further, the baseline model was incorporated with shoulder joint articulation ("articulation model"), using additional subjects scanned in a variety of shoulder poses across a pre-specified range of motion. Scan data was cleaned and aligned using body landmarks. The skin deformation patterns were dimensionally reduced and the co-variation with shoulder angles was analyzed. A software tool is currently in development and will be presented in the final proceeding. This tool would allow suit engineers to parametrically generate body shapes in strategically targeted anthropometry dimensions and shoulder poses. This would also enable virtual fit assessments, with which the contact volume and clearance between the suit and body surface can be predictively quantified at reduced time and cost.
Statistical strategy for anisotropic adventitia modelling in IVUS.

PubMed

Gil, Debora; Hernández, Aura; Rodriguez, Oriol; Mauri, Josepa; Radeva, Petia

2006-06-01

Vessel plaque assessment by analysis of intravascular ultrasound sequences is a useful tool for cardiac disease diagnosis and intervention. Manual detection of luminal (inner) and media-adventitia (external) vessel borders is the main activity of physicians in the process of lumen narrowing (plaque) quantification. Difficult definition of vessel border descriptors, as well as, shades, artifacts, and blurred signal response due to ultrasound physical properties trouble automated adventitia segmentation. In order to efficiently approach such a complex problem, we propose blending advanced anisotropic filtering operators and statistical classification techniques into a vessel border modelling strategy. Our systematic statistical analysis shows that the reported adventitia detection achieves an accuracy in the range of interobserver variability regardless of plaque nature, vessel geometry, and incomplete vessel borders.
Bio-jETI: a service integration, design, and provisioning platform for orchestrated bioinformatics processes

PubMed Central

Margaria, Tiziana; Kubczak, Christian; Steffen, Bernhard

2008-01-01

Background With Bio-jETI, we introduce a service platform for interdisciplinary work on biological application domains and illustrate its use in a concrete application concerning statistical data processing in R and xcms for an LC/MS analysis of FAAH gene knockout. Methods Bio-jETI uses the jABC environment for service-oriented modeling and design as a graphical process modeling tool and the jETI service integration technology for remote tool execution. Conclusions As a service definition and provisioning platform, Bio-jETI has the potential to become a core technology in interdisciplinary service orchestration and technology transfer. Domain experts, like biologists not trained in computer science, directly define complex service orchestrations as process models and use efficient and complex bioinformatics tools in a simple and intuitive way. PMID:18460173
Comparison of four modeling tools for the prediction of potential distribution for non-indigenous weeds in the United States

USGS Publications Warehouse

Magarey, Roger; Newton, Leslie; Hong, Seung C.; Takeuchi, Yu; Christie, Dave; Jarnevich, Catherine S.; Kohl, Lisa; Damus, Martin; Higgins, Steven I.; Miller, Leah; Castro, Karen; West, Amanda; Hastings, John; Cook, Gericke; Kartesz, John; Koop, Anthony

2018-01-01

This study compares four models for predicting the potential distribution of non-indigenous weed species in the conterminous U.S. The comparison focused on evaluating modeling tools and protocols as currently used for weed risk assessment or for predicting the potential distribution of invasive weeds. We used six weed species (three highly invasive and three less invasive non-indigenous species) that have been established in the U.S. for more than 75 years. The experiment involved providing non-U. S. location data to users familiar with one of the four evaluated techniques, who then developed predictive models that were applied to the United States without knowing the identity of the species or its U.S. distribution. We compared a simple GIS climate matching technique known as Proto3, a simple climate matching tool CLIMEX Match Climates, the correlative model MaxEnt, and a process model known as the Thornley Transport Resistance (TTR) model. Two experienced users ran each modeling tool except TTR, which had one user. Models were trained with global species distribution data excluding any U.S. data, and then were evaluated using the current known U.S. distribution. The influence of weed species identity and modeling tool on prevalence and sensitivity effects was compared using a generalized linear mixed model. Each modeling tool itself had a low statistical significance, while weed species alone accounted for 69.1 and 48.5% of the variance for prevalence and sensitivity, respectively. These results suggest that simple modeling tools might perform as well as complex ones in the case of predicting potential distribution for a weed not yet present in the United States. Considerations of model accuracy should also be balanced with those of reproducibility and ease of use. More important than the choice of modeling tool is the construction of robust protocols and testing both new and experienced users under blind test conditions that approximate operational conditions.
Pre-selection and assessment of green organic solvents by clustering chemometric tools.

PubMed

Tobiszewski, Marek; Nedyalkova, Miroslava; Madurga, Sergio; Pena-Pereira, Francisco; Namieśnik, Jacek; Simeonov, Vasil

2018-01-01

The study presents the result of the application of chemometric tools for selection of physicochemical parameters of solvents for predicting missing variables - bioconcentration factors, water-octanol and octanol-air partitioning constants. EPI Suite software was successfully applied to predict missing values for solvents commonly considered as "green". Values for logBCF, logK OW and logK OA were modelled for 43 rather nonpolar solvents and 69 polar ones. Application of multivariate statistics was also proved to be useful in the assessment of the obtained modelling results. The presented approach can be one of the first steps and support tools in the assessment of chemicals in terms of their greenness. Copyright © 2017 Elsevier Inc. All rights reserved.
Analysis methodology and development of a statistical tool for biodistribution data from internal contamination with actinides.

PubMed

Lamart, Stephanie; Griffiths, Nina M; Tchitchek, Nicolas; Angulo, Jaime F; Van der Meeren, Anne

2017-03-01

The aim of this work was to develop a computational tool that integrates several statistical analysis features for biodistribution data from internal contamination experiments. These data represent actinide levels in biological compartments as a function of time and are derived from activity measurements in tissues and excreta. These experiments aim at assessing the influence of different contamination conditions (e.g. intake route or radioelement) on the biological behavior of the contaminant. The ever increasing number of datasets and diversity of experimental conditions make the handling and analysis of biodistribution data difficult. This work sought to facilitate the statistical analysis of a large number of datasets and the comparison of results from diverse experimental conditions. Functional modules were developed using the open-source programming language R to facilitate specific operations: descriptive statistics, visual comparison, curve fitting, and implementation of biokinetic models. In addition, the structure of the datasets was harmonized using the same table format. Analysis outputs can be written in text files and updated data can be written in the consistent table format. Hence, a data repository is built progressively, which is essential for the optimal use of animal data. Graphical representations can be automatically generated and saved as image files. The resulting computational tool was applied using data derived from wound contamination experiments conducted under different conditions. In facilitating biodistribution data handling and statistical analyses, this computational tool ensures faster analyses and a better reproducibility compared with the use of multiple office software applications. Furthermore, re-analysis of archival data and comparison of data from different sources is made much easier. Hence this tool will help to understand better the influence of contamination characteristics on actinide biokinetics. Our approach can aid the optimization of treatment protocols and therefore contribute to the improvement of the medical response after internal contamination with actinides.
Statistical aspects of modeling the labor curve.

PubMed

Zhang, Jun; Troendle, James; Grantz, Katherine L; Reddy, Uma M

2015-06-01

In a recent review by Cohen and Friedman, several statistical questions on modeling labor curves were raised. This article illustrates that asking data to fit a preconceived model or letting a sufficiently flexible model fit observed data is the main difference in principles of statistical modeling between the original Friedman curve and our average labor curve. An evidence-based approach to construct a labor curve and establish normal values should allow the statistical model to fit observed data. In addition, the presence of the deceleration phase in the active phase of an average labor curve was questioned. Forcing a deceleration phase to be part of the labor curve may have artificially raised the speed of progression in the active phase with a particularly large impact on earlier labor between 4 and 6 cm. Finally, any labor curve is illustrative and may not be instructive in managing labor because of variations in individual labor pattern and large errors in measuring cervical dilation. With the tools commonly available, it may be more productive to establish a new partogram that takes the physiology of labor and contemporary obstetric population into account. Copyright © 2015 Elsevier Inc. All rights reserved.
Statistical emulators of maize, rice, soybean and wheat yields from global gridded crop models

DOE PAGES

Blanc, Élodie

2017-01-26

This study provides statistical emulators of crop yields based on global gridded crop model simulations from the Inter-Sectoral Impact Model Intercomparison Project Fast Track project. The ensemble of simulations is used to build a panel of annual crop yields from five crop models and corresponding monthly summer weather variables for over a century at the grid cell level globally. This dataset is then used to estimate, for each crop and gridded crop model, the statistical relationship between yields, temperature, precipitation and carbon dioxide. This study considers a new functional form to better capture the non-linear response of yields to weather,more » especially for extreme temperature and precipitation events, and now accounts for the effect of soil type. In- and out-of-sample validations show that the statistical emulators are able to replicate spatial patterns of yields crop levels and changes overtime projected by crop models reasonably well, although the accuracy of the emulators varies by model and by region. This study therefore provides a reliable and accessible alternative to global gridded crop yield models. By emulating crop yields for several models using parsimonious equations, the tools provide a computationally efficient method to account for uncertainty in climate change impact assessments.« less
Statistical emulators of maize, rice, soybean and wheat yields from global gridded crop models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blanc, Élodie

This study provides statistical emulators of crop yields based on global gridded crop model simulations from the Inter-Sectoral Impact Model Intercomparison Project Fast Track project. The ensemble of simulations is used to build a panel of annual crop yields from five crop models and corresponding monthly summer weather variables for over a century at the grid cell level globally. This dataset is then used to estimate, for each crop and gridded crop model, the statistical relationship between yields, temperature, precipitation and carbon dioxide. This study considers a new functional form to better capture the non-linear response of yields to weather,more » especially for extreme temperature and precipitation events, and now accounts for the effect of soil type. In- and out-of-sample validations show that the statistical emulators are able to replicate spatial patterns of yields crop levels and changes overtime projected by crop models reasonably well, although the accuracy of the emulators varies by model and by region. This study therefore provides a reliable and accessible alternative to global gridded crop yield models. By emulating crop yields for several models using parsimonious equations, the tools provide a computationally efficient method to account for uncertainty in climate change impact assessments.« less
Score tests for independence in semiparametric competing risks models.

PubMed

Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul

2009-12-01

A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
BCM: toolkit for Bayesian analysis of Computational Models using samplers.

PubMed

Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

2016-10-21

Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

PubMed

Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

2016-06-17

Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults.
Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

PubMed Central

Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

2016-01-01

Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-value Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was detected using the proposed model. The results show that deep learning with statistical feature extraction has an essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

Uncertainty Analysis of Inertial Model Attitude Sensor Calibration and Application with a Recommended New Calibration Method

NASA Technical Reports Server (NTRS)

Tripp, John S.; Tcheng, Ping

1999-01-01

Statistical tools, previously developed for nonlinear least-squares estimation of multivariate sensor calibration parameters and the associated calibration uncertainty analysis, have been applied to single- and multiple-axis inertial model attitude sensors used in wind tunnel testing to measure angle of attack and roll angle. The analysis provides confidence and prediction intervals of calibrated sensor measurement uncertainty as functions of applied input pitch and roll angles. A comparative performance study of various experimental designs for inertial sensor calibration is presented along with corroborating experimental data. The importance of replicated calibrations over extended time periods has been emphasized; replication provides independent estimates of calibration precision and bias uncertainties, statistical tests for calibration or modeling bias uncertainty, and statistical tests for sensor parameter drift over time. A set of recommendations for a new standardized model attitude sensor calibration method and usage procedures is included. The statistical information provided by these procedures is necessary for the uncertainty analysis of aerospace test results now required by users of industrial wind tunnel test facilities.
fMRI paradigm designing and post-processing tools

PubMed Central

James, Jija S; Rajesh, PG; Chandran, Anuvitha VS; Kesavadas, Chandrasekharan

2014-01-01

In this article, we first review some aspects of functional magnetic resonance imaging (fMRI) paradigm designing for major cognitive functions by using stimulus delivery systems like Cogent, E-Prime, Presentation, etc., along with their technical aspects. We also review the stimulus presentation possibilities (block, event-related) for visual or auditory paradigms and their advantage in both clinical and research setting. The second part mainly focus on various fMRI data post-processing tools such as Statistical Parametric Mapping (SPM) and Brain Voyager, and discuss the particulars of various preprocessing steps involved (realignment, co-registration, normalization, smoothing) in these software and also the statistical analysis principles of General Linear Modeling for final interpretation of a functional activation result. PMID:24851001
Statistical Issues for Uncontrolled Reentry Hazards Empirical Tests of the Predicted Footprint for Uncontrolled Satellite Reentry Hazards

NASA Technical Reports Server (NTRS)

Matney, Mark

2011-01-01

A number of statistical tools have been developed over the years for assessing the risk of reentering objects to human populations. These tools make use of the characteristics (e.g., mass, material, shape, size) of debris that are predicted by aerothermal models to survive reentry. The statistical tools use this information to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of the analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. Because this information is used in making policy and engineering decisions, it is important that these assumptions be tested using empirical data. This study uses the latest database of known uncontrolled reentry locations measured by the United States Department of Defense. The predicted ground footprint distributions of these objects are based on the theory that their orbits behave basically like simple Kepler orbits. However, there are a number of factors in the final stages of reentry - including the effects of gravitational harmonics, the effects of the Earth s equatorial bulge on the atmosphere, and the rotation of the Earth and atmosphere - that could cause them to diverge from simple Kepler orbit behavior and possibly change the probability of reentering over a given location. In this paper, the measured latitude and longitude distributions of these objects are directly compared with the predicted distributions, providing a fundamental empirical test of the model assumptions.
Construction of estimated flow- and load-duration curves for Kentucky using the Water Availability Tool for Environmental Resources (WATER)

USGS Publications Warehouse

Unthank, Michael D.; Newson, Jeremy K.; Williamson, Tanja N.; Nelson, Hugh L.

2012-01-01

Flow- and load-duration curves were constructed from the model outputs of the U.S. Geological Survey's Water Availability Tool for Environmental Resources (WATER) application for streams in Kentucky. The WATER application was designed to access multiple geospatial datasets to generate more than 60 years of statistically based streamflow data for Kentucky. The WATER application enables a user to graphically select a site on a stream and generate an estimated hydrograph and flow-duration curve for the watershed upstream of that point. The flow-duration curves are constructed by calculating the exceedance probability of the modeled daily streamflows. User-defined water-quality criteria and (or) sampling results can be loaded into the WATER application to construct load-duration curves that are based on the modeled streamflow results. Estimates of flow and streamflow statistics were derived from TOPographically Based Hydrological MODEL (TOPMODEL) simulations in the WATER application. A modified TOPMODEL code, SDP-TOPMODEL (Sinkhole Drainage Process-TOPMODEL) was used to simulate daily mean discharges over the period of record for 5 karst and 5 non-karst watersheds in Kentucky in order to verify the calibrated model. A statistical evaluation of the model's verification simulations show that calibration criteria, established by previous WATER application reports, were met thus insuring the model's ability to provide acceptably accurate estimates of discharge at gaged and ungaged sites throughout Kentucky. Flow-duration curves are constructed in the WATER application by calculating the exceedence probability of the modeled daily flow values. The flow-duration intervals are expressed as a percentage, with zero corresponding to the highest stream discharge in the streamflow record. Load-duration curves are constructed by applying the loading equation (Load = Flow*Water-quality criterion) at each flow interval.
On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

PubMed Central

Calcagno, Cristina; Coppo, Mario

2014-01-01

The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed. PMID:25050327
THE CAUSAL ANALYSIS / DIAGNOSIS DECISION ...

EPA Pesticide Factsheets

CADDIS is an on-line decision support system that helps investigators in the regions, states and tribes find, access, organize, use and share information to produce causal evaluations in aquatic systems. It is based on the US EPA's Stressor Identification process which is a formal method for identifying causes of impairments in aquatic systems. CADDIS 2007 increases access to relevant information useful for causal analysis and provides methods and tools that practitioners can use to analyze their own data. The new Candidate Cause section provides overviews of commonly encountered causes of impairments to aquatic systems: metals, sediments, nutrients, flow alteration, temperature, ionic strength, and low dissolved oxygen. CADDIS includes new Conceptual Models that illustrate the relationships from sources to stressors to biological effects. An Interactive Conceptual Model for phosphorus links the diagram with supporting literature citations. The new Analyzing Data section helps practitioners analyze their data sets and interpret and use those results as evidence within the USEPA causal assessment process. Downloadable tools include a graphical user interface statistical package (CADStat), and programs for use with the freeware R statistical package, and a Microsoft Excel template. These tools can be used to quantify associations between causes and biological impairments using innovative methods such as species-sensitivity distributions, biological inferenc
On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

PubMed

Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo

2014-01-01

The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.
A network-base analysis of CMIP5 "historical" experiments

NASA Astrophysics Data System (ADS)

Bracco, A.; Foudalis, I.; Dovrolis, C.

2012-12-01

In computer science, "complex network analysis" refers to a set of metrics, modeling tools and algorithms commonly used in the study of complex nonlinear dynamical systems. Its main premise is that the underlying topology or network structure of a system has a strong impact on its dynamics and evolution. By allowing to investigate local and non-local statistical interaction, network analysis provides a powerful, but only marginally explored, framework to validate climate models and investigate teleconnections, assessing their strength, range, and impacts on the climate system. In this work we propose a new, fast, robust and scalable methodology to examine, quantify, and visualize climate sensitivity, while constraining general circulation models (GCMs) outputs with observations. The goal of our novel approach is to uncover relations in the climate system that are not (or not fully) captured by more traditional methodologies used in climate science and often adopted from nonlinear dynamical systems analysis, and to explain known climate phenomena in terms of the network structure or its metrics. Our methodology is based on a solid theoretical framework and employs mathematical and statistical tools, exploited only tentatively in climate research so far. Suitably adapted to the climate problem, these tools can assist in visualizing the trade-offs in representing global links and teleconnections among different data sets. Here we present the methodology, and compare network properties for different reanalysis data sets and a suite of CMIP5 coupled GCM outputs. With an extensive model intercomparison in terms of the climate network that each model leads to, we quantify how each model reproduces major teleconnections, rank model performances, and identify common or specific errors in comparing model outputs and observations.
Bulk tank somatic cell counts analyzed by statistical process control tools to identify and monitor subclinical mastitis incidence.

PubMed

Lukas, J M; Hawkins, D M; Kinsel, M L; Reneau, J K

2005-11-01

The objective of this study was to examine the relationship between monthly Dairy Herd Improvement (DHI) subclinical mastitis and new infection rate estimates and daily bulk tank somatic cell count (SCC) summarized by statistical process control tools. Dairy Herd Improvement Association test-day subclinical mastitis and new infection rate estimates along with daily or every other day bulk tank SCC data were collected for 12 mo of 2003 from 275 Upper Midwest dairy herds. Herds were divided into 5 herd production categories. A linear score [LNS = ln(BTSCC/100,000)/0.693147 + 3] was calculated for each individual bulk tank SCC. For both the raw SCC and the transformed data, the mean and sigma were calculated using the statistical quality control individual measurement and moving range chart procedure of Statistical Analysis System. One hundred eighty-three herds of the 275 herds from the study data set were then randomly selected and the raw (method 1) and transformed (method 2) bulk tank SCC mean and sigma were used to develop models for predicting subclinical mastitis and new infection rate estimates. Herd production category was also included in all models as 5 dummy variables. Models were validated by calculating estimates of subclinical mastitis and new infection rates for the remaining 92 herds and plotting them against observed values of each of the dependents. Only herd production category and bulk tank SCC mean were significant and remained in the final models. High R2 values (0.83 and 0.81 for methods 1 and 2, respectively) indicated a strong correlation between the bulk tank SCC and herd's subclinical mastitis prevalence. The standard errors of the estimate were 4.02 and 4.28% for methods 1 and 2, respectively, and decreased with increasing herd production. As a case study, Shewhart Individual Measurement Charts were plotted from the bulk tank SCC to identify shifts in mastitis incidence. Four of 5 charts examined signaled a change in bulk tank SCC before the DHI test day identified the change in subclinical mastitis prevalence. It can be concluded that applying statistical process control tools to daily bulk tank SCC can be used to estimate subclinical mastitis prevalence in the herd and observe for change in the subclinical mastitis status. Single DHI test day estimates of new infection rate were insufficient to accurately describe its dynamics.
Overview of the SAMSI year-long program on Statistical, Mathematical and Computational Methods for Astronomy

NASA Astrophysics Data System (ADS)

Jogesh Babu, G.

2017-01-01

A year-long research (Aug 2016- May 2017) program on `Statistical, Mathematical and Computational Methods for Astronomy (ASTRO)’ is well under way at Statistical and Applied Mathematical Sciences Institute (SAMSI), a National Science Foundation research institute in Research Triangle Park, NC. This program has brought together astronomers, computer scientists, applied mathematicians and statisticians. The main aims of this program are: to foster cross-disciplinary activities; to accelerate the adoption of modern statistical and mathematical tools into modern astronomy; and to develop new tools needed for important astronomical research problems. The program provides multiple avenues for cross-disciplinary interactions, including several workshops, long-term visitors, and regular teleconferences, so participants can continue collaborations, even if they can only spend limited time in residence at SAMSI. The main program is organized around five working groups:i) Uncertainty Quantification and Astrophysical Emulationii) Synoptic Time Domain Surveysiii) Multivariate and Irregularly Sampled Time Seriesiv) Astrophysical Populationsv) Statistics, computation, and modeling in cosmology.A brief description of each of the work under way by these groups will be given. Overlaps among various working groups will also be highlighted. How the wider astronomy community can both participate and benefit from the activities, will be briefly mentioned.
Defense Small Business Innovation Research Program (SBIR), Volume 4, Defense Agencies Abstracts of Phase 1 Awards 1991

DTIC Science & Technology

1991-01-01

EXPERIENCE IN DEVELOPING INTEGRATED OPTICAL DEVICES, NONLINEAR MAGNETIC-OPTIC MATERIALS, HIGH FREQUENCY MODULATORS, COMPUTER-AIDED MODELING AND SOPHISTICATED... HIGH -LEVEL PRESENTATION AND DISTRIBUTED CONTROL MODELS FOR INTEGRATING HETEROGENEOUS MECHANICAL ENGINEERING APPLICATIONS AND TOOLS. THE DESIGN IS FOCUSED...STATISTICALLY ACCURATE WORST CASE DEVICE MODELS FOR CIRCUIT SIMULATION. PRESENT METHODS OF WORST CASE DEVICE DESIGN ARE AD HOC AND DO NOT ALLOW THE
Replica analysis of overfitting in regression models for time-to-event data

NASA Astrophysics Data System (ADS)

Coolen, A. C. C.; Barrett, J. E.; Paga, P.; Perez-Vicente, C. J.

2017-09-01

Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitting, and even for Cox’s proportional hazards model (the main tool of medical statisticians), one finds in literature only rules of thumb on the number of samples required to avoid overfitting. In this paper we present a mathematical theory of overfitting in regression models for time-to-event data, which aims to increase our quantitative understanding of the problem and provide practical tools with which to correct regression outcomes for the impact of overfitting. It is based on the replica method, a statistical mechanical technique for the analysis of heterogeneous many-variable systems that has been used successfully for several decades in physics, biology, and computer science, but not yet in medical statistics. We develop the theory initially for arbitrary regression models for time-to-event data, and verify its predictions in detail for the popular Cox model.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Apte, A; Veeraraghavan, H; Oh, J

Purpose: To present an open source and free platform to facilitate radiomics research — The “Radiomics toolbox” in CERR. Method: There is scarcity of open source tools that support end-to-end modeling of image features to predict patient outcomes. The “Radiomics toolbox” strives to fill the need for such a software platform. The platform supports (1) import of various kinds of image modalities like CT, PET, MR, SPECT, US. (2) Contouring tools to delineate structures of interest. (3) Extraction and storage of image based features like 1st order statistics, gray-scale co-occurrence and zonesize matrix based texture features and shape features andmore » (4) Statistical Analysis. Statistical analysis of the extracted features is supported with basic functionality that includes univariate correlations, Kaplan-Meir curves and advanced functionality that includes feature reduction and multivariate modeling. The graphical user interface and the data management are performed with Matlab for the ease of development and readability of code and features for wide audience. Open-source software developed with other programming languages is integrated to enhance various components of this toolbox. For example: Java-based DCM4CHE for import of DICOM, R for statistical analysis. Results: The Radiomics toolbox will be distributed as an open source, GNU copyrighted software. The toolbox was prototyped for modeling Oropharyngeal PET dataset at MSKCC. The analysis will be presented in a separate paper. Conclusion: The Radiomics Toolbox provides an extensible platform for extracting and modeling image features. To emphasize new uses of CERR for radiomics and image-based research, we have changed the name from the “Computational Environment for Radiotherapy Research” to the “Computational Environment for Radiological Research”.« less
Extra-Tropical Cyclones at Climate Scales: Comparing Models to Observations

NASA Astrophysics Data System (ADS)

Tselioudis, G.; Bauer, M.; Rossow, W.

2009-04-01

Climate is often defined as the accumulation of weather, and weather is not the concern of climate models. Justification for this latter sentiment has long been hidden behind coarse model resolutions and blunt validation tools based on climatological maps. The spatial-temporal resolutions of today's climate models and observations are converging onto meteorological scales, however, which means that with the correct tools we can test the largely unproven assumption that climate model weather is correct enough that its accumulation results in a robust climate simulation. Towards this effort we introduce a new tool for extracting detailed cyclone statistics from observations and climate model output. These include the usual cyclone characteristics (centers, tracks), but also adaptive cyclone-centric composites. We have created a novel dataset, the MAP Climatology of Mid-latitude Storminess (MCMS), which provides a detailed 6 hourly assessment of the areas under the influence of mid-latitude cyclones, using a search algorithm that delimits the boundaries of each system from the outer-most closed SLP contour. Using this we then extract composites of cloud, radiation, and precipitation properties from sources such as ISCCP and GPCP to create a large comparative dataset for climate model validation. A demonstration of the potential usefulness of these tools in process-based climate model evaluation studies will be shown.
Modeling Psychological Contract Violation using Dual Regime Models: An Event-based Approach.

PubMed

Hofmans, Joeri

2017-01-01

A good understanding of the dynamics of psychological contract violation requires theories, research methods and statistical models that explicitly recognize that violation feelings follow from an event that violates one's acceptance limits, after which interpretative processes are set into motion, determining the intensity of these violation feelings. Whereas theories-in the form of the dynamic model of the psychological contract-and research methods-in the form of daily diary research and experience sampling research-are available by now, the statistical tools to model such a two-stage process are still lacking. The aim of the present paper is to fill this gap in the literature by introducing two statistical models-the Zero-Inflated model and the Hurdle model-that closely mimic the theoretical process underlying the elicitation violation feelings via two model components: a binary distribution that models whether violation has occurred or not, and a count distribution that models how severe the negative impact is. Moreover, covariates can be included for both model components separately, which yields insight into their unique and shared antecedents. By doing this, the present paper offers a methodological-substantive synergy, showing how sophisticated methodology can be used to examine an important substantive issue.
The potential of statistical shape modelling for geometric morphometric analysis of human teeth in archaeological research

PubMed Central

Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia

2017-01-01

This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199
Strengthen forensic entomology in court--the need for data exploration and the validation of a generalised additive mixed model.

PubMed

Baqué, Michèle; Amendt, Jens

2013-01-01

Developmental data of juvenile blow flies (Diptera: Calliphoridae) are typically used to calculate the age of immature stages found on or around a corpse and thus to estimate a minimum post-mortem interval (PMI(min)). However, many of those data sets don't take into account that immature blow flies grow in a non-linear fashion. Linear models do not supply a sufficient reliability on age estimates and may even lead to an erroneous determination of the PMI(min). According to the Daubert standard and the need for improvements in forensic science, new statistic tools like smoothing methods and mixed models allow the modelling of non-linear relationships and expand the field of statistical analyses. The present study introduces into the background and application of these statistical techniques by analysing a model which describes the development of the forensically important blow fly Calliphora vicina at different temperatures. The comparison of three statistical methods (linear regression, generalised additive modelling and generalised additive mixed modelling) clearly demonstrates that only the latter provided regression parameters that reflect the data adequately. We focus explicitly on both the exploration of the data--to assure their quality and to show the importance of checking it carefully prior to conducting the statistical tests--and the validation of the resulting models. Hence, we present a common method for evaluating and testing forensic entomological data sets by using for the first time generalised additive mixed models.
Wave and Wind Model Performance Metrics Tools

NASA Astrophysics Data System (ADS)

Choi, J. K.; Wang, D. W.

2016-02-01

Continual improvements and upgrades of Navy ocean wave and wind models are essential to the assurance of battlespace environment predictability of ocean surface wave and surf conditions in support of Naval global operations. Thus, constant verification and validation of model performance is equally essential to assure the progress of model developments and maintain confidence in the predictions. Global and regional scale model evaluations may require large areas and long periods of time. For observational data to compare against, altimeter winds and waves along the tracks from past and current operational satellites as well as moored/drifting buoys can be used for global and regional coverage. Using data and model runs in previous trials such as the planned experiment, the Dynamics of the Adriatic in Real Time (DART), we demonstrated the use of accumulated altimeter wind and wave data over several years to obtain an objective evaluation of the performance the SWAN (Simulating Waves Nearshore) model running in the Adriatic Sea. The assessment provided detailed performance of wind and wave models by using cell-averaged statistical variables maps with spatial statistics including slope, correlation, and scatter index to summarize model performance. Such a methodology is easily generalized to other regions and at global scales. Operational technology currently used by subject matter experts evaluating the Navy Coastal Ocean Model and the Hybrid Coordinate Ocean Model can be expanded to evaluate wave and wind models using tools developed for ArcMAP, a GIS application developed by ESRI. Recent inclusion of altimeter and buoy data into a format through the Naval Oceanographic Office's (NAVOCEANO) quality control system and the netCDF standards applicable to all model output makes it possible for the fusion of these data and direct model verification. Also, procedures were developed for the accumulation of match-ups of modelled and observed parameters to form a data base with which statistics are readily calculated, for the short or long term. Such a system has potential for a quick transition to operations at NAVOCEANO.
Identifying and Investigating Unexpected Response to Treatment: A Diabetes Case Study.

PubMed

Ozery-Flato, Michal; Ein-Dor, Liat; Parush-Shear-Yashuv, Naama; Aharonov, Ranit; Neuvirth, Hani; Kohn, Martin S; Hu, Jianying

2016-09-01

The availability of electronic health records creates fertile ground for developing computational models of various medical conditions. We present a new approach for detecting and analyzing patients with unexpected responses to treatment, building on machine learning and statistical methodology. Given a specific patient, we compute a statistical score for the deviation of the patient's response from responses observed in other patients having similar characteristics and medication regimens. These scores are used to define cohorts of patients showing deviant responses. Statistical tests are then applied to identify clinical features that correlate with these cohorts. We implement this methodology in a tool that is designed to assist researchers in the pharmaceutical field to uncover new features associated with reduced response to a treatment. It can also aid physicians by flagging patients who are not responding to treatment as expected and hence deserve more attention. The tool provides comprehensive visualizations of the analysis results and the supporting data, both at the cohort level and at the level of individual patients. We demonstrate the utility of our methodology and tool in a population of type II diabetic patients, treated with antidiabetic drugs, and monitored by the HbA1C test.
Integration of Advanced Statistical Analysis Tools and Geophysical Modeling

DTIC Science & Technology

2012-08-01

Carin Duke University Douglas Oldenburg University of British Columbia Stephen Billings Leonard Pasion Laurens Beran Sky Research...data processing for UXO discrimination is the time (or frequency) dependent dipole model (Bell and Barrow (2001), Pasion and Oldenburg (2001), Zhang...described by a bimodal distribution (i.e. two Gaussians, see Pasion (2007)). Data features are nonetheless useful when data quality is not sufficient

A (210)Pb-based chronological model for recent sediments with random entries of mass and activities: Model development.

PubMed

Abril Hernández, José-María

2016-01-01

Unsupported (210)Pb ((210)Pbexc) vs. mass depth profiles do not contain enough information as to extract a unique chronology when both, (210)Pbexc fluxes and mass sediment accumulation rates (SAR) independently vary with time. Restrictive assumptions are needed to develop a suitable dating tool. A statistical correlation between fluxes and SAR seems to be a quite general rule. This paper builds up a new (210)Pb-based dating tool by using such a statistical correlation. It operates with SAR and initial activities that closely follow normal distributions, what leads to the expected correlation between fluxes and SAR. An intelligent algorithm solves their best arrangement downcore to fit the experimental (210)Pbexc vs. mass depth profile, generating then solutions for the chronological line, and for the histories of SAR and fluxes. Parametric maps of a χ-function serve to find out the solution and to support error estimates. Optionally, the model's answers can be better constrained through the use of time markers. The performance of the model is illustrated with a synthetic core, and with real cases using published data for varved sediment cores. Copyright © 2015 Elsevier Ltd. All rights reserved.
EEG and MEG data analysis in SPM8.

PubMed

Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl

2011-01-01

SPM is a free and open source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox , making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools.
EEG and MEG Data Analysis in SPM8

PubMed Central

Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl

2011-01-01

SPM is a free and open source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox , making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools. PMID:21437221
Semantic Importance Sampling for Statistical Model Checking

DTIC Science & Technology

2015-01-16

SMT calls while maintaining correctness. Finally, we implement SIS in a tool called osmosis and use it to verify a number of stochastic systems with...2 surveys related work. Section 3 presents background definitions and concepts. Section 4 presents SIS, and Section 5 presents our tool osmosis . In...which I∗M|=Φ(x) = 1. We do this by first randomly selecting a cube c from C∗ with uniform probability since each cube has equal probability 9 5. OSMOSIS
Study of the Effect of Lubricant Emulsion Percentage and Tool Material on Surface Roughness in Machining of EN-AC 48000 Alloy

NASA Astrophysics Data System (ADS)

Soltani, E.; Shahali, H.; Zarepour, H.

2011-01-01

In this paper, the effect of machining parameters, namely, lubricant emulsion percentage and tool material on surface roughness has been studied in machining process of EN-AC 48000 aluminum alloy. EN-AC 48000 aluminum alloy is an important alloy in industries. Machining of this alloy is of vital importance due to built-up edge and tool wear. A L9 Taguchi standard orthogonal array has been applied as experimental design to investigate the effect of the factors and their interaction. Nine machining tests have been carried out with three random replications resulting in 27 experiments. Three type of cutting tools including coated carbide (CD1810), uncoated carbide (H10), and polycrystalline diamond (CD10) have been used in this research. Emulsion percentage of lubricant is selected at three levels including 3%, 5% and 10%. Statistical analysis has been employed to study the effect of factors and their interactions using ANOVA method. Moreover, the optimal factors level has been achieved through signal to noise ratio (S/N) analysis. Also, a regression model has been provided to predict the surface roughness. Finally, the results of the confirmation tests have been presented to verify the adequacy of the predictive model. In this research, surface quality was improved by 9% using lubricant and statistical optimization method.
Maximum entropy models as a tool for building precise neural controls.

PubMed

Savin, Cristina; Tkačik, Gašper

2017-10-01

Neural responses are highly structured, with population activity restricted to a small subset of the astronomical range of possible activity patterns. Characterizing these statistical regularities is important for understanding circuit computation, but challenging in practice. Here we review recent approaches based on the maximum entropy principle used for quantifying collective behavior in neural activity. We highlight recent models that capture population-level statistics of neural data, yielding insights into the organization of the neural code and its biological substrate. Furthermore, the MaxEnt framework provides a general recipe for constructing surrogate ensembles that preserve aspects of the data, but are otherwise maximally unstructured. This idea can be used to generate a hierarchy of controls against which rigorous statistical tests are possible. Copyright © 2017 Elsevier Ltd. All rights reserved.
Uterine Cancer Statistics

MedlinePlus

... Doing AMIGAS Stay Informed Cancer Home Uterine Cancer Statistics Language: English (US) Español (Spanish) Recommend on Facebook ... the most commonly diagnosed gynecologic cancer. U.S. Cancer Statistics Data Visualizations Tool The Data Visualizations tool makes ...
Light-weight Parallel Python Tools for Earth System Modeling Workflows

NASA Astrophysics Data System (ADS)

Mickelson, S. A.; Paul, K.; Xu, H.; Dennis, J.; Brown, D. I.

2015-12-01

With the growth in computing power over the last 30 years, earth system modeling codes have become increasingly data-intensive. As an example, it is expected that the data required for the next Intergovernmental Panel on Climate Change (IPCC) Assessment Report (AR6) will increase by more than 10x to an expected 25PB per climate model. Faced with this daunting challenge, developers of the Community Earth System Model (CESM) have chosen to change the format of their data for long-term storage from time-slice to time-series, in order to reduce the required download bandwidth needed for later analysis and post-processing by climate scientists. Hence, efficient tools are required to (1) perform the transformation of the data from time-slice to time-series format and to (2) compute climatology statistics, needed for many diagnostic computations, on the resulting time-series data. To address the first of these two challenges, we have developed a parallel Python tool for converting time-slice model output to time-series format. To address the second of these challenges, we have developed a parallel Python tool to perform fast time-averaging of time-series data. These tools are designed to be light-weight, be easy to install, have very few dependencies, and can be easily inserted into the Earth system modeling workflow with negligible disruption. In this work, we present the motivation, approach, and testing results of these two light-weight parallel Python tools, as well as our plans for future research and development.
A new scoring system in Cystic Fibrosis: statistical tools for database analysis - a preliminary report.

PubMed

Hafen, G M; Hurst, C; Yearwood, J; Smith, J; Dzalilov, Z; Robinson, P J

2008-10-05

Cystic fibrosis is the most common fatal genetic disorder in the Caucasian population. Scoring systems for assessment of Cystic fibrosis disease severity have been used for almost 50 years, without being adapted to the milder phenotype of the disease in the 21st century. The aim of this current project is to develop a new scoring system using a database and employing various statistical tools. This study protocol reports the development of the statistical tools in order to create such a scoring system. The evaluation is based on the Cystic Fibrosis database from the cohort at the Royal Children's Hospital in Melbourne. Initially, unsupervised clustering of the all data records was performed using a range of clustering algorithms. In particular incremental clustering algorithms were used. The clusters obtained were characterised using rules from decision trees and the results examined by clinicians. In order to obtain a clearer definition of classes expert opinion of each individual's clinical severity was sought. After data preparation including expert-opinion of an individual's clinical severity on a 3 point-scale (mild, moderate and severe disease), two multivariate techniques were used throughout the analysis to establish a method that would have a better success in feature selection and model derivation: 'Canonical Analysis of Principal Coordinates' and 'Linear Discriminant Analysis'. A 3-step procedure was performed with (1) selection of features, (2) extracting 5 severity classes out of a 3 severity class as defined per expert-opinion and (3) establishment of calibration datasets. (1) Feature selection: CAP has a more effective "modelling" focus than DA.(2) Extraction of 5 severity classes: after variables were identified as important in discriminating contiguous CF severity groups on the 3-point scale as mild/moderate and moderate/severe, Discriminant Function (DF) was used to determine the new groups mild, intermediate moderate, moderate, intermediate severe and severe disease. (3) Generated confusion tables showed a misclassification rate of 19.1% for males and 16.5% for females, with a majority of misallocations into adjacent severity classes particularly for males. Our preliminary data show that using CAP for detection of selection features and Linear DA to derive the actual model in a CF database might be helpful in developing a scoring system. However, there are several limitations, particularly more data entry points are needed to finalize a score and the statistical tools have further to be refined and validated, with re-running the statistical methods in the larger dataset.
Clinical implementation of a knowledge based planning tool for prostate VMAT.

PubMed

Powis, Richard; Bird, Andrew; Brennan, Matthew; Hinks, Susan; Newman, Hannah; Reed, Katie; Sage, John; Webster, Gareth

2017-05-08

A knowledge based planning tool has been developed and implemented for prostate VMAT radiotherapy plans providing a target average rectum dose value based on previously achievable values for similar rectum/PTV overlap. The purpose of this planning tool is to highlight sub-optimal clinical plans and to improve plan quality and consistency. A historical cohort of 97 VMAT prostate plans was interrogated using a RayStation script and used to develop a local model for predicting optimum average rectum dose based on individual anatomy. A preliminary validation study was performed whereby historical plans identified as "optimal" and "sub-optimal" by the local model were replanned in a blinded study by four experienced planners and compared to the original clinical plan to assess whether any improvement in rectum dose was observed. The predictive model was then incorporated into a RayStation script and used as part of the clinical planning process. Planners were asked to use the script during planning to provide a patient specific prediction for optimum average rectum dose and to optimise the plan accordingly. Plans identified as "sub-optimal" in the validation study observed a statistically significant improvement in average rectum dose compared to the clinical plan when replanned whereas plans that were identified as "optimal" observed no improvement when replanned. This provided confidence that the local model can identify plans that were suboptimal in terms of rectal sparing. Clinical implementation of the knowledge based planning tool reduced the population-averaged mean rectum dose by 5.6Gy. There was a small but statistically significant increase in total MU and femoral head dose and a reduction in conformity index. These did not affect the clinical acceptability of the plans and no significant changes to other plan quality metrics were observed. The knowledge-based planning tool has enabled substantial reductions in population-averaged mean rectum dose for prostate VMAT patients. This suggests plans are improved when planners receive quantitative feedback on plan quality against historical data.
The Impact of New Technology on Accounting Education.

ERIC Educational Resources Information Center

Shaoul, Jean

The introduction of computers in the Department of Accounting and Finance at Manchester University is described. General background outlining the increasing need for microcomputers in the accounting curriculum (including financial modelling tools and decision support systems such as linear programming, statistical packages, and simulation) is…
Primary Sclerosing Cholangitis Risk Estimate Tool (PREsTo) Predicts Outcomes in PSC: A Derivation & Validation Study Using Machine Learning.

PubMed

Eaton, John E; Vesterhus, Mette; McCauley, Bryan M; Atkinson, Elizabeth J; Schlicht, Erik M; Juran, Brian D; Gossard, Andrea A; LaRusso, Nicholas F; Gores, Gregory J; Karlsen, Tom H; Lazaridis, Konstantinos N

2018-05-09

Improved methods are needed to risk stratify and predict outcomes in patients with primary sclerosing cholangitis (PSC). Therefore, we sought to derive and validate a new prediction model and compare its performance to existing surrogate markers. The model was derived using 509 subjects from a multicenter North American cohort and validated in an international multicenter cohort (n=278). Gradient boosting, a machine based learning technique, was used to create the model. The endpoint was hepatic decompensation (ascites, variceal hemorrhage or encephalopathy). Subjects with advanced PSC or cholangiocarcinoma at baseline were excluded. The PSC risk estimate tool (PREsTo) consists of 9 variables: bilirubin, albumin, serum alkaline phosphatase (SAP) times the upper limit of normal (ULN), platelets, AST, hemoglobin, sodium, patient age and the number of years since PSC was diagnosed. Validation in an independent cohort confirms PREsTo accurately predicts decompensation (C statistic 0.90, 95% confidence interval (CI) 0.84-0.95) and performed well compared to MELD score (C statistic 0.72, 95% CI 0.57-0.84), Mayo PSC risk score (C statistic 0.85, 95% CI 0.77-0.92) and SAP < 1.5x ULN (C statistic 0.65, 95% CI 0.55-0.73). PREsTo continued to be accurate among individuals with a bilirubin < 2.0 mg/dL (C statistic 0.90, 95% CI 0.82-0.96) and when the score was re-applied at a later course in the disease (C statistic 0.82, 95% CI 0.64-0.95). PREsTo accurately predicts hepatic decompensation in PSC and exceeds the performance among other widely available, noninvasive prognostic scoring systems. This article is protected by copyright. All rights reserved. © 2018 by the American Association for the Study of Liver Diseases.
System and Software Reliability (C103)

NASA Technical Reports Server (NTRS)

Wallace, Dolores

2003-01-01

Within the last decade better reliability models (hardware. software, system) than those currently used have been theorized and developed but not implemented in practice. Previous research on software reliability has shown that while some existing software reliability models are practical, they are no accurate enough. New paradigms of development (e.g. OO) have appeared and associated reliability models have been proposed posed but not investigated. Hardware models have been extensively investigated but not integrated into a system framework. System reliability modeling is the weakest of the three. NASA engineers need better methods and tools to demonstrate that the products meet NASA requirements for reliability measurement. For the new models for the software component of the last decade, there is a great need to bring them into a form that they can be used on software intensive systems. The Statistical Modeling and Estimation of Reliability Functions for Systems (SMERFS'3) tool is an existing vehicle that may be used to incorporate these new modeling advances. Adapting some existing software reliability modeling changes to accommodate major changes in software development technology may also show substantial improvement in prediction accuracy. With some additional research, the next step is to identify and investigate system reliability. System reliability models could then be incorporated in a tool such as SMERFS'3. This tool with better models would greatly add value in assess in GSFC projects.
The impact of nutritional status and longitudinal recovery of motor and cognitive milestones in internationally adopted children.

PubMed

Park, Hyun; Bothe, Denise; Holsinger, Eva; Kirchner, H Lester; Olness, Karen; Mandalakas, Anna

2011-01-01

Internationally adopted children often arrive from institutional settings where they have experienced medical, nutritional and psychosocial deprivation. This study uses a validated research assessment tool to prospectively assess the impact of baseline (immediately post adoption) nutritional status on fifty-eight children as measured by weight-for-age, height-for-age, weight-for-height and head circumference-for-age z scores, as a determinant of cognitive (MDI) and psychomotor development (PDI) scores longitudinally. A statistical model was developed to allow for different ages at time of initial assessment as well as variable intervals between follow up visits. The study results show that both acute and chronic measures of malnutrition significantly affect baseline developmental status as well as the rate of improvement in both MDI and PDI scores. This study contributes to the body of literature with its prospective nature, unique statistical model for longitudinal evaluation, and use of a validated assessment tool to assess outcomes.
External validation of the Probability of repeated admission (Pra) risk prediction tool in older community-dwelling people attending general practice: a prospective cohort study.

PubMed

Wallace, Emma; McDowell, Ronald; Bennett, Kathleen; Fahey, Tom; Smith, Susan M

2016-11-14

Emergency admission is associated with the potential for adverse events in older people and risk prediction models are available to identify those at highest risk of admission. The aim of this study was to externally validate and compare the performance of the Probability of repeated admission (Pra) risk model and a modified version (incorporating a multimorbidity measure) in predicting emergency admission in older community-dwelling people. 15 general practices (GPs) in the Republic of Ireland. n=862, ≥70 years, community-dwelling people prospectively followed up for 2 years (2010-2012). Pra risk model (original and modified) calculated for baseline year where ≥0.5 denoted high risk (patient questionnaire, GP medical record review) of future emergency admission. Emergency admission over 1 year (GP medical record review). descriptive statistics, model discrimination (c-statistic) and calibration (Hosmer-Lemeshow statistic). Of 862 patients, a total of 154 (18%) had ≥1 emergency admission(s) in the follow-up year. 63 patients (7%) were classified as high risk by the original Pra and of these 26 (41%) were admitted. The modified Pra classified 391 (45%) patients as high risk and 103 (26%) were subsequently admitted. Both models demonstrated only poor discrimination (original Pra: c-statistic 0.65 (95% CI 0.61 to 0.70); modified Pra: c-statistic 0.67 (95% CI 0.62 to 0.72)). When categorised according to risk-category model, specificity was highest for the original Pra at cut-point of ≥0.5 denoting high risk (95%), and for the modified Pra at cut-point of ≥0.7 (95%). Both models overestimated the number of admissions across all risk strata. While the original Pra model demonstrated poor discrimination, model specificity was high and a small number of patients identified as high risk. Future validation studies should examine higher cut-points denoting high risk for the modified Pra, which has practical advantages in terms of application in GP. The original Pra tool may have a role in identifying higher-risk community-dwelling older people for inclusion in future trials aiming to reduce emergency admissions. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://www.bmj.com/company/products-services/rights-and-licensing/.
Equilibrium statistical-thermal models in high-energy physics

NASA Astrophysics Data System (ADS)

Tawfik, Abdel Nasser

2014-05-01

We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it, the simplifying assumptions about the matrix element of the interaction process that likely reflects many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analyzed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns to be quite often that many-particle systems can be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives an overwhelming evidence for the chemical equilibrium in the final state. The strange particles might be an exception, as they are suppressed at lower beam energies. However, their relative yields fulfill statistical equilibrium, as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher order moments of multiplicity have been discussed. They offer deep insights about particle production and to critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration the possible deviations of the ideal hadron gas. We highlight various types of interactions, dissipative properties and location-dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases; quasi-particle model, linear sigma model with Polyakov potentials and compressible bag model.
Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics

PubMed Central

2011-01-01

Background Clinical researchers have often preferred to use a fixed effects model for the primary interpretation of a meta-analysis. Heterogeneity is usually assessed via the well known Q and I2 statistics, along with the random effects estimate they imply. In recent years, alternative methods for quantifying heterogeneity have been proposed, that are based on a 'generalised' Q statistic. Methods We review 18 IPD meta-analyses of RCTs into treatments for cancer, in order to quantify the amount of heterogeneity present and also to discuss practical methods for explaining heterogeneity. Results Differing results were obtained when the standard Q and I2 statistics were used to test for the presence of heterogeneity. The two meta-analyses with the largest amount of heterogeneity were investigated further, and on inspection the straightforward application of a random effects model was not deemed appropriate. Compared to the standard Q statistic, the generalised Q statistic provided a more accurate platform for estimating the amount of heterogeneity in the 18 meta-analyses. Conclusions Explaining heterogeneity via the pre-specification of trial subgroups, graphical diagnostic tools and sensitivity analyses produced a more desirable outcome than an automatic application of the random effects model. Generalised Q statistic methods for quantifying and adjusting for heterogeneity should be incorporated as standard into statistical software. Software is provided to help achieve this aim. PMID:21473747
Multi-region statistical shape model for cochlear implantation

NASA Astrophysics Data System (ADS)

Romera, Jordi; Kjer, H. Martin; Piella, Gemma; Ceresa, Mario; González Ballester, Miguel A.

2016-03-01

Statistical shape models are commonly used to analyze the variability between similar anatomical structures and their use is established as a tool for analysis and segmentation of medical images. However, using a global model to capture the variability of complex structures is not enough to achieve the best results. The complexity of a proper global model increases even more when the amount of data available is limited to a small number of datasets. Typically, the anatomical variability between structures is associated to the variability of their physiological regions. In this paper, a complete pipeline is proposed for building a multi-region statistical shape model to study the entire variability from locally identified physiological regions of the inner ear. The proposed model, which is based on an extension of the Point Distribution Model (PDM), is built for a training set of 17 high-resolution images (24.5 μm voxels) of the inner ear. The model is evaluated according to its generalization ability and specificity. The results are compared with the ones of a global model built directly using the standard PDM approach. The evaluation results suggest that better accuracy can be achieved using a regional modeling of the inner ear.
Modeling Psychological Contract Violation using Dual Regime Models: An Event-based Approach

PubMed Central

Hofmans, Joeri

2017-01-01

A good understanding of the dynamics of psychological contract violation requires theories, research methods and statistical models that explicitly recognize that violation feelings follow from an event that violates one's acceptance limits, after which interpretative processes are set into motion, determining the intensity of these violation feelings. Whereas theories—in the form of the dynamic model of the psychological contract—and research methods—in the form of daily diary research and experience sampling research—are available by now, the statistical tools to model such a two-stage process are still lacking. The aim of the present paper is to fill this gap in the literature by introducing two statistical models—the Zero-Inflated model and the Hurdle model—that closely mimic the theoretical process underlying the elicitation violation feelings via two model components: a binary distribution that models whether violation has occurred or not, and a count distribution that models how severe the negative impact is. Moreover, covariates can be included for both model components separately, which yields insight into their unique and shared antecedents. By doing this, the present paper offers a methodological-substantive synergy, showing how sophisticated methodology can be used to examine an important substantive issue. PMID:29163316
Ship detection using STFT sea background statistical modeling for large-scale oceansat remote sensing image

NASA Astrophysics Data System (ADS)

Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan

2018-03-01

Large-scale oceansat remote sensing images cover a big area sea surface, which fluctuation can be considered as a non-stationary process. Short-Time Fourier Transform (STFT) is a suitable analysis tool for the time varying nonstationary signal. In this paper, a novel ship detection method using 2-D STFT sea background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the paper divides the large-scale oceansat remote sensing image into small sub-blocks, and 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectrum of sub-blocks is studied and the obvious different characteristic between sea background and non-sea background is found. Finally, the statistical model for all valid frequency points in the STFT spectrum of sea background is given, and the ship detection method based on the 2-D STFT spectrum modeling is proposed. The experimental result shows that the proposed algorithm can detect ship targets with high recall rate and low missing rate.

Statistical model with two order parameters for ductile and soft fiber bundles in nanoscience and biomaterials.

PubMed

Rinaldi, Antonio

2011-04-01

Traditional fiber bundles models (FBMs) have been an effective tool to understand brittle heterogeneous systems. However, fiber bundles in modern nano- and bioapplications demand a new generation of FBM capturing more complex deformation processes in addition to damage. In the context of loose bundle systems and with reference to time-independent plasticity and soft biomaterials, we formulate a generalized statistical model for ductile fracture and nonlinear elastic problems capable of handling more simultaneous deformation mechanisms by means of two order parameters (as opposed to one). As the first rational FBM for coupled damage problems, it may be the cornerstone for advanced statistical models of heterogeneous systems in nanoscience and materials design, especially to explore hierarchical and bio-inspired concepts in the arena of nanobiotechnology. Applicative examples are provided for illustrative purposes at last, discussing issues in inverse analysis (i.e., nonlinear elastic polymer fiber and ductile Cu submicron bars arrays) and direct design (i.e., strength prediction).
On entropy, financial markets and minority games

NASA Astrophysics Data System (ADS)

Zapart, Christopher A.

2009-04-01

The paper builds upon an earlier statistical analysis of financial time series with Shannon information entropy, published in [L. Molgedey, W. Ebeling, Local order, entropy and predictability of financial time series, European Physical Journal B-Condensed Matter and Complex Systems 15/4 (2000) 733-737]. A novel generic procedure is proposed for making multistep-ahead predictions of time series by building a statistical model of entropy. The approach is first demonstrated on the chaotic Mackey-Glass time series and later applied to Japanese Yen/US dollar intraday currency data. The paper also reinterprets Minority Games [E. Moro, The minority game: An introductory guide, Advances in Condensed Matter and Statistical Physics (2004)] within the context of physical entropy, and uses models derived from minority game theory as a tool for measuring the entropy of a model in response to time series. This entropy conditional upon a model is subsequently used in place of information-theoretic entropy in the proposed multistep prediction algorithm.
Modeling to Optimize Terminal Stem Cell Differentiation

PubMed Central

Gallicano, G. Ian

2013-01-01

Embryonic stem cell (ESC), iPCs, and adult stem cells (ASCs) all are among the most promising potential treatments for heart failure, spinal cord injury, neurodegenerative diseases, and diabetes. However, considerable uncertainty in the production of ESC-derived terminally differentiated cell types has limited the efficiency of their development. To address this uncertainty, we and other investigators have begun to employ a comprehensive statistical model of ESC differentiation for determining the role of intracellular pathways (e.g., STAT3) in ESC differentiation and determination of germ layer fate. The approach discussed here applies the Baysian statistical model to cell/developmental biology combining traditional flow cytometry methodology and specific morphological observations with advanced statistical and probabilistic modeling and experimental design. The final result of this study is a unique tool and model that enhances the understanding of how and when specific cell fates are determined during differentiation. This model provides a guideline for increasing the production efficiency of therapeutically viable ESCs/iPSCs/ASC derived neurons or any other cell type and will eventually lead to advances in stem cell therapy. PMID:24278782
Software for Data Analysis with Graphical Models

NASA Technical Reports Server (NTRS)

Buntine, Wray L.; Roy, H. Scott

1994-01-01

Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
Data Analysis with Graphical Models: Software Tools

NASA Technical Reports Server (NTRS)

Buntine, Wray L.

1994-01-01

Probabilistic graphical models (directed and undirected Markov fields, and combined in chain graphs) are used widely in expert systems, image processing and other areas as a framework for representing and reasoning with probabilities. They come with corresponding algorithms for performing probabilistic inference. This paper discusses an extension to these models by Spiegelhalter and Gilks, plates, used to graphically model the notion of a sample. This offers a graphical specification language for representing data analysis problems. When combined with general methods for statistical inference, this also offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper outlines the framework and then presents some basic tools for the task: a graphical version of the Pitman-Koopman Theorem for the exponential family, problem decomposition, and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
An analysis, sensitivity and prediction of winter fog events using FASP model over Indo-Gangetic plains, India

NASA Astrophysics Data System (ADS)

Srivastava, S. K., Sr.; Sharma, D. A.; Sachdeva, K.

2017-12-01

Indo-Gangetic plains of India experience severe fog conditions during the peak winter months of December and January every year. In this paper an attempt has been to analyze the spatial and temporal variability of winter fog over Indo-Gangetic plains. Further, an attempt has also been made to configure an efficient meso-scale numerical weather prediction model using different parameterization schemes and develop a forecasting tool for prediction of fog during winter months over Indo-Gangetic plains. The study revealed that an alarming increasing positive trend of fog frequency prevails over many locations of IGP. Hot spot and cluster analysis were conducted to identify the high fog prone zones using GIS and inferential statistical tools respectively. Hot spots on an average experiences fog on 68.27% days, it is followed by moderate and cold spots with 48.03% and 21.79% respectively. The study proposes a new FASP (Fog Analysis, sensitivity and prediction) Model for overall analysis and prediction of fog at a particular location and period over IGP. In the first phase of this model long term climatological fog data of a location is analyzed to determine its characteristics and prevailing trend using various advanced statistical techniques. During a second phase a sensitivity test is conducted with different combination of parameterization schemes to determine the most suitable combination for fog simulation over a particular location and period and in the third and final phase, first ARIMA model is used to predict the number of fog days in future . Thereafter, Numerical model is used to predict the various meteorological parameters favourable for fog forecast. Finally, Hybrid model is used for fog forecast over the study location. The results of the FASP model are validated with actual ground based fog data using statistical tools. Forecast Fog-gram generated using hybrid model during Jan 2017 shows highly encouraging results for fog occurrence/Non occurrence between 25 hrs to 72 hours forecast. The model predicted the fog occurrences/Non occurrence with more than 85 % accuracy over most of the locations across the study area. The minimum visibility departure is within 500 m on 90% occasions over the central IGP and within 1000m on more than 80 % occasions over most of the locations across Indo-Gangetic plains.
Physics-based statistical learning approach to mesoscopic model selection.

PubMed

Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab

2015-11-01

In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model on. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
Evaluating statistical consistency in the ocean model component of the Community Earth System Model (pyCECT v2.0)

NASA Astrophysics Data System (ADS)

Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen

2016-07-01

The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterization, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of the ensemble-based statistical consistency testing is to use a qualitative measurement of the variability of the ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in term of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that requires the evaluation of ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. Then the percentage of points that have scores greater than a specified threshold indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important. Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble and those that should not, as well as providing a simple, subjective and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
Statistical power analysis of cardiovascular safety pharmacology studies in conscious rats.

PubMed

Bhatt, Siddhartha; Li, Dingzhou; Flynn, Declan; Wisialowski, Todd; Hemkens, Michelle; Steidl-Nichols, Jill

2016-01-01

Cardiovascular (CV) toxicity and related attrition are a major challenge for novel therapeutic entities and identifying CV liability early is critical for effective derisking. CV safety pharmacology studies in rats are a valuable tool for early investigation of CV risk. Thorough understanding of data analysis techniques and statistical power of these studies is currently lacking and is imperative for enabling sound decision-making. Data from 24 crossover and 12 parallel design CV telemetry rat studies were used for statistical power calculations. Average values of telemetry parameters (heart rate, blood pressure, body temperature, and activity) were logged every 60s (from 1h predose to 24h post-dose) and reduced to 15min mean values. These data were subsequently binned into super intervals for statistical analysis. A repeated measure analysis of variance was used for statistical analysis of crossover studies and a repeated measure analysis of covariance was used for parallel studies. Statistical power analysis was performed to generate power curves and establish relationships between detectable CV (blood pressure and heart rate) changes and statistical power. Additionally, data from a crossover CV study with phentolamine at 4, 20 and 100mg/kg are reported as a representative example of data analysis methods. Phentolamine produced a CV profile characteristic of alpha adrenergic receptor antagonism, evidenced by a dose-dependent decrease in blood pressure and reflex tachycardia. Detectable blood pressure changes at 80% statistical power for crossover studies (n=8) were 4-5mmHg. For parallel studies (n=8), detectable changes at 80% power were 6-7mmHg. Detectable heart rate changes for both study designs were 20-22bpm. Based on our results, the conscious rat CV model is a sensitive tool to detect and mitigate CV risk in early safety studies. Furthermore, these results will enable informed selection of appropriate models and study design for early stage CV studies. Copyright © 2016 Elsevier Inc. All rights reserved.
Statistical mechanics of competitive resource allocation using agent-based models

NASA Astrophysics Data System (ADS)

Chakraborti, Anirban; Challet, Damien; Chatterjee, Arnab; Marsili, Matteo; Zhang, Yi-Cheng; Chakrabarti, Bikas K.

2015-01-01

Demand outstrips available resources in most situations, which gives rise to competition, interaction and learning. In this article, we review a broad spectrum of multi-agent models of competition (El Farol Bar problem, Minority Game, Kolkata Paise Restaurant problem, Stable marriage problem, Parking space problem and others) and the methods used to understand them analytically. We emphasize the power of concepts and tools from statistical mechanics to understand and explain fully collective phenomena such as phase transitions and long memory, and the mapping between agent heterogeneity and physical disorder. As these methods can be applied to any large-scale model of competitive resource allocation made up of heterogeneous adaptive agent with non-linear interaction, they provide a prospective unifying paradigm for many scientific disciplines.
An automated process for building reliable and optimal in vitro/in vivo correlation models based on Monte Carlo simulations.

PubMed

Sutton, Steven C; Hu, Mingxiu

2006-05-05

Many mathematical models have been proposed for establishing an in vitro/in vivo correlation (IVIVC). The traditional IVIVC model building process consists of 5 steps: deconvolution, model fitting, convolution, prediction error evaluation, and cross-validation. This is a time-consuming process and typically a few models at most are tested for any given data set. The objectives of this work were to (1) propose a statistical tool to screen models for further development of an IVIVC, (2) evaluate the performance of each model under different circumstances, and (3) investigate the effectiveness of common statistical model selection criteria for choosing IVIVC models. A computer program was developed to explore which model(s) would be most likely to work well with a random variation from the original formulation. The process used Monte Carlo simulation techniques to build IVIVC models. Data-based model selection criteria (Akaike Information Criteria [AIC], R2) and the probability of passing the Food and Drug Administration "prediction error" requirement was calculated. To illustrate this approach, several real data sets representing a broad range of release profiles are used to illustrate the process and to demonstrate the advantages of this automated process over the traditional approach. The Hixson-Crowell and Weibull models were often preferred over the linear. When evaluating whether a Level A IVIVC model was possible, the model selection criteria AIC generally selected the best model. We believe that the approach we proposed may be a rapid tool to determine which IVIVC model (if any) is the most applicable.
RIPARIAN SHADE CONTROLS ON STREAM TEMPERATURE NOW AND IN THE FUTURE ACROSS TRIBUTARIES OF THE COLUMBIA RIVER, USA

EPA Science Inventory

Future climates may warm stream temperatures altering aquatic communities and threatening socioeconomically-important species. These impacts will vary across large spatial extents and require special evaluation tools. Statistical stream network models (SSNs) account for spatial a...
A STATISTICAL MODELING METHODOLOGY FOR THE DETECTION, QUANTIFICATION, AND PREDICTION OF ECOLOGICAL THRESHOLDS

EPA Science Inventory

This study will provide a general methodology for integrating threshold information from multiple species ecological metrics, allow for prediction of changes of alternative stable states, and provide a risk assessment tool that can be applied to adaptive management. The integr...
Predicting Fecal Indicator Bacteria Concentrations in the South Fork Broad River Watershed Using Virtual Beach

EPA Science Inventory

Virtual Beach (VB) is a decision support tool that constructs site-specific statistical models to predict fecal indicator bacteria (FIB) at recreational beaches. Although primarily designed for making decisions regarding beach closures or issuance of swimming advisories based on...
Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: An example from a vertigo phase III study with longitudinal count data as primary endpoint

PubMed Central

2012-01-01

Background A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. Methods We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). Results The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. Conclusions The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint. PMID:22962944
Bayesian model selection techniques as decision support for shaping a statistical analysis plan of a clinical trial: an example from a vertigo phase III study with longitudinal count data as primary endpoint.

PubMed

Adrion, Christine; Mansmann, Ulrich

2012-09-10

A statistical analysis plan (SAP) is a critical link between how a clinical trial is conducted and the clinical study report. To secure objective study results, regulatory bodies expect that the SAP will meet requirements in pre-specifying inferential analyses and other important statistical techniques. To write a good SAP for model-based sensitivity and ancillary analyses involves non-trivial decisions on and justification of many aspects of the chosen setting. In particular, trials with longitudinal count data as primary endpoints pose challenges for model choice and model validation. In the random effects setting, frequentist strategies for model assessment and model diagnosis are complex and not easily implemented and have several limitations. Therefore, it is of interest to explore Bayesian alternatives which provide the needed decision support to finalize a SAP. We focus on generalized linear mixed models (GLMMs) for the analysis of longitudinal count data. A series of distributions with over- and under-dispersion is considered. Additionally, the structure of the variance components is modified. We perform a simulation study to investigate the discriminatory power of Bayesian tools for model criticism in different scenarios derived from the model setting. We apply the findings to the data from an open clinical trial on vertigo attacks. These data are seen as pilot data for an ongoing phase III trial. To fit GLMMs we use a novel Bayesian computational approach based on integrated nested Laplace approximations (INLAs). The INLA methodology enables the direct computation of leave-one-out predictive distributions. These distributions are crucial for Bayesian model assessment. We evaluate competing GLMMs for longitudinal count data according to the deviance information criterion (DIC) or probability integral transform (PIT), and by using proper scoring rules (e.g. the logarithmic score). The instruments under study provide excellent tools for preparing decisions within the SAP in a transparent way when structuring the primary analysis, sensitivity or ancillary analyses, and specific analyses for secondary endpoints. The mean logarithmic score and DIC discriminate well between different model scenarios. It becomes obvious that the naive choice of a conventional random effects Poisson model is often inappropriate for real-life count data. The findings are used to specify an appropriate mixed model employed in the sensitivity analyses of an ongoing phase III trial. The proposed Bayesian methods are not only appealing for inference but notably provide a sophisticated insight into different aspects of model performance, such as forecast verification or calibration checks, and can be applied within the model selection process. The mean of the logarithmic score is a robust tool for model ranking and is not sensitive to sample size. Therefore, these Bayesian model selection techniques offer helpful decision support for shaping sensitivity and ancillary analyses in a statistical analysis plan of a clinical trial with longitudinal count data as the primary endpoint.
From data to the decision: A software architecture to integrate predictive modelling in clinical settings.

PubMed

Martinez-Millana, A; Fernandez-Llatas, C; Sacchi, L; Segagni, D; Guillen, S; Bellazzi, R; Traver, V

2015-08-01

The application of statistics and mathematics over large amounts of data is providing healthcare systems with new tools for screening and managing multiple diseases. Nonetheless, these tools have many technical and clinical limitations as they are based on datasets with concrete characteristics. This proposition paper describes a novel architecture focused on providing a validation framework for discrimination and prediction models in the screening of Type 2 diabetes. For that, the architecture has been designed to gather different data sources under a common data structure and, furthermore, to be controlled by a centralized component (Orchestrator) in charge of directing the interaction flows among data sources, models and graphical user interfaces. This innovative approach aims to overcome the data-dependency of the models by providing a validation framework for the models as they are used within clinical settings.
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

PubMed

Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

2016-12-01

In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
Adaptive Error Estimation in Linearized Ocean General Circulation Models

NASA Technical Reports Server (NTRS)

Chechelnitsky, Michael Y.

1999-01-01

Data assimilation methods are routinely used in oceanography. The statistics of the model and measurement errors need to be specified a priori. This study addresses the problem of estimating model and measurement error statistics from observations. We start by testing innovation based methods of adaptive error estimation with low-dimensional models in the North Pacific (5-60 deg N, 132-252 deg E) to TOPEX/POSEIDON (TIP) sea level anomaly data, acoustic tomography data from the ATOC project, and the MIT General Circulation Model (GCM). A reduced state linear model that describes large scale internal (baroclinic) error dynamics is used. The methods are shown to be sensitive to the initial guess for the error statistics and the type of observations. A new off-line approach is developed, the covariance matching approach (CMA), where covariance matrices of model-data residuals are "matched" to their theoretical expectations using familiar least squares methods. This method uses observations directly instead of the innovations sequence and is shown to be related to the MT method and the method of Fu et al. (1993). Twin experiments using the same linearized MIT GCM suggest that altimetric data are ill-suited to the estimation of internal GCM errors, but that such estimates can in theory be obtained using acoustic data. The CMA is then applied to T/P sea level anomaly data and a linearization of a global GFDL GCM which uses two vertical modes. We show that the CMA method can be used with a global model and a global data set, and that the estimates of the error statistics are robust. We show that the fraction of the GCM-T/P residual variance explained by the model error is larger than that derived in Fukumori et al.(1999) with the method of Fu et al.(1993). Most of the model error is explained by the barotropic mode. However, we find that impact of the change in the error statistics on the data assimilation estimates is very small. This is explained by the large representation error, i.e. the dominance of the mesoscale eddies in the T/P signal, which are not part of the 21 by 1" GCM. Therefore, the impact of the observations on the assimilation is very small even after the adjustment of the error statistics. This work demonstrates that simult&neous estimation of the model and measurement error statistics for data assimilation with global ocean data sets and linearized GCMs is possible. However, the error covariance estimation problem is in general highly underdetermined, much more so than the state estimation problem. In other words there exist a very large number of statistical models that can be made consistent with the available data. Therefore, methods for obtaining quantitative error estimates, powerful though they may be, cannot replace physical insight. Used in the right context, as a tool for guiding the choice of a small number of model error parameters, covariance matching can be a useful addition to the repertory of tools available to oceanographers.
A quality improvement management model for renal care.

PubMed

Vlchek, D L; Day, L M

1991-04-01

The purpose of this article is to explore the potential for applying the theory and tools of quality improvement (total quality management) in the renal care setting. We believe that the coupling of the statistical techniques used in the Deming method of quality improvement, with modern approaches to outcome and process analysis, will provide the renal care community with powerful tools, not only for improved quality (i.e., reduced morbidity and mortality), but also for technology evaluation and resource allocation.

Real-time forecasts of tomorrow's earthquakes in California: a new mapping tool

USGS Publications Warehouse

Gerstenberger, Matt; Wiemer, Stefan; Jones, Lucy

2004-01-01

We have derived a multi-model approach to calculate time-dependent earthquake hazard resulting from earthquake clustering. This file report explains the theoretical background behind the approach, the specific details that are used in applying the method to California, as well as the statistical testing to validate the technique. We have implemented our algorithm as a real-time tool that has been automatically generating short-term hazard maps for California since May of 2002, at http://step.wr.usgs.gov
P-MartCancer: A New Online Platform to Access CPTAC Datasets and Enable New Analyses | Office of Cancer Clinical Proteomics Research

Cancer.gov

The November 1, 2017 issue of Cancer Research is dedicated to a collection of computational resource papers in genomics, proteomics, animal models, imaging, and clinical subjects for non-bioinformaticists looking to incorporate computing tools into their work. Scientists at Pacific Northwest National Laboratory have developed P-MartCancer, an open, web-based interactive software tool that enables statistical analyses of peptide or protein data generated from mass-spectrometry (MS)-based global proteomics experiments.
Integration of Advanced Statistical Analysis Tools and Geophysical Modeling

DTIC Science & Technology

2010-12-01

Carin Duke University Douglas Oldenburg University of British Columbia Stephen Billings, Leonard Pasion Laurens Beran Sky Research...means and covariances estimated for each class [5]. For this study, dipole polarizabilities were fit with a Pasion -Oldenburg parameterization of 8 −1...model for unexploded ordnance classification with EMI data,” IEEE Geosci. Remote Sensing Letters, vol. 4, pp. 629–633, 2007. [4] L. R. Pasion
Proceedings: USACERL/ASCE First Joint Conference on Expert Systems, 29-30 June 1988

DTIC Science & Technology

1989-01-01

Wong KOWLEDGE -BASED GRAPHIC DIALOGUES . o ...................... .... 80 D. L Mw 4 CONTENTS (Cont’d) ABSTRACTS ACCEPTED FOR PUBLICATION MAD, AN EXPERT...methodology of inductive shallow modeling was developed. Inductive systems may become powerful shallow modeling tools applicable to a large class of...analysis was conducted using a statistical package, Trajectories. Four different types of relationships were analyzed: linear, logarithmic, power , and
The Cryosphere Model Comparison Tool (CmCt): Ice Sheet Model Validation and Comparison Tool for Greenland and Antarctica

NASA Astrophysics Data System (ADS)

Simon, E.; Nowicki, S.; Neumann, T.; Tyahla, L.; Saba, J. L.; Guerber, J. R.; Bonin, J. A.; DiMarzio, J. P.

2017-12-01

The Cryosphere model Comparison tool (CmCt) is a web based ice sheet model validation tool that is being developed by NASA to facilitate direct comparison between observational data and various ice sheet models. The CmCt allows the user to take advantage of several decades worth of observations from Greenland and Antarctica. Currently, the CmCt can be used to compare ice sheet models provided by the user with remotely sensed satellite data from ICESat (Ice, Cloud, and land Elevation Satellite) laser altimetry, GRACE (Gravity Recovery and Climate Experiment) satellite, and radar altimetry (ERS-1, ERS-2, and Envisat). One or more models can be uploaded through the CmCt website and compared with observational data, or compared to each other or other models. The CmCt calculates statistics on the differences between the model and observations, and other quantitative and qualitative metrics, which can be used to evaluate the different model simulations against the observations. The qualitative metrics consist of a range of visual outputs and the quantitative metrics consist of several whole-ice-sheet scalar values that can be used to assign an overall score to a particular simulation. The comparison results from CmCt are useful in quantifying improvements within a specific model (or within a class of models) as a result of differences in model dynamics (e.g., shallow vs. higher-order dynamics approximations), model physics (e.g., representations of ice sheet rheological or basal processes), or model resolution (mesh resolution and/or changes in the spatial resolution of input datasets). The framework and metrics could also be used for use as a model-to-model intercomparison tool, simply by swapping outputs from another model as the observational datasets. Future versions of the tool will include comparisons with other datasets that are of interest to the modeling community, such as ice velocity, ice thickness, and surface mass balance.
Final Report: Quantification of Uncertainty in Extreme Scale Computations (QUEST)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marzouk, Youssef; Conrad, Patrick; Bigoni, Daniele

QUEST (\\url{www.quest-scidac.org}) is a SciDAC Institute that is focused on uncertainty quantification (UQ) in large-scale scientific computations. Our goals are to (1) advance the state of the art in UQ mathematics, algorithms, and software; and (2) provide modeling, algorithmic, and general UQ expertise, together with software tools, to other SciDAC projects, thereby enabling and guiding a broad range of UQ activities in their respective contexts. QUEST is a collaboration among six institutions (Sandia National Laboratories, Los Alamos National Laboratory, the University of Southern California, Massachusetts Institute of Technology, the University of Texas at Austin, and Duke University) with a historymore » of joint UQ research. Our vision encompasses all aspects of UQ in leadership-class computing. This includes the well-founded setup of UQ problems; characterization of the input space given available data/information; local and global sensitivity analysis; adaptive dimensionality and order reduction; forward and inverse propagation of uncertainty; handling of application code failures, missing data, and hardware/software fault tolerance; and model inadequacy, comparison, validation, selection, and averaging. The nature of the UQ problem requires the seamless combination of data, models, and information across this landscape in a manner that provides a self-consistent quantification of requisite uncertainties in predictions from computational models. Accordingly, our UQ methods and tools span an interdisciplinary space across applied math, information theory, and statistics. The MIT QUEST effort centers on statistical inference and methods for surrogate or reduced-order modeling. MIT personnel have been responsible for the development of adaptive sampling methods, methods for approximating computationally intensive models, and software for both forward uncertainty propagation and statistical inverse problems. A key software product of the MIT QUEST effort is the MIT Uncertainty Quantification library, called MUQ (\\url{muq.mit.edu}).« less
AA9int: SNP Interaction Pattern Search Using Non-Hierarchical Additive Model Set.

PubMed

Lin, Hui-Yi; Huang, Po-Yu; Chen, Dung-Tsa; Tung, Heng-Yuan; Sellers, Thomas A; Pow-Sang, Julio; Eeles, Rosalind; Easton, Doug; Kote-Jarai, Zsofia; Amin Al Olama, Ali; Benlloch, Sara; Muir, Kenneth; Giles, Graham G; Wiklund, Fredrik; Gronberg, Henrik; Haiman, Christopher A; Schleutker, Johanna; Nordestgaard, Børge G; Travis, Ruth C; Hamdy, Freddie; Neal, David E; Pashayan, Nora; Khaw, Kay-Tee; Stanford, Janet L; Blot, William J; Thibodeau, Stephen N; Maier, Christiane; Kibel, Adam S; Cybulski, Cezary; Cannon-Albright, Lisa; Brenner, Hermann; Kaneva, Radka; Batra, Jyotsna; Teixeira, Manuel R; Pandha, Hardev; Lu, Yong-Jie; Park, Jong Y

2018-06-07

The use of single nucleotide polymorphism (SNP) interactions to predict complex diseases is getting more attention during the past decade, but related statistical methods are still immature. We previously proposed the SNP Interaction Pattern Identifier (SIPI) approach to evaluate 45 SNP interaction patterns/patterns. SIPI is statistically powerful but suffers from a large computation burden. For large-scale studies, it is necessary to use a powerful and computation-efficient method. The objective of this study is to develop an evidence-based mini-version of SIPI as the screening tool or solitary use and to evaluate the impact of inheritance mode and model structure on detecting SNP-SNP interactions. We tested two candidate approaches: the 'Five-Full' and 'AA9int' method. The Five-Full approach is composed of the five full interaction models considering three inheritance modes (additive, dominant and recessive). The AA9int approach is composed of nine interaction models by considering non-hierarchical model structure and the additive mode. Our simulation results show that AA9int has similar statistical power compared to SIPI and is superior to the Five-Full approach, and the impact of the non-hierarchical model structure is greater than that of the inheritance mode in detecting SNP-SNP interactions. In summary, it is recommended that AA9int is a powerful tool to be used either alone or as the screening stage of a two-stage approach (AA9int+SIPI) for detecting SNP-SNP interactions in large-scale studies. The 'AA9int' and 'parAA9int' functions (standard and parallel computing version) are added in the SIPI R package, which is freely available at https://linhuiyi.github.io/LinHY_Software/. hlin1@lsuhsc.edu. Supplementary data are available at Bioinformatics online.
Application of the GEM Inventory Data Capture Tools for Dynamic Vulnerability Assessment and Recovery Modelling

NASA Astrophysics Data System (ADS)

Verrucci, Enrica; Bevington, John; Vicini, Alessandro

2014-05-01

A set of open-source tools to create building exposure datasets for seismic risk assessment was developed from 2010-13 by the Inventory Data Capture Tools (IDCT) Risk Global Component of the Global Earthquake Model (GEM). The tools were designed to integrate data derived from remotely-sensed imagery, statistically-sampled in-situ field data of buildings to generate per-building and regional exposure data. A number of software tools were created to aid the development of these data, including mobile data capture tools for in-field structural assessment, and the Spatial Inventory Data Developer (SIDD) for creating "mapping schemes" - statistically-inferred distributions of building stock applied to areas of homogeneous urban land use. These tools were made publically available in January 2014. Exemplar implementations in Europe and Central Asia during the IDCT project highlighted several potential application areas beyond the original scope of the project. These are investigated here. We describe and demonstrate how the GEM-IDCT suite can be used extensively within the framework proposed by the EC-FP7 project SENSUM (Framework to integrate Space-based and in-situ sENSing for dynamic vUlnerability and recovery Monitoring). Specifically, applications in the areas of 1) dynamic vulnerability assessment (pre-event), and 2) recovery monitoring and evaluation (post-event) are discussed. Strategies for using the IDC Tools for these purposes are discussed. The results demonstrate the benefits of using advanced technology tools for data capture, especially in a systematic fashion using the taxonomic standards set by GEM. Originally designed for seismic risk assessment, it is clear the IDCT tools have relevance for multi-hazard risk assessment. When combined with a suitable sampling framework and applied to multi-temporal recovery monitoring, data generated from the tools can reveal spatio-temporal patterns in the quality of recovery activities and resilience trends can be inferred. Lastly, this work draws attention to the use of the IDCT suite as an education resource for inspiring and training new students and engineers in the field of disaster risk reduction.
External Validation of a Tool Predicting 7-Year Risk of Developing Cardiovascular Disease, Type 2 Diabetes or Chronic Kidney Disease.

PubMed

Rauh, Simone P; Rutters, Femke; van der Heijden, Amber A W A; Luimes, Thomas; Alssema, Marjan; Heymans, Martijn W; Magliano, Dianna J; Shaw, Jonathan E; Beulens, Joline W; Dekker, Jacqueline M

2018-02-01

Chronic cardiometabolic diseases, including cardiovascular disease (CVD), type 2 diabetes (T2D) and chronic kidney disease (CKD), share many modifiable risk factors and can be prevented using combined prevention programs. Valid risk prediction tools are needed to accurately identify individuals at risk. We aimed to validate a previously developed non-invasive risk prediction tool for predicting the combined 7-year-risk for chronic cardiometabolic diseases. The previously developed tool is stratified for sex and contains the predictors age, BMI, waist circumference, use of antihypertensives, smoking, family history of myocardial infarction/stroke, and family history of diabetes. This tool was externally validated, evaluating model performance using area under the receiver operating characteristic curve (AUC)-assessing discrimination-and Hosmer-Lemeshow goodness-of-fit (HL) statistics-assessing calibration. The intercept was recalibrated to improve calibration performance. The risk prediction tool was validated in 3544 participants from the Australian Diabetes, Obesity and Lifestyle Study (AusDiab). Discrimination was acceptable, with an AUC of 0.78 (95% CI 0.75-0.81) in men and 0.78 (95% CI 0.74-0.81) in women. Calibration was poor (HL statistic: p < 0.001), but improved considerably after intercept recalibration. Examination of individual outcomes showed that in men, AUC was highest for CKD (0.85 [95% CI 0.78-0.91]) and lowest for T2D (0.69 [95% CI 0.65-0.74]). In women, AUC was highest for CVD (0.88 [95% CI 0.83-0.94)]) and lowest for T2D (0.71 [95% CI 0.66-0.75]). Validation of our previously developed tool showed robust discriminative performance across populations. Model recalibration is recommended to account for different disease rates. Our risk prediction tool can be useful in large-scale prevention programs for identifying those in need of further risk profiling because of their increased risk for chronic cardiometabolic diseases.
Finding the Root Causes of Statistical Inconsistency in Community Earth System Model Output

NASA Astrophysics Data System (ADS)

Milroy, D.; Hammerling, D.; Baker, A. H.

2017-12-01

Baker et al (2015) developed the Community Earth System Model Ensemble Consistency Test (CESM-ECT) to provide a metric for software quality assurance by determining statistical consistency between an ensemble of CESM outputs and new test runs. The test has proved useful for detecting statistical difference caused by compiler bugs and errors in physical modules. However, detection is only the necessary first step in finding the causes of statistical difference. The CESM is a vastly complex model comprised of millions of lines of code which is developed and maintained by a large community of software engineers and scientists. Any root cause analysis is correspondingly challenging. We propose a new capability for CESM-ECT: identifying the sections of code that cause statistical distinguishability. The first step is to discover CESM variables that cause CESM-ECT to classify new runs as statistically distinct, which we achieve via Randomized Logistic Regression. Next we use a tool developed to identify CESM components that define or compute the variables found in the first step. Finally, we employ the application Kernel GENerator (KGEN) created in Kim et al (2016) to detect fine-grained floating point differences. We demonstrate an example of the procedure and advance a plan to automate this process in our future work.
Statistical fluctuations in pedestrian evacuation times and the effect of social contagion

NASA Astrophysics Data System (ADS)

Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.

2016-08-01

Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.
A new statistical methodology predicting chip failure probability considering electromigration

NASA Astrophysics Data System (ADS)

Sun, Ted

In this research thesis, we present a new approach to analyze chip reliability subject to electromigration (EM) whose fundamental causes and EM phenomenon happened in different materials are presented in this thesis. This new approach utilizes the statistical nature of EM failure in order to assess overall EM risk. It includes within-die temperature variations from the chip's temperature map extracted by an Electronic Design Automation (EDA) tool to estimate the failure probability of a design. Both the power estimation and thermal analysis are performed in the EDA flow. We first used the traditional EM approach to analyze the design with a single temperature across the entire chip that involves 6 metal and 5 via layers. Next, we used the same traditional approach but with a realistic temperature map. The traditional EM analysis approach and that coupled with a temperature map and the comparison between the results of considering and not considering temperature map are presented in in this research. A comparison between these two results confirms that using a temperature map yields a less pessimistic estimation of the chip's EM risk. Finally, we employed the statistical methodology we developed considering a temperature map and different use-condition voltages and frequencies to estimate the overall failure probability of the chip. The statistical model established considers the scaling work with the usage of traditional Black equation and four major conditions. The statistical result comparisons are within our expectations. The results of this statistical analysis confirm that the chip level failure probability is higher i) at higher use-condition frequencies for all use-condition voltages, and ii) when a single temperature instead of a temperature map across the chip is considered. In this thesis, I start with an overall review on current design types, common flows, and necessary verifications and reliability checking steps used in this IC design industry. Furthermore, the important concepts about "Scripting Automation" which is used in all the integration of using diversified EDA tools in this research work are also described in detail with several examples and my completed coding works are also put in the appendix for your reference. Hopefully, this construction of my thesis will give readers a thorough understanding about my research work from the automation of EDA tools to the statistical data generation, from the nature of EM to the statistical model construction, and the comparisons among the traditional EM analysis and the statistical EM analysis approaches.
Application of a Lifestyle-Based Tool to Estimate Premature Cardiovascular Disease Events in Young Adults: The Coronary Artery Risk Development in Young Adults (CARDIA) Study.

PubMed

Gooding, Holly C; Ning, Hongyan; Gillman, Matthew W; Shay, Christina; Allen, Norrina; Goff, David C; Lloyd-Jones, Donald; Chiuve, Stephanie

2017-09-01

Few tools exist for assessing the risk for early atherosclerotic cardiovascular disease (ASCVD) events in young adults. To assess the performance of the Healthy Heart Score (HHS), a lifestyle-based tool that estimates ASCVD events in older adults, for ASCVD events occurring before 55 years of age. This prospective cohort study included 4893 US adults aged 18 to 30 years from the Coronary Artery Risk Development in Young Adults (CARDIA) study. Participants underwent measurement of lifestyle factors from March 25, 1985, through June 7, 1986, and were followed up for a median of 27.1 years (interquartile range, 26.9-27.2 years). Data for this study were analyzed from February 24 through December 12, 2016. The HHS includes age, smoking status, body mass index, alcohol intake, exercise, and a diet score composed of self-reported daily intake of cereal fiber, fruits and/or vegetables, nuts, sugar-sweetened beverages, and red and/or processed meats. The HHS in the CARDIA study was calculated using sex-specific equations produced by its derivation cohorts. The ability of the HHS to assess the 25-year risk for ASCVD (death from coronary heart disease, nonfatal myocardial infarction, and fatal or nonfatal ischemic stroke) in the total sample, in race- and sex-specific subgroups, and in those with and without clinical ASCVD risk factors at baseline. Model discrimination was assessed with the Harrell C statistic; model calibration, with Greenwood-Nam-D'Agostino statistics. The study population of 4893 participants included 2205 men (45.1%) and 2688 women (54.9%) with a mean (SD) age at baseline of 24.8 (3.6) years; 2483 (50.7%) were black; and 427 (8.7%) had at least 1 clinical ASCVD risk factor (hypertension, hyperlipidemia, or diabetes types 1 and 2). Among these participants, 64 premature ASCVD events occurred in women and 99 in men. The HHS showed moderate discrimination for ASCVD risk assessment in this diverse population of mostly healthy young adults (C statistic, 0.71; 95% CI, 0.66-0.76); it performed better in men (C statistic, 0.74; 95% CI, 0.68-0.79) than in women (C statistic, 0.69; 95% CI, 0.62-0.75); in white (C statistic, 0.77; 95% CI, 0.71-0.84) than in black (C statistic, 0.66; 95% CI, 0.60-0.72) participants; and in those without (C statistic, 0.71; 95% CI, 0.66-0.76) vs with (C statistic, 0.64; 95% CI, 0.55-0.73) clinical risk factors at baseline. The HHS was adequately calibrated overall and within each subgroup. The HHS, when measured in younger persons without ASCVD risk factors, performs moderately well in assessing risk for ASCVD events by early middle age. Its reliance on self-reported, modifiable lifestyle factors makes it an attractive tool for risk assessment and counseling for early ASCVD prevention.
Towards a General Turbulence Model for Planetary Boundary Layers Based on Direct Statistical Simulation

NASA Astrophysics Data System (ADS)

Skitka, J.; Marston, B.; Fox-Kemper, B.

2016-02-01

Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, like global circulation models, by their being optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) that is based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.
ACCELERATED FAILURE TIME MODELS PROVIDE A USEFUL STATISTICAL FRAMEWORK FOR AGING RESEARCH

PubMed Central

Swindell, William R.

2009-01-01

Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model “deceleration factor”. AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data. PMID:19007875
Accelerated failure time models provide a useful statistical framework for aging research.

PubMed

Swindell, William R

2009-03-01

Survivorship experiments play a central role in aging research and are performed to evaluate whether interventions alter the rate of aging and increase lifespan. The accelerated failure time (AFT) model is seldom used to analyze survivorship data, but offers a potentially useful statistical approach that is based upon the survival curve rather than the hazard function. In this study, AFT models were used to analyze data from 16 survivorship experiments that evaluated the effects of one or more genetic manipulations on mouse lifespan. Most genetic manipulations were found to have a multiplicative effect on survivorship that is independent of age and well-characterized by the AFT model "deceleration factor". AFT model deceleration factors also provided a more intuitive measure of treatment effect than the hazard ratio, and were robust to departures from modeling assumptions. Age-dependent treatment effects, when present, were investigated using quantile regression modeling. These results provide an informative and quantitative summary of survivorship data associated with currently known long-lived mouse models. In addition, from the standpoint of aging research, these statistical approaches have appealing properties and provide valuable tools for the analysis of survivorship data.
Uncertainty visualisation in the Model Web

NASA Astrophysics Data System (ADS)

Gerharz, L. E.; Autermann, C.; Hopmann, H.; Stasch, C.; Pebesma, E.

2012-04-01

Visualisation of geospatial data as maps is a common way to communicate spatially distributed information. If temporal and furthermore uncertainty information are included in the data, efficient visualisation methods are required. For uncertain spatial and spatio-temporal data, numerous visualisation methods have been developed and proposed, but only few tools for visualisation of data in a standardised way exist. Furthermore, usually they are realised as thick clients, and lack functionality of handling data coming from web services as it is envisaged in the Model Web. We present an interactive web tool for visualisation of uncertain spatio-temporal data developed in the UncertWeb project. The client is based on the OpenLayers JavaScript library. OpenLayers provides standard map windows and navigation tools, i.e. pan, zoom in/out, to allow interactive control for the user. Further interactive methods are implemented using jStat, a JavaScript library for statistics plots developed in UncertWeb, and flot. To integrate the uncertainty information into existing standards for geospatial data, the Uncertainty Markup Language (UncertML) was applied in combination with OGC Observations&Measurements 2.0 and JavaScript Object Notation (JSON) encodings for vector and NetCDF for raster data. The client offers methods to visualise uncertain vector and raster data with temporal information. Uncertainty information considered for the tool are probabilistic and quantified attribute uncertainties which can be provided as realisations or samples, full probability distributions functions and statistics. Visualisation is supported for uncertain continuous and categorical data. In the client, the visualisation is realised using a combination of different methods. Based on previously conducted usability studies, a differentiation between expert (in statistics or mapping) and non-expert users has been indicated as useful. Therefore, two different modes are realised together in the tool: (i) adjacent maps showing data and uncertainty separately, and (ii) multidimensional mapping providing different visualisation methods in combination to explore the spatial, temporal and uncertainty distribution of the data. Adjacent maps allow a simpler visualisation by separating value and uncertainty maps for non-experts and a first overview. The multidimensional approach allows a more complex exploration of the data for experts by browsing through the different dimensions. It offers the visualisation of maps, statistic plots and time series in different windows and sliders to interactively move through time, space and uncertainty (thresholds).
Statistical Analysis of CO 2 Exposed Wells to Predict Long Term Leakage through the Development of an Integrated Neural-Genetic Algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guo, Boyun; Duguid, Andrew; Nygaard, Ronar

The objective of this project is to develop a computerized statistical model with the Integrated Neural-Genetic Algorithm (INGA) for predicting the probability of long-term leak of wells in CO 2 sequestration operations. This object has been accomplished by conducting research in three phases: 1) data mining of CO 2-explosed wells, 2) INGA computer model development, and 3) evaluation of the predictive performance of the computer model with data from field tests. Data mining was conducted for 510 wells in two CO 2 sequestration projects in the Texas Gulf Coast region. They are the Hasting West field and Oyster Bayou fieldmore » in the Southern Texas. Missing wellbore integrity data were estimated using an analytical and Finite Element Method (FEM) model. The INGA was first tested for performances of convergence and computing efficiency with the obtained data set of high dimension. It was concluded that the INGA can handle the gathered data set with good accuracy and reasonable computing time after a reduction of dimension with a grouping mechanism. A computerized statistical model with the INGA was then developed based on data pre-processing and grouping. Comprehensive training and testing of the model were carried out to ensure that the model is accurate and efficient enough for predicting the probability of long-term leak of wells in CO 2 sequestration operations. The Cranfield in the southern Mississippi was select as the test site. Observation wells CFU31F2 and CFU31F3 were used for pressure-testing, formation-logging, and cement-sampling. Tools run in the wells include Isolation Scanner, Slim Cement Mapping Tool (SCMT), Cased Hole Formation Dynamics Tester (CHDT), and Mechanical Sidewall Coring Tool (MSCT). Analyses of the obtained data indicate no leak of CO 2 cross the cap zone while it is evident that the well cement sheath was invaded by the CO 2 from the storage zone. This observation is consistent with the result predicted by the INGA model which indicates the well has a CO 2 leak-safe probability of 72%. This comparison implies that the developed INGA model is valid for future use in predicting well leak probability.« less
Deformed Materials: Towards a Theory of Materials Morphology Dynamics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sethna, James P

This grant supported work on the response of crystals to external stress. Our primary work described how disordered structural materials break in two (statistical models of fracture in disordered materials), studied models of deformation bursts (avalanches) that mediate deformation on the microscale, and developed continuum dislocation dynamics models for plastic deformation (as when scooping ice cream bends a spoon, Fig. 9). Glass is brittle -- it breaks with almost atomically smooth fracture surfaces. Many metals are ductile -- when they break, the fracture surface is locally sheared and stretched, and it is this damage that makes them hard to break.more » Bone and seashells are made of brittle material, but they are strong because they are disordered -- lots of little cracks form as they are sheared and near the fracture surface, diluting the external force. We have studied materials like bone and seashells using simulations, mathematical tools, and statistical mechanics models from physics. In particular, we studied the extreme values of fracture strengths (how likely will a beam in a bridge break far below its design strength), and found that the traditional engineering tools could be improved greatly. We also studied fascinating crackling-noise precursors -- systems which formed microcracks of a broad range of sizes before they broke. Ductile metals under stress undergo irreversible plastic deformation -- the planes of atoms must slide across one another (through the motion of dislocations) to change the overall shape in response to the external force. Microscopically, the dislocations in crystals move in bursts of a broad range of sizes (termed 'avalanches' in the statistical mechanics community, whose motion is deemed 'crackling noise'). In this grant period, we resolved a longstanding mystery about the average shape of avalanches of fixed duration (using tools related to an emergent scale invariance), we developed the fundamental theory describing the shapes of avalanches and how they are affected by the edges of the microscope viewing window, we found that slow creep of dislocations can trigger an oscillating response explaining recent experiments, we explained avalanches under external voltage, and we have studied how avalanches in experiments on the microscale relate to deformation of large samples. Inside the crystals forming the metal, the dislocations arrange into mysterious cellular structures, usually ignored in theories of plasticity. Writing a natural continuum theory for dislocation dynamics, we found that it spontaneously formed walls -- much like models of traffic jams and sonic booms. These walls formed rather realistic cellular structures, which we examined in great detail -- our walls formed fractal structures with fascinating scaling properties, related to those found in turbulent fluids. We found, however, that the numerical and mathematical tools available to solve our equations were not flexible enough to incorporate materials-specific information, and our models did not show the dislocation avalanches seen experimentally. In the last year of this grant, we wrote an invited review article, explaining how plastic flow in metals shares features with other stressed materials, and how tools of statistical physics used in these other systems might be crucial for understanding plasticity.« less
Syndromic surveillance models using Web data: the case of scarlet fever in the UK.

PubMed

Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel

2012-03-01

Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimation and prediction of the development of a syndromic disease, such as influenza, using log linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of using statistical methods to relate search engine queries with scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that using logit models, the Pearson correlation factor between Web queries and the data obtained from the official agencies must be over 0.90, otherwise the prediction of the peak and the spread of the distributions gives significant deviations. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, and especially in those with a smaller correlation factor.

QSAR models for anti-malarial activity of 4-aminoquinolines.

PubMed

Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T

2014-03-01

In the present study, predictive quantitative structure - activity relationship (QSAR) models for anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on internet (http://www.insilico.eu/coral), has been used as a tool of QSAR analysis to establish statistically robust QSAR model of anti-malarial activity of 4-aminoquinolines. Six random splits into the visible sub-system of the training and invisible subsystem of validation were examined. Statistical qualities for these splits vary, but in all these cases, statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single descriptor based QSAR model for a data set of 112 aminoquinolones. All the splits had r(2)> 0.85 and r(2)> 0.78 for subtraining and validation sets, respectively. The three parametric multilinear regression (MLR) QSAR model has Q(2) = 0.83, R(2) = 0.84 and F = 190.39. The anti-malarial activity has strong correlation with presence/absence of nitrogen and oxygen at a topological distance of six.
Automatic identification of bacterial types using statistical imaging methods

NASA Astrophysics Data System (ADS)

Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

2003-05-01

The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as the Staphylococcus Aureus. Current systems rely on the subjective reading of plaque profiles by human expert. This process is time-consuming and prone to errors, especially as technology is enabling the increase in the number of phages used for typing. The statistical methodology presented in this work, provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.
Assessment of the long-lead probabilistic prediction for the Asian summer monsoon precipitation (1983-2011) based on the APCC multimodel system and a statistical model

NASA Astrophysics Data System (ADS)

Sohn, Soo-Jin; Min, Young-Mi; Lee, June-Yi; Tam, Chi-Yung; Kang, In-Sik; Wang, Bin; Ahn, Joong-Bae; Yamagata, Toshio

2012-02-01

The performance of the probabilistic multimodel prediction (PMMP) system of the APEC Climate Center (APCC) in predicting the Asian summer monsoon (ASM) precipitation at a four-month lead (with February initial condition) was compared with that of a statistical model using hindcast data for 1983-2005 and real-time forecasts for 2006-2011. Particular attention was paid to probabilistic precipitation forecasts for the boreal summer after the mature phase of El Niño and Southern Oscillation (ENSO). Taking into account the fact that coupled models' skill for boreal spring and summer precipitation mainly comes from their ability to capture ENSO teleconnection, we developed the statistical model using linear regression with the preceding winter ENSO condition as the predictor. Our results reveal several advantages and disadvantages in both forecast systems. First, the PMMP appears to have higher skills for both above- and below-normal categories in the six-year real-time forecast period, whereas the cross-validated statistical model has higher skills during the 23-year hindcast period. This implies that the cross-validated statistical skill may be overestimated. Second, the PMMP is the better tool for capturing atypical ENSO (or non-canonical ENSO related) teleconnection, which has affected the ASM precipitation during the early 1990s and in the recent decade. Third, the statistical model is more sensitive to the ENSO phase and has an advantage in predicting the ASM precipitation after the mature phase of La Niña.
Estimating time since infection in early homogeneous HIV-1 samples using a poisson model

PubMed Central

2010-01-01

Background The occurrence of a genetic bottleneck in HIV sexual or mother-to-infant transmission has been well documented. This results in a majority of new infections being homogeneous, i.e., initiated by a single genetic strain. Early after infection, prior to the onset of the host immune response, the viral population grows exponentially. In this simple setting, an approach for estimating evolutionary and demographic parameters based on comparison of diversity measures is a feasible alternative to the existing Bayesian methods (e.g., BEAST), which are instead based on the simulation of genealogies. Results We have devised a web tool that analyzes genetic diversity in acutely infected HIV-1 patients by comparing it to a model of neutral growth. More specifically, we consider a homogeneous infection (i.e., initiated by a unique genetic strain) prior to the onset of host-induced selection, where we can assume a random accumulation of mutations. Previously, we have shown that such a model successfully describes about 80% of sexual HIV-1 transmissions provided the samples are drawn early enough in the infection. Violation of the model is an indicator of either heterogeneous infections or the initiation of selection. Conclusions When the underlying assumptions of our model (homogeneous infection prior to selection and fast exponential growth) are met, we are under a very particular scenario for which we can use a forward approach (instead of backwards in time as provided by coalescent methods). This allows for more computationally efficient methods to derive the time since the most recent common ancestor. Furthermore, the tool performs statistical tests on the Hamming distance frequency distribution, and outputs summary statistics (mean of the best fitting Poisson distribution, goodness of fit p-value, etc). The tool runs within minutes and can readily accommodate the tens of thousands of sequences generated through new ultradeep pyrosequencing technologies. The tool is available on the LANL website. PMID:20973976
DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, themore » size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.« less
Studying Individual Differences in Predictability with Gamma Regression and Nonlinear Multilevel Models

ERIC Educational Resources Information Center

Culpepper, Steven Andrew

2010-01-01

Statistical prediction remains an important tool for decisions in a variety of disciplines. An equally important issue is identifying factors that contribute to more or less accurate predictions. The time series literature includes well developed methods for studying predictability and volatility over time. This article develops…
Predicting thermal regimes of stream networks across the northeast United States: Natural and anthropogenic influences

EPA Science Inventory

We used STARS (Spatial Tools for the Analysis of River Systems), an ArcGIS geoprocessing toolbox, to create spatial stream networks. We then developed and assessed spatial statistical models for each of these metrics, incorporating spatial autocorrelation based on both distance...
Predicting E. Coli and Enterococci Concentrations in the South Fork Broad River Watershed Using Virtual Beach

EPA Science Inventory

Virtual Beach (VB) is a decision support tool that constructs site-specific statistical models to predict fecal indicator bacteria (FIB) at locations of exposure. Although primarily designed for making decisions regarding beach closures or issuance of swimming advisories based on...
Structural Equation Modeling: Possibilities for Language Learning Researchers

ERIC Educational Resources Information Center

Hancock, Gregory R.; Schoonen, Rob

2015-01-01

Although classical statistical techniques have been a valuable tool in second language (L2) research, L2 research questions have started to grow beyond those techniques' capabilities, and indeed are often limited by them. Questions about how complex constructs relate to each other or to constituent subskills, about longitudinal development in…
A simple rapid process for semi-automated brain extraction from magnetic resonance images of the whole mouse head.

PubMed

Delora, Adam; Gonzales, Aaron; Medina, Christopher S; Mitchell, Adam; Mohed, Abdul Faheem; Jacobs, Russell E; Bearer, Elaine L

2016-01-15

Magnetic resonance imaging (MRI) is a well-developed technique in neuroscience. Limitations in applying MRI to rodent models of neuropsychiatric disorders include the large number of animals required to achieve statistical significance, and the paucity of automation tools for the critical early step in processing, brain extraction, which prepares brain images for alignment and voxel-wise statistics. This novel timesaving automation of template-based brain extraction ("skull-stripping") is capable of quickly and reliably extracting the brain from large numbers of whole head images in a single step. The method is simple to install and requires minimal user interaction. This method is equally applicable to different types of MR images. Results were evaluated with Dice and Jacquard similarity indices and compared in 3D surface projections with other stripping approaches. Statistical comparisons demonstrate that individual variation of brain volumes are preserved. A downloadable software package not otherwise available for extraction of brains from whole head images is included here. This software tool increases speed, can be used with an atlas or a template from within the dataset, and produces masks that need little further refinement. Our new automation can be applied to any MR dataset, since the starting point is a template mask generated specifically for that dataset. The method reliably and rapidly extracts brain images from whole head images, rendering them useable for subsequent analytical processing. This software tool will accelerate the exploitation of mouse models for the investigation of human brain disorders by MRI. Copyright © 2015 Elsevier B.V. All rights reserved.
S2O - A software tool for integrating research data from general purpose statistic software into electronic data capture systems.

PubMed

Bruland, Philipp; Dugas, Martin

2017-01-07

Data capture for clinical registries or pilot studies is often performed in spreadsheet-based applications like Microsoft Excel or IBM SPSS. Usually, data is transferred into statistic software, such as SAS, R or IBM SPSS Statistics, for analyses afterwards. Spreadsheet-based solutions suffer from several drawbacks: It is generally not possible to ensure a sufficient right and role management; it is not traced who has changed data when and why. Therefore, such systems are not able to comply with regulatory requirements for electronic data capture in clinical trials. In contrast, Electronic Data Capture (EDC) software enables a reliable, secure and auditable collection of data. In this regard, most EDC vendors support the CDISC ODM standard to define, communicate and archive clinical trial meta- and patient data. Advantages of EDC systems are support for multi-user and multicenter clinical trials as well as auditable data. Migration from spreadsheet based data collection to EDC systems is labor-intensive and time-consuming at present. Hence, the objectives of this research work are to develop a mapping model and implement a converter between the IBM SPSS and CDISC ODM standard and to evaluate this approach regarding syntactic and semantic correctness. A mapping model between IBM SPSS and CDISC ODM data structures was developed. SPSS variables and patient values can be mapped and converted into ODM. Statistical and display attributes from SPSS are not corresponding to any ODM elements; study related ODM elements are not available in SPSS. The S2O converting tool was implemented as command-line-tool using the SPSS internal Java plugin. Syntactic and semantic correctness was validated with different ODM tools and reverse transformation from ODM into SPSS format. Clinical data values were also successfully transformed into the ODM structure. Transformation between the spreadsheet format IBM SPSS and the ODM standard for definition and exchange of trial data is feasible. S2O facilitates migration from Excel- or SPSS-based data collections towards reliable EDC systems. Thereby, advantages of EDC systems like reliable software architecture for secure and traceable data collection and particularly compliance with regulatory requirements are achievable.
Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods.

PubMed

Martínez, María Jimena; Ponzoni, Ignacio; Díaz, Mónica F; Vazquez, Gustavo E; Soto, Axel J

2015-01-01

The design of QSAR/QSPR models is a challenging problem, where the selection of the most relevant descriptors constitutes a key step of the process. Several feature selection methods that address this step are concentrated on statistical associations among descriptors and target properties, whereas the chemical knowledge is left out of the analysis. For this reason, the interpretability and generality of the QSAR/QSPR models obtained by these feature selection methods are drastically affected. Therefore, an approach for integrating domain expert's knowledge in the selection process is needed for increase the confidence in the final set of descriptors. In this paper a software tool, which we named Visual and Interactive DEscriptor ANalysis (VIDEAN), that combines statistical methods with interactive visualizations for choosing a set of descriptors for predicting a target property is proposed. Domain expertise can be added to the feature selection process by means of an interactive visual exploration of data, and aided by statistical tools and metrics based on information theory. Coordinated visual representations are presented for capturing different relationships and interactions among descriptors, target properties and candidate subsets of descriptors. The competencies of the proposed software were assessed through different scenarios. These scenarios reveal how an expert can use this tool to choose one subset of descriptors from a group of candidate subsets or how to modify existing descriptor subsets and even incorporate new descriptors according to his or her own knowledge of the target property. The reported experiences showed the suitability of our software for selecting sets of descriptors with low cardinality, high interpretability, low redundancy and high statistical performance in a visual exploratory way. Therefore, it is possible to conclude that the resulting tool allows the integration of a chemist's expertise in the descriptor selection process with a low cognitive effort in contrast with the alternative of using an ad-hoc manual analysis of the selected descriptors. Graphical abstractVIDEAN allows the visual analysis of candidate subsets of descriptors for QSAR/QSPR. In the two panels on the top, users can interactively explore numerical correlations as well as co-occurrences in the candidate subsets through two interactive graphs.
Large-scale gene function analysis with the PANTHER classification system.

PubMed

Mi, Huaiyu; Muruganujan, Anushya; Casagrande, John T; Thomas, Paul D

2013-08-01

The PANTHER (protein annotation through evolutionary relationship) classification system (http://www.pantherdb.org/) is a comprehensive system that combines gene function, ontology, pathways and statistical analysis tools that enable biologists to analyze large-scale, genome-wide data from sequencing, proteomics or gene expression experiments. The system is built with 82 complete genomes organized into gene families and subfamilies, and their evolutionary relationships are captured in phylogenetic trees, multiple sequence alignments and statistical models (hidden Markov models or HMMs). Genes are classified according to their function in several different ways: families and subfamilies are annotated with ontology terms (Gene Ontology (GO) and PANTHER protein class), and sequences are assigned to PANTHER pathways. The PANTHER website includes a suite of tools that enable users to browse and query gene functions, and to analyze large-scale experimental data with a number of statistical tests. It is widely used by bench scientists, bioinformaticians, computer scientists and systems biologists. In the 2013 release of PANTHER (v.8.0), in addition to an update of the data content, we redesigned the website interface to improve both user experience and the system's analytical capability. This protocol provides a detailed description of how to analyze genome-wide experimental data with the PANTHER classification system.
Statistical comparison of a hybrid approach with approximate and exact inference models for Fusion 2+

NASA Astrophysics Data System (ADS)

Lee, K. David; Wiesenfeld, Eric; Gelfand, Andrew

2007-04-01

One of the greatest challenges in modern combat is maintaining a high level of timely Situational Awareness (SA). In many situations, computational complexity and accuracy considerations make the development and deployment of real-time, high-level inference tools very difficult. An innovative hybrid framework that combines Bayesian inference, in the form of Bayesian Networks, and Possibility Theory, in the form of Fuzzy Logic systems, has recently been introduced to provide a rigorous framework for high-level inference. In previous research, the theoretical basis and benefits of the hybrid approach have been developed. However, lacking is a concrete experimental comparison of the hybrid framework with traditional fusion methods, to demonstrate and quantify this benefit. The goal of this research, therefore, is to provide a statistical analysis on the comparison of the accuracy and performance of hybrid network theory, with pure Bayesian and Fuzzy systems and an inexact Bayesian system approximated using Particle Filtering. To accomplish this task, domain specific models will be developed under these different theoretical approaches and then evaluated, via Monte Carlo Simulation, in comparison to situational ground truth to measure accuracy and fidelity. Following this, a rigorous statistical analysis of the performance results will be performed, to quantify the benefit of hybrid inference to other fusion tools.
Online and offline tools for head movement compensation in MEG.

PubMed

Stolk, Arjen; Todorovic, Ana; Schoffelen, Jan-Mathijs; Oostenveld, Robert

2013-03-01

Magnetoencephalography (MEG) is measured above the head, which makes it sensitive to variations of the head position with respect to the sensors. Head movements blur the topography of the neuronal sources of the MEG signal, increase localization errors, and reduce statistical sensitivity. Here we describe two novel and readily applicable methods that compensate for the detrimental effects of head motion on the statistical sensitivity of MEG experiments. First, we introduce an online procedure that continuously monitors head position. Second, we describe an offline analysis method that takes into account the head position time-series. We quantify the performance of these methods in the context of three different experimental settings, involving somatosensory, visual and auditory stimuli, assessing both individual and group-level statistics. The online head localization procedure allowed for optimal repositioning of the subjects over multiple sessions, resulting in a 28% reduction of the variance in dipole position and an improvement of up to 15% in statistical sensitivity. Offline incorporation of the head position time-series into the general linear model resulted in improvements of group-level statistical sensitivity between 15% and 29%. These tools can substantially reduce the influence of head movement within and between sessions, increasing the sensitivity of many cognitive neuroscience experiments. Copyright © 2012 Elsevier Inc. All rights reserved.
Open-source Software for Demand Forecasting of Clinical Laboratory Test Volumes Using Time-series Analysis.

PubMed

Mohammed, Emad A; Naugler, Christopher

2017-01-01

Demand forecasting is the area of predictive analytics devoted to predicting future volumes of services or consumables. Fair understanding and estimation of how demand will vary facilitates the optimal utilization of resources. In a medical laboratory, accurate forecasting of future demand, that is, test volumes, can increase efficiency and facilitate long-term laboratory planning. Importantly, in an era of utilization management initiatives, accurately predicted volumes compared to the realized test volumes can form a precise way to evaluate utilization management initiatives. Laboratory test volumes are often highly amenable to forecasting by time-series models; however, the statistical software needed to do this is generally either expensive or highly technical. In this paper, we describe an open-source web-based software tool for time-series forecasting and explain how to use it as a demand forecasting tool in clinical laboratories to estimate test volumes. This tool has three different models, that is, Holt-Winters multiplicative, Holt-Winters additive, and simple linear regression. Moreover, these models are ranked and the best one is highlighted. This tool will allow anyone with historic test volume data to model future demand.
Open-source Software for Demand Forecasting of Clinical Laboratory Test Volumes Using Time-series Analysis

PubMed Central

Mohammed, Emad A.; Naugler, Christopher

2017-01-01

Background: Demand forecasting is the area of predictive analytics devoted to predicting future volumes of services or consumables. Fair understanding and estimation of how demand will vary facilitates the optimal utilization of resources. In a medical laboratory, accurate forecasting of future demand, that is, test volumes, can increase efficiency and facilitate long-term laboratory planning. Importantly, in an era of utilization management initiatives, accurately predicted volumes compared to the realized test volumes can form a precise way to evaluate utilization management initiatives. Laboratory test volumes are often highly amenable to forecasting by time-series models; however, the statistical software needed to do this is generally either expensive or highly technical. Method: In this paper, we describe an open-source web-based software tool for time-series forecasting and explain how to use it as a demand forecasting tool in clinical laboratories to estimate test volumes. Results: This tool has three different models, that is, Holt-Winters multiplicative, Holt-Winters additive, and simple linear regression. Moreover, these models are ranked and the best one is highlighted. Conclusion: This tool will allow anyone with historic test volume data to model future demand. PMID:28400996
SiGe BiCMOS manufacturing platform for mmWave applications

NASA Astrophysics Data System (ADS)

Kar-Roy, Arjun; Howard, David; Preisler, Edward; Racanelli, Marco; Chaudhry, Samir; Blaschke, Volker

2010-10-01

TowerJazz offers high volume manufacturable commercial SiGe BiCMOS technology platforms to address the mmWave market. In this paper, first, the SiGe BiCMOS process technology platforms such as SBC18 and SBC13 are described. These manufacturing platforms integrate 200 GHz fT/fMAX SiGe NPN with deep trench isolation into 0.18μm and 0.13μm node CMOS processes along with high density 5.6fF/μm2 stacked MIM capacitors, high value polysilicon resistors, high-Q metal resistors, lateral PNP transistors, and triple well isolation using deep n-well for mixed-signal integration, and, multiple varactors and compact high-Q inductors for RF needs. Second, design enablement tools that maximize performance and lowers costs and time to market such as scalable PSP and HICUM models, statistical and Xsigma models, reliability modeling tools, process control model tools, inductor toolbox and transmission line models are described. Finally, demonstrations in silicon for mmWave applications in the areas of optical networking, mobile broadband, phased array radar, collision avoidance radar and W-band imaging are listed.
Analyzing Human-Landscape Interactions: Tools That Integrate

NASA Astrophysics Data System (ADS)

Zvoleff, Alex; An, Li

2014-01-01

Humans have transformed much of Earth's land surface, giving rise to loss of biodiversity, climate change, and a host of other environmental issues that are affecting human and biophysical systems in unexpected ways. To confront these problems, environmental managers must consider human and landscape systems in integrated ways. This means making use of data obtained from a broad range of methods (e.g., sensors, surveys), while taking into account new findings from the social and biophysical science literatures. New integrative methods (including data fusion, simulation modeling, and participatory approaches) have emerged in recent years to address these challenges, and to allow analysts to provide information that links qualitative and quantitative elements for policymakers. This paper brings attention to these emergent tools while providing an overview of the tools currently in use for analysis of human-landscape interactions. Analysts are now faced with a staggering array of approaches in the human-landscape literature—in an attempt to bring increased clarity to the field, we identify the relative strengths of each tool, and provide guidance to analysts on the areas to which each tool is best applied. We discuss four broad categories of tools: statistical methods (including survival analysis, multi-level modeling, and Bayesian approaches), GIS and spatial analysis methods, simulation approaches (including cellular automata, agent-based modeling, and participatory modeling), and mixed-method techniques (such as alternative futures modeling and integrated assessment). For each tool, we offer an example from the literature of its application in human-landscape research. Among these tools, participatory approaches are gaining prominence for analysts to make the broadest possible array of information available to researchers, environmental managers, and policymakers. Further development of new approaches of data fusion and integration across sites or disciplines pose an important challenge for future work in integrating human and landscape components.
Diagnostic index: an open-source tool to classify TMJ OA condyles

NASA Astrophysics Data System (ADS)

Paniagua, Beatriz; Pascal, Laura; Prieto, Juan; Vimort, Jean Baptiste; Gomes, Liliane; Yatabe, Marilia; Ruellas, Antonio Carlos; Budin, Francois; Pieper, Steve; Styner, Martin; Benavides, Erika; Cevidanes, Lucia

2017-03-01

Osteoarthritis (OA) of temporomandibular joints (TMJ) occurs in about 40% of the patients who present TMJ disorders. Despite its prevalence, OA diagnosis and treatment remain controversial since there are no clear symptoms of the disease, especially in early stages. Quantitative tools based on 3D imaging of the TMJ condyle have the potential to help characterize TMJ OA changes. The goals of the tools proposed in this study are to ultimately develop robust imaging markers for diagnosis and assessment of treatment efficacy. This work proposes to identify differences among asymptomatic controls and different clinical phenotypes of TMJ OA by means of Statistical Shape Modeling (SSM), obtained via clinical expert consensus. From three different grouping schemes (with 3, 5 and 7 groups), our best results reveal that that the majority (74.5%) of the classifications occur in agreement with the groups assigned by consensus between our clinical experts. Our findings suggest the existence of different disease-based phenotypic morphologies in TMJ OA. Our preliminary findings with statistical shape modeling based biomarkers may provide a quantitative staging of the disease. The methodology used in this study is included in an open source image analysis toolbox, to ensure reproducibility and appropriate distribution and dissemination of the solution proposed.

Uncertainty in projected point precipitation extremes for hydrological impact analysis of climate change

NASA Astrophysics Data System (ADS)

Van Uytven, Els; Willems, Patrick

2017-04-01

Current trends in the hydro-meteorological variables indicate the potential impact of climate change on hydrological extremes. Therefore, they trigger an increased importance climate adaptation strategies in water management. The impact of climate change on hydro-meteorological and hydrological extremes is, however, highly uncertain. This is due to uncertainties introduced by the climate models, the internal variability inherent to the climate system, the greenhouse gas scenarios and the statistical downscaling methods. In view of the need to define sustainable climate adaptation strategies, there is a need to assess these uncertainties. This is commonly done by means of ensemble approaches. Because more and more climate models and statistical downscaling methods become available, there is a need to facilitate the climate impact and uncertainty analysis. A Climate Perturbation Tool has been developed for that purpose, which combines a set of statistical downscaling methods including weather typing, weather generator, transfer function and advanced perturbation based approaches. By use of an interactive interface, climate impact modelers can apply these statistical downscaling methods in a semi-automatic way to an ensemble of climate model runs. The tool is applicable to any region, but has been demonstrated so far to cases in Belgium, Suriname, Vietnam and Bangladesh. Time series representing future local-scale precipitation, temperature and potential evapotranspiration (PET) conditions were obtained, starting from time series of historical observations. Uncertainties on the future meteorological conditions are represented in two different ways: through an ensemble of time series, and a reduced set of synthetic scenarios. The both aim to span the full uncertainty range as assessed from the ensemble of climate model runs and downscaling methods. For Belgium, for instance, use was made of 100-year time series of 10-minutes precipitation observations and daily temperature and PET observations at Uccle and a large ensemble of 160 global climate model runs (CMIP5). They cover all four representative concentration pathway based greenhouse gas scenarios. While evaluating the downscaled meteorological series, particular attention was given to the performance of extreme value metrics (e.g. for precipitation, by means of intensity-duration-frequency statistics). Moreover, the total uncertainty was decomposed in the fractional uncertainties for each of the uncertainty sources considered. Research assessing the additional uncertainty due to parameter and structural uncertainties of the hydrological impact model is ongoing.
Paleomagnetism.org: An online multi-platform open source environment for paleomagnetic data analysis

NASA Astrophysics Data System (ADS)

Koymans, Mathijs R.; Langereis, Cor G.; Pastor-Galán, Daniel; van Hinsbergen, Douwe J. J.

2016-08-01

This contribution provides an overview of Paleomagnetism.org, an open-source, multi-platform online environment for paleomagnetic data analysis. Paleomagnetism.org provides an interactive environment where paleomagnetic data can be interpreted, evaluated, visualized, and exported. The Paleomagnetism.org application is split in to an interpretation portal, a statistics portal, and a portal for miscellaneous paleomagnetic tools. In the interpretation portal, principle component analysis can be performed on visualized demagnetization diagrams. Interpreted directions and great circles can be combined to find great circle solutions. These directions can be used in the statistics portal, or exported as data and figures. The tools in the statistics portal cover standard Fisher statistics for directions and VGPs, including other statistical parameters used as reliability criteria. Other available tools include an eigenvector approach foldtest, two reversal test including a Monte Carlo simulation on mean directions, and a coordinate bootstrap on the original data. An implementation is included for the detection and correction of inclination shallowing in sediments following TK03.GAD. Finally we provide a module to visualize VGPs and expected paleolatitudes, declinations, and inclinations relative to widely used global apparent polar wander path models in coordinates of major continent-bearing plates. The tools in the miscellaneous portal include a net tectonic rotation (NTR) analysis to restore a body to its paleo-vertical and a bootstrapped oroclinal test using linear regressive techniques, including a modified foldtest around a vertical axis. Paleomagnetism.org provides an integrated approach for researchers to work with visualized (e.g. hemisphere projections, Zijderveld diagrams) paleomagnetic data. The application constructs a custom exportable file that can be shared freely and included in public databases. This exported file contains all data and can later be imported to the application by other researchers. The accessibility and simplicity through which paleomagnetic data can be interpreted, analyzed, visualized, and shared makes Paleomagnetism.org of interest to the community.
A new in silico classification model for ready biodegradability, based on molecular fragments.

PubMed

Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina

2014-08-01

Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI-test Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with other in silico models like BIOWIN developed by the United States Environmental Protection Agency (EPA); moreover this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Tooth-size discrepancy: A comparison between manual and digital methods

PubMed Central

Correia, Gabriele Dória Cabral; Habib, Fernando Antonio Lima; Vogel, Carlos Jorge

2014-01-01

Introduction Technological advances in Dentistry have emerged primarily in the area of diagnostic tools. One example is the 3D scanner, which can transform plaster models into three-dimensional digital models. Objective This study aimed to assess the reliability of tooth size-arch length discrepancy analysis measurements performed on three-dimensional digital models, and compare these measurements with those obtained from plaster models. Material and Methods To this end, plaster models of lower dental arches and their corresponding three-dimensional digital models acquired with a 3Shape R700T scanner were used. All of them had lower permanent dentition. Four different tooth size-arch length discrepancy calculations were performed on each model, two of which by manual methods using calipers and brass wire, and two by digital methods using linear measurements and parabolas. Results Data were statistically assessed using Friedman test and no statistically significant differences were found between the two methods (P > 0.05), except for values found by the linear digital method which revealed a slight, non-significant statistical difference. Conclusions Based on the results, it is reasonable to assert that any of these resources used by orthodontists to clinically assess tooth size-arch length discrepancy can be considered reliable. PMID:25279529
Statistical Learning Theory for High Dimensional Prediction: Application to Criterion-Keyed Scale Development

PubMed Central

Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul

2016-01-01

Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
Modeling synthetic lethality

PubMed Central

Le Meur, Nolwenn; Gentleman, Robert

2008-01-01

Background Synthetic lethality defines a genetic interaction where the combination of mutations in two or more genes leads to cell death. The implications of synthetic lethal screens have been discussed in the context of drug development as synthetic lethal pairs could be used to selectively kill cancer cells, but leave normal cells relatively unharmed. A challenge is to assess genome-wide experimental data and integrate the results to better understand the underlying biological processes. We propose statistical and computational tools that can be used to find relationships between synthetic lethality and cellular organizational units. Results In Saccharomyces cerevisiae, we identified multi-protein complexes and pairs of multi-protein complexes that share an unusually high number of synthetic genetic interactions. As previously predicted, we found that synthetic lethality can arise from subunits of an essential multi-protein complex or between pairs of multi-protein complexes. Finally, using multi-protein complexes allowed us to take into account the pleiotropic nature of the gene products. Conclusions Modeling synthetic lethality using current estimates of the yeast interactome is an efficient approach to disentangle some of the complex molecular interactions that drive a cell. Our model in conjunction with applied statistical methods and computational methods provides new tools to better characterize synthetic genetic interactions. PMID:18789146
Kernel methods and flexible inference for complex stochastic dynamics

NASA Astrophysics Data System (ADS)

Capobianco, Enrico

2008-07-01

Approximation theory suggests that series expansions and projections represent standard tools for random process applications from both numerical and statistical standpoints. Such instruments emphasize the role of both sparsity and smoothness for compression purposes, the decorrelation power achieved in the expansion coefficients space compared to the signal space, and the reproducing kernel property when some special conditions are met. We consider these three aspects central to the discussion in this paper, and attempt to analyze the characteristics of some known approximation instruments employed in a complex application domain such as financial market time series. Volatility models are often built ad hoc, parametrically and through very sophisticated methodologies. But they can hardly deal with stochastic processes with regard to non-Gaussianity, covariance non-stationarity or complex dependence without paying a big price in terms of either model mis-specification or computational efficiency. It is thus a good idea to look at other more flexible inference tools; hence the strategy of combining greedy approximation and space dimensionality reduction techniques, which are less dependent on distributional assumptions and more targeted to achieve computationally efficient performances. Advantages and limitations of their use will be evaluated by looking at algorithmic and model building strategies, and by reporting statistical diagnostics.
Artificial neural networks in gynaecological diseases: current and potential future applications.

PubMed

Siristatidis, Charalampos S; Chrelias, Charalampos; Pouliakis, Abraham; Katsimanis, Evangelos; Kassanos, Dimitrios

2010-10-01

Current (and probably future) practice of medicine is mostly associated with prediction and accurate diagnosis. Especially in clinical practice, there is an increasing interest in constructing and using valid models of diagnosis and prediction. Artificial neural networks (ANNs) are mathematical systems being used as a prospective tool for reliable, flexible and quick assessment. They demonstrate high power in evaluating multifactorial data, assimilating information from multiple sources and detecting subtle and complex patterns. Their capability and difference from other statistical techniques lies in performing nonlinear statistical modelling. They represent a new alternative to logistic regression, which is the most commonly used method for developing predictive models for outcomes resulting from partitioning in medicine. In combination with the other non-algorithmic artificial intelligence techniques, they provide useful software engineering tools for the development of systems in quantitative medicine. Our paper first presents a brief introduction to ANNs, then, using what we consider the best available evidence through paradigms, we evaluate the ability of these networks to serve as first-line detection and prediction techniques in some of the most crucial fields in gynaecology. Finally, through the analysis of their current application, we explore their dynamics for future use.
Identifying currents in the gene pool for bacterial populations using an integrative approach.

PubMed

Tang, Jing; Hanage, William P; Fraser, Christophe; Corander, Jukka

2009-08-01

The evolution of bacterial populations has recently become considerably better understood due to large-scale sequencing of population samples. It has become clear that DNA sequences from a multitude of genes, as well as a broad sample coverage of a target population, are needed to obtain a relatively unbiased view of its genetic structure and the patterns of ancestry connected to the strains. However, the traditional statistical methods for evolutionary inference, such as phylogenetic analysis, are associated with several difficulties under such an extensive sampling scenario, in particular when a considerable amount of recombination is anticipated to have taken place. To meet the needs of large-scale analyses of population structure for bacteria, we introduce here several statistical tools for the detection and representation of recombination between populations. Also, we introduce a model-based description of the shape of a population in sequence space, in terms of its molecular variability and affinity towards other populations. Extensive real data from the genus Neisseria are utilized to demonstrate the potential of an approach where these population genetic tools are combined with an phylogenetic analysis. The statistical tools introduced here are freely available in BAPS 5.2 software, which can be downloaded from http://web.abo.fi/fak/mnf/mate/jc/software/baps.html.
Ensemble engineering and statistical modeling for parameter calibration towards optimal design of microbial fuel cells

NASA Astrophysics Data System (ADS)

Sun, Hongyue; Luo, Shuai; Jin, Ran; He, Zhen

2017-07-01

Mathematical modeling is an important tool to investigate the performance of microbial fuel cell (MFC) towards its optimized design. To overcome the shortcoming of traditional MFC models, an ensemble model is developed through integrating both engineering model and statistical analytics for the extrapolation scenarios in this study. Such an ensemble model can reduce laboring effort in parameter calibration and require fewer measurement data to achieve comparable accuracy to traditional statistical model under both the normal and extreme operation regions. Based on different weight between current generation and organic removal efficiency, the ensemble model can give recommended input factor settings to achieve the best current generation and organic removal efficiency. The model predicts a set of optimal design factors for the present tubular MFCs including the anode flow rate of 3.47 mL min-1, organic concentration of 0.71 g L-1, and catholyte pumping flow rate of 14.74 mL min-1 to achieve the peak current at 39.2 mA. To maintain 100% organic removal efficiency, the anode flow rate and organic concentration should be controlled lower than 1.04 mL min-1 and 0.22 g L-1, respectively. The developed ensemble model can be potentially modified to model other types of MFCs or bioelectrochemical systems.
Simple statistical bias correction techniques greatly improve moderate resolution air quality forecast at station level

NASA Astrophysics Data System (ADS)

Curci, Gabriele; Falasca, Serena

2017-04-01

Deterministic air quality forecast is routinely carried out at many local Environmental Agencies in Europe and throughout the world by means of eulerian chemistry-transport models. The skill of these models in predicting the ground-level concentrations of relevant pollutants (ozone, nitrogen dioxide, particulate matter) a few days ahead has greatly improved in recent years, but it is not yet always compliant with the required quality level for decision making (e.g. the European Commission has set a maximum uncertainty of 50% on daily values of relevant pollutants). Post-processing of deterministic model output is thus still regarded as a useful tool to make the forecast more reliable. In this work, we test several bias correction techniques applied to a long-term dataset of air quality forecasts over Europe and Italy. We used the WRF-CHIMERE modelling system, which provides operational experimental chemical weather forecast at CETEMPS (http://pumpkin.aquila.infn.it/forechem/), to simulate the years 2008-2012 at low resolution over Europe (0.5° x 0.5°) and moderate resolution over Italy (0.15° x 0.15°). We compared the simulated dataset with available observation from the European Environmental Agency database (AirBase) and characterized model skill and compliance with EU legislation using the Delta tool from FAIRMODE project (http://fairmode.jrc.ec.europa.eu/). The bias correction techniques adopted are, in order of complexity: (1) application of multiplicative factors calculated as the ratio of model-to-observed concentrations averaged over the previous days; (2) correction of the statistical distribution of model forecasts, in order to make it similar to that of the observations; (3) development and application of Model Output Statistics (MOS) regression equations. We illustrate differences and advantages/disadvantages of the three approaches. All the methods are relatively easy to implement for other modelling systems.
Emerging Concepts of Data Integration in Pathogen Phylodynamics.

PubMed

Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

2017-01-01

Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.
Emerging Concepts of Data Integration in Pathogen Phylodynamics

PubMed Central

Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

2017-01-01

Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. [Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics. PMID:28173504
Nowcasting of Low-Visibility Procedure States with Ordered Logistic Regression at Vienna International Airport

NASA Astrophysics Data System (ADS)

Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim

2017-04-01

Low-visibility conditions have a large impact on aviation safety and economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30 minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
A systematic review of Bayesian articles in psychology: The last 25 years.

PubMed

van de Schoot, Rens; Winter, Sonja D; Ryan, Oisín; Zondervan-Zwijnenburg, Mariëlle; Depaoli, Sarah

2017-06-01

Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Tools and Techniques for Basin-Scale Climate Change Assessment

NASA Astrophysics Data System (ADS)

Zagona, E.; Rajagopalan, B.; Oakley, W.; Wilson, N.; Weinstein, P.; Verdin, A.; Jerla, C.; Prairie, J. R.

2012-12-01

The Department of Interior's WaterSMART Program seeks to secure and stretch water supplies to benefit future generations and identify adaptive measures to address climate change. Under WaterSMART, Basin Studies are comprehensive water studies to explore options for meeting projected imbalances in water supply and demand in specific basins. Such studies could be most beneficial with application of recent scientific advances in climate projections, stochastic simulation, operational modeling and robust decision-making, as well as computational techniques to organize and analyze many alternatives. A new integrated set of tools and techniques to facilitate these studies includes the following components: Future supply scenarios are produced by the Hydrology Simulator, which uses non-parametric K-nearest neighbor resampling techniques to generate ensembles of hydrologic traces based on historical data, optionally conditioned on long paleo reconstructed data using various Markov Chain techniuqes. Resampling can also be conditioned on climate change projections from e.g., downscaled GCM projections to capture increased variability; spatial and temporal disaggregation is also provided. The simulations produced are ensembles of hydrologic inputs to the RiverWare operations/infrastucture decision modeling software. Alternative demand scenarios can be produced with the Demand Input Tool (DIT), an Excel-based tool that allows modifying future demands by groups such as states; sectors, e.g., agriculture, municipal, energy; and hydrologic basins. The demands can be scaled at future dates or changes ramped over specified time periods. Resulting data is imported directly into the decision model. Different model files can represent infrastructure alternatives and different Policy Sets represent alternative operating policies, including options for noticing when conditions point to unacceptable vulnerabilities, which trigger dynamically executing changes in operations or other options. The over-arching Study Manager provides a graphical tool to create combinations of future supply scenarios, demand scenarios, infrastructure and operating policy alternatives; each scenario is executed as an ensemble of RiverWare runs, driven by the hydrologic supply. The Study Manager sets up and manages multiple executions on multi-core hardware. The sizeable are typically direct model outputs, or post-processed indicators of performance based on model outputs. Post processing statistical analysis of the outputs are possible using the Graphical Policy Analysis Tool or other statistical packages. Several Basin Studies undertaken have used RiverWare to evaluate future scenarios. The Colorado River Basin Study, the most complex and extensive to date, has taken advantage of these tools and techniques to generate supply scenarios, produce alternative demand scenarios and to set up and execute the many combinations of supplies, demands, policies, and infrastructure alternatives. The tools and techniques will be described with example applications.
A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more.

PubMed

Rivas, Elena; Lang, Raymond; Eddy, Sean R

2012-02-01

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases.
A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

PubMed Central

Rivas, Elena; Lang, Raymond; Eddy, Sean R.

2012-01-01

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using a generalized super-grammar that can be parameterized with probabilities, energies, or arbitrary scores. By using TORNADO, we find that probabilistic nearest-neighbor models perform comparably to (but not significantly better than) discriminative methods. We find that complex statistical models are prone to overfitting RNA structure and that evaluations should use structurally nonhomologous training and test data sets. Overfitting has affected at least one published method (ContextFold). The most important barrier to improving statistical approaches for RNA secondary structure prediction is the lack of diversity of well-curated single-sequence RNA secondary structures in current RNA databases. PMID:22194308
Risk assessment model for development of advanced age-related macular degeneration.

PubMed

Klein, Michael L; Francis, Peter J; Ferris, Frederick L; Hamon, Sara C; Clemons, Traci E

2011-12-01

To design a risk assessment model for development of advanced age-related macular degeneration (AMD) incorporating phenotypic, demographic, environmental, and genetic risk factors. We evaluated longitudinal data from 2846 participants in the Age-Related Eye Disease Study. At baseline, these individuals had all levels of AMD, ranging from none to unilateral advanced AMD (neovascular or geographic atrophy). Follow-up averaged 9.3 years. We performed a Cox proportional hazards analysis with demographic, environmental, phenotypic, and genetic covariates and constructed a risk assessment model for development of advanced AMD. Performance of the model was evaluated using the C statistic and the Brier score and externally validated in participants in the Complications of Age-Related Macular Degeneration Prevention Trial. The final model included the following independent variables: age, smoking history, family history of AMD (first-degree member), phenotype based on a modified Age-Related Eye Disease Study simple scale score, and genetic variants CFH Y402H and ARMS2 A69S. The model did well on performance measures, with very good discrimination (C statistic = 0.872) and excellent calibration and overall performance (Brier score at 5 years = 0.08). Successful external validation was performed, and a risk assessment tool was designed for use with or without the genetic component. We constructed a risk assessment model for development of advanced AMD. The model performed well on measures of discrimination, calibration, and overall performance and was successfully externally validated. This risk assessment tool is available for online use.
MetaGenyo: a web tool for meta-analysis of genetic association studies.

PubMed

Martorell-Marugan, Jordi; Toro-Dominguez, Daniel; Alarcon-Riquelme, Marta E; Carmona-Saez, Pedro

2017-12-16

Genetic association studies (GAS) aims to evaluate the association between genetic variants and phenotypes. In the last few years, the number of this type of study has increased exponentially, but the results are not always reproducible due to experimental designs, low sample sizes and other methodological errors. In this field, meta-analysis techniques are becoming very popular tools to combine results across studies to increase statistical power and to resolve discrepancies in genetic association studies. A meta-analysis summarizes research findings, increases statistical power and enables the identification of genuine associations between genotypes and phenotypes. Meta-analysis techniques are increasingly used in GAS, but it is also increasing the amount of published meta-analysis containing different errors. Although there are several software packages that implement meta-analysis, none of them are specifically designed for genetic association studies and in most cases their use requires advanced programming or scripting expertise. We have developed MetaGenyo, a web tool for meta-analysis in GAS. MetaGenyo implements a complete and comprehensive workflow that can be executed in an easy-to-use environment without programming knowledge. MetaGenyo has been developed to guide users through the main steps of a GAS meta-analysis, covering Hardy-Weinberg test, statistical association for different genetic models, analysis of heterogeneity, testing for publication bias, subgroup analysis and robustness testing of the results. MetaGenyo is a useful tool to conduct comprehensive genetic association meta-analysis. The application is freely available at http://bioinfo.genyo.es/metagenyo/ .

TU-A-17A-02: In Memoriam of Ben Galkin: Virtual Tools for Validation of X-Ray Breast Imaging Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Myers, K; Bakic, P; Abbey, C

2014-06-15

This symposium will explore simulation methods for the preclinical evaluation of novel 3D and 4D x-ray breast imaging systems – the subject of AAPM taskgroup TG234. Given the complex design of modern imaging systems, simulations offer significant advantages over long and costly clinical studies in terms of reproducibility, reduced radiation exposures, a known reference standard, and the capability for studying patient and disease subpopulations through appropriate choice of simulation parameters. Our focus will be on testing the realism of software anthropomorphic phantoms and virtual clinical trials tools developed for the optimization and validation of breast imaging systems. The symposium willmore » review the stateof- the-science, as well as the advantages and limitations of various approaches to testing realism of phantoms and simulated breast images. Approaches based upon the visual assessment of synthetic breast images by expert observers will be contrasted with approaches based upon comparing statistical properties between synthetic and clinical images. The role of observer models in the assessment of realism will be considered. Finally, an industry perspective will be presented, summarizing the role and importance of virtual tools and simulation methods in product development. The challenges and conditions that must be satisfied in order for computational modeling and simulation to play a significantly increased role in the design and evaluation of novel breast imaging systems will be addressed. Learning Objectives: Review the state-of-the science in testing realism of software anthropomorphic phantoms and virtual clinical trials tools; Compare approaches based upon the visual assessment by expert observers vs. the analysis of statistical properties of synthetic images; Discuss the role of observer models in the assessment of realism; Summarize the industry perspective to virtual methods for breast imaging.« less
WebArray: an online platform for microarray data analysis

PubMed Central

Xia, Xiaoqin; McClelland, Michael; Wang, Yipeng

2005-01-01

Background Many cutting-edge microarray analysis tools and algorithms, including commonly used limma and affy packages in Bioconductor, need sophisticated knowledge of mathematics, statistics and computer skills for implementation. Commercially available software can provide a user-friendly interface at considerable cost. To facilitate the use of these tools for microarray data analysis on an open platform we developed an online microarray data analysis platform, WebArray, for bench biologists to utilize these tools to explore data from single/dual color microarray experiments. Results The currently implemented functions were based on limma and affy package from Bioconductor, the spacings LOESS histogram (SPLOSH) method, PCA-assisted normalization method and genome mapping method. WebArray incorporates these packages and provides a user-friendly interface for accessing a wide range of key functions of limma and others, such as spot quality weight, background correction, graphical plotting, normalization, linear modeling, empirical bayes statistical analysis, false discovery rate (FDR) estimation, chromosomal mapping for genome comparison. Conclusion WebArray offers a convenient platform for bench biologists to access several cutting-edge microarray data analysis tools. The website is freely available at . It runs on a Linux server with Apache and MySQL. PMID:16371165
Exploring students’ perceived and actual ability in solving statistical problems based on Rasch measurement tools

NASA Astrophysics Data System (ADS)

Azila Che Musa, Nor; Mahmud, Zamalia; Baharun, Norhayati

2017-09-01

One of the important skills that is required from any student who are learning statistics is knowing how to solve statistical problems correctly using appropriate statistical methods. This will enable them to arrive at a conclusion and make a significant contribution and decision for the society. In this study, a group of 22 students majoring in statistics at UiTM Shah Alam were given problems relating to topics on testing of hypothesis which require them to solve the problems using confidence interval, traditional and p-value approach. Hypothesis testing is one of the techniques used in solving real problems and it is listed as one of the difficult concepts for students to grasp. The objectives of this study is to explore students’ perceived and actual ability in solving statistical problems and to determine which item in statistical problem solving that students find difficult to grasp. Students’ perceived and actual ability were measured based on the instruments developed from the respective topics. Rasch measurement tools such as Wright map and item measures for fit statistics were used to accomplish the objectives. Data were collected and analysed using Winsteps 3.90 software which is developed based on the Rasch measurement model. The results showed that students’ perceived themselves as moderately competent in solving the statistical problems using confidence interval and p-value approach even though their actual performance showed otherwise. Item measures for fit statistics also showed that the maximum estimated measures were found on two problems. These measures indicate that none of the students have attempted these problems correctly due to reasons which include their lack of understanding in confidence interval and probability values.
Analysis of Uncertainty and Variability in Finite Element Computational Models for Biomedical Engineering: Characterization and Propagation

PubMed Central

Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González

2016-01-01

Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering. PMID:27872840
Analysis of Uncertainty and Variability in Finite Element Computational Models for Biomedical Engineering: Characterization and Propagation.

PubMed

Mangado, Nerea; Piella, Gemma; Noailly, Jérôme; Pons-Prats, Jordi; Ballester, Miguel Ángel González

2016-01-01

Computational modeling has become a powerful tool in biomedical engineering thanks to its potential to simulate coupled systems. However, real parameters are usually not accurately known, and variability is inherent in living organisms. To cope with this, probabilistic tools, statistical analysis and stochastic approaches have been used. This article aims to review the analysis of uncertainty and variability in the context of finite element modeling in biomedical engineering. Characterization techniques and propagation methods are presented, as well as examples of their applications in biomedical finite element simulations. Uncertainty propagation methods, both non-intrusive and intrusive, are described. Finally, pros and cons of the different approaches and their use in the scientific community are presented. This leads us to identify future directions for research and methodological development of uncertainty modeling in biomedical engineering.
Gaussian and Lognormal Models of Hurricane Gust Factors

NASA Technical Reports Server (NTRS)

Merceret, Frank

2009-01-01

A document describes a tool that predicts the likelihood of land-falling tropical storms and hurricanes exceeding specified peak speeds, given the mean wind speed at various heights of up to 500 feet (150 meters) above ground level. Empirical models to calculate mean and standard deviation of the gust factor as a function of height and mean wind speed were developed in Excel based on data from previous hurricanes. Separate models were developed for Gaussian and offset lognormal distributions for the gust factor. Rather than forecasting a single, specific peak wind speed, this tool provides a probability of exceeding a specified value. This probability is provided as a function of height, allowing it to be applied at a height appropriate for tall structures. The user inputs the mean wind speed, height, and operational threshold. The tool produces the probability from each model that the given threshold will be exceeded. This application does have its limits. They were tested only in tropical storm conditions associated with the periphery of hurricanes. Winds of similar speed produced by non-tropical system may have different turbulence dynamics and stability, which may change those winds statistical characteristics. These models were developed along the Central Florida seacoast, and their results may not accurately extrapolate to inland areas, or even to coastal sites that are different from those used to build the models. Although this tool cannot be generalized for use in different environments, its methodology could be applied to those locations to develop a similar tool tuned to local conditions.
Statistically Based Approach to Broadband Liner Design and Assessment

NASA Technical Reports Server (NTRS)

Jones, Michael G. (Inventor); Nark, Douglas M. (Inventor)

2016-01-01

A broadband liner design optimization includes utilizing in-duct attenuation predictions with a statistical fan source model to obtain optimum impedance spectra over a number of flow conditions for one or more liner locations in a bypass duct. The predicted optimum impedance information is then used with acoustic liner modeling tools to design liners having impedance spectra that most closely match the predicted optimum values. Design selection is based on an acceptance criterion that provides the ability to apply increasing weighting to specific frequencies and/or operating conditions. One or more broadband design approaches are utilized to produce a broadband liner that targets a full range of frequencies and operating conditions.
Statistical, economic and other tools for assessing natural aggregate

USGS Publications Warehouse

Bliss, J.D.; Moyle, P.R.; Bolm, K.S.

2003-01-01

Quantitative aggregate resource assessment provides resource estimates useful for explorationists, land managers and those who make decisions about land allocation, which may have long-term implications concerning cost and the availability of aggregate resources. Aggregate assessment needs to be systematic and consistent, yet flexible enough to allow updating without invalidating other parts of the assessment. Evaluators need to use standard or consistent aggregate classification and statistic distributions or, in other words, models with geological, geotechnical and economic variables or interrelationships between these variables. These models can be used with subjective estimates, if needed, to estimate how much aggregate may be present in a region or country using distributions generated by Monte Carlo computer simulations.
Ranking of Business Process Simulation Software Tools with DEX/QQ Hierarchical Decision Model.

PubMed

Damij, Nadja; Boškoski, Pavle; Bohanec, Marko; Mileva Boshkoska, Biljana

2016-01-01

The omnipresent need for optimisation requires constant improvements of companies' business processes (BPs). Minimising the risk of inappropriate BP being implemented is usually performed by simulating the newly developed BP under various initial conditions and "what-if" scenarios. An effectual business process simulations software (BPSS) is a prerequisite for accurate analysis of an BP. Characterisation of an BPSS tool is a challenging task due to the complex selection criteria that includes quality of visual aspects, simulation capabilities, statistical facilities, quality reporting etc. Under such circumstances, making an optimal decision is challenging. Therefore, various decision support models are employed aiding the BPSS tool selection. The currently established decision support models are either proprietary or comprise only a limited subset of criteria, which affects their accuracy. Addressing this issue, this paper proposes a new hierarchical decision support model for ranking of BPSS based on their technical characteristics by employing DEX and qualitative to quantitative (QQ) methodology. Consequently, the decision expert feeds the required information in a systematic and user friendly manner. There are three significant contributions of the proposed approach. Firstly, the proposed hierarchical model is easily extendible for adding new criteria in the hierarchical structure. Secondly, a fully operational decision support system (DSS) tool that implements the proposed hierarchical model is presented. Finally, the effectiveness of the proposed hierarchical model is assessed by comparing the resulting rankings of BPSS with respect to currently available results.
An interactive Bayesian model for prediction of lymph node ratio and survival in pancreatic cancer patients.

PubMed

Smith, Brian J; Mezhir, James J

2014-10-01

Regional lymph node status has long been used as a dichotomous predictor of clinical outcomes in cancer patients. More recently, interest has turned to the prognostic utility of lymph node ratio (LNR), quantified as the proportion of positive nodes examined. However, statistical tools for the joint modeling of LNR and its effect on cancer survival are lacking. Data were obtained from the NCI SEER cancer registry on 6400 patients diagnosed with pancreatic ductal adenocarcinoma from 2004 to 2010 and who underwent radical oncologic resection. A novel Bayesian statistical approach was developed and applied to model simultaneously patients' true, but unobservable, LNR statuses and overall survival. New web development tools were then employed to create an interactive web application for individualized patient prediction. Histologic grade and T and M stages were important predictors of LNR status. Significant predictors of survival included age, gender, marital status, grade, histology, T and M stages, tumor size, and radiation therapy. LNR was found to have a highly significant, non-linear effect on survival. Furthermore, predictive performance of the survival model compared favorably to those from studies with more homogeneous patients and individualized predictors. We provide a new approach and tool set for the prediction of LNR and survival that are generally applicable to a host of cancer types, including breast, colon, melanoma, and stomach. Our methods are illustrated with the development of a validated model and web applications for the prediction of survival in a large set of pancreatic cancer patients. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
The Water Availability Tool for Environmental Resources (WATER): A Water-Budget Modeling Approach for Managing Water-Supply Resources in Kentucky - Phase I: Data Processing, Model Development, and Application to Non-Karst Areas

USGS Publications Warehouse

Williamson, Tanja N.; Odom, Kenneth R.; Newson, Jeremy K.; Downs, Aimee C.; Nelson, Hugh L.; Cinotto, Peter J.; Ayers, Mark A.

2009-01-01

The Water Availability Tool for Environmental Resources (WATER) was developed in cooperation with the Kentucky Division of Water to provide a consistent and defensible method of estimating streamflow and water availability in ungaged basins. WATER is process oriented; it is based on the TOPMODEL code and incorporates historical water-use data together with physiographic data that quantitatively describe topography and soil-water storage. The result is a user-friendly decision tool that can estimate water availability in non-karst areas of Kentucky without additional data or processing. The model runs on a daily time step, and critical source data include a historical record of daily temperature and precipitation, digital elevation models (DEMs), the Soil Survey Geographic Database (SSURGO), and historical records of water discharges and withdrawals. The model was calibrated and statistically evaluated for 12 basins by comparing the estimated discharge to that observed at U.S. Geological Survey streamflow-gaging stations. When statistically evaluated over a 2,119-day time period, the discharge estimates showed a bias of -0.29 to 0.42, a root mean square error of 1.66 to 5.06, a correlation of 0.54 to 0.85, and a Nash-Sutcliffe Efficiency of 0.26 to 0.72. The parameter and input modifications that most significantly improved the accuracy and precision of streamflow-discharge estimates were the addition of Next Generation radar (NEXRAD) precipitation data, a rooting depth of 30 centimeters, and a TOPMODEL scaling parameter (m) derived directly from SSURGO data that was multiplied by an adjustment factor of 0.10. No site-specific optimization was used.
Improving a complex finite-difference ground water flow model through the use of an analytic element screening model

USGS Publications Warehouse

Hunt, R.J.; Anderson, M.P.; Kelson, V.A.

1998-01-01

This paper demonstrates that analytic element models have potential as powerful screening tools that can facilitate or improve calibration of more complicated finite-difference and finite-element models. We demonstrate how a two-dimensional analytic element model was used to identify errors in a complex three-dimensional finite-difference model caused by incorrect specification of boundary conditions. An improved finite-difference model was developed using boundary conditions developed from a far-field analytic element model. Calibration of a revised finite-difference model was achieved using fewer zones of hydraulic conductivity and lake bed conductance than the original finite-difference model. Calibration statistics were also improved in that simulated base-flows were much closer to measured values. The improved calibration is due mainly to improved specification of the boundary conditions made possible by first solving the far-field problem with an analytic element model.This paper demonstrates that analytic element models have potential as powerful screening tools that can facilitate or improve calibration of more complicated finite-difference and finite-element models. We demonstrate how a two-dimensional analytic element model was used to identify errors in a complex three-dimensional finite-difference model caused by incorrect specification of boundary conditions. An improved finite-difference model was developed using boundary conditions developed from a far-field analytic element model. Calibration of a revised finite-difference model was achieved using fewer zones of hydraulic conductivity and lake bed conductance than the original finite-difference model. Calibration statistics were also improved in that simulated base-flows were much closer to measured values. The improved calibration is due mainly to improved specification of the boundary conditions made possible by first solving the far-field problem with an analytic element model.
Development of a short-term irradiance prediction system using post-processing tools on WRF-ARW meteorological forecasts in Spain

NASA Astrophysics Data System (ADS)

Rincón, A.; Jorba, O.; Baldasano, J. M.

2010-09-01

The increased contribution of solar energy in power generation sources requires an accurate estimation of surface solar irradiance conditioned by geographical, temporal and meteorological conditions. The knowledge of the variability of these factors is essential to estimate the expected energy production and therefore help stabilizing the electricity grid and increase the reliability of available solar energy. The use of numerical meteorological models in combination with statistical post-processing tools may have the potential to satisfy the requirements for short-term forecasting of solar irradiance for up to several days ahead and its application in solar devices. In this contribution, we present an assessment of a short-term irradiance prediction system based on the WRF-ARW mesoscale meteorological model (Skamarock et al., 2005) and several post-processing tools in order to improve the overall skills of the system in an annual simulation of the year 2004 in Spain. The WRF-ARW model is applied with 4 km x 4 km horizontal resolution and 38 vertical layers over the Iberian Peninsula. The hourly model irradiance is evaluated against more than 90 surface stations. The stations are used to assess the temporal and spatial fluctuations and trends of the system evaluating three different post-processes: Model Output Statistics technique (MOS; Glahn and Lowry, 1972), Recursive statistical method (REC; Boi, 2004) and Kalman Filter Predictor (KFP, Bozic, 1994; Roeger et al., 2003). A first evaluation of the system without post-processing tools shows an overestimation of the surface irradiance, due to the lack of atmospheric absorbers attenuation different than clouds not included in the meteorological model. This produces an annual BIAS of 16 W m-2 h-1, annual RMSE of 106 W m-2 h-1 and annual NMAE of 42%. The largest errors are observed in spring and summer, reaching RMSE of 350 W m-2 h-1. Results using Kalman Filter Predictor show a reduction of 8% of RMSE, 83% of BIAS, and NMAE decreases down to 32%. The REC method shows a reduction of 6% of RMSE, 79% of BIAS, and NMAE decreases down to 28%. When comparing stations at different altitudes, the overestimation is enhanced at coastal stations (less than 200m) up to 900 W m-2 h-1. The results allow us to analyze strengths and drawbacks of the irradiance prediction system and its application in the estimation of energy production from photovoltaic system cells. References Boi, P.: A statistical method for forecasting extreme daily temperatures using ECMWF 2-m temperatures and ground station measurements, Meteorol. Appl., 11, 245-251, 2004. Bozic, S.: Digital and Kalman filtering, John Wiley, Hoboken, New Jersey, 2nd edn., 1994. Glahn, H. and Lowry, D.: The use of Model Output Statistics (MOS) in Objective Weather Forecasting, Applied Meteorology, 11, 1203-1211, 1972. Roeger, C., Stull, R., McClung, D., Hacker, J., Deng, X., and Modzelewski, H.: Verification of Mesoscale Numerical Weather Forecasts in Mountainous Terrain for Application to Avalanche Prediction, Weather and forecasting, 18, 1140-1160, 2003. Skamarock, W., Klemp, J., Dudhia, J., Gill, D., Barker, D. M., Wang, W., and Powers, J. G.: A Description of the Advanced Research WRF Version 2, Tech. Rep. NCAR/TN-468+STR, NCAR Technical note, 2005.
Instruction of Statistics via Computer-Based Tools: Effects on Statistics' Anxiety, Attitude, and Achievement

ERIC Educational Resources Information Center

Ciftci, S. Koza; Karadag, Engin; Akdal, Pinar

2014-01-01

The purpose of this study was to determine the effect of statistics instruction using computer-based tools, on statistics anxiety, attitude, and achievement. This study was designed as quasi-experimental research and the pattern used was a matched pre-test/post-test with control group design. Data was collected using three scales: a Statistics…
Statistical inference to advance network models in epidemiology.

PubMed

Welch, David; Bansal, Shweta; Hunter, David R

2011-03-01

Contact networks are playing an increasingly important role in the study of epidemiology. Most of the existing work in this area has focused on considering the effect of underlying network structure on epidemic dynamics by using tools from probability theory and computer simulation. This work has provided much insight on the role that heterogeneity in host contact patterns plays on infectious disease dynamics. Despite the important understanding afforded by the probability and simulation paradigm, this approach does not directly address important questions about the structure of contact networks such as what is the best network model for a particular mode of disease transmission, how parameter values of a given model should be estimated, or how precisely the data allow us to estimate these parameter values. We argue that these questions are best answered within a statistical framework and discuss the role of statistical inference in estimating contact networks from epidemiological data. Copyright © 2011 Elsevier B.V. All rights reserved.
Statistical Analysis of Model Data for Operational Space Launch Weather Support at Kennedy Space Center and Cape Canaveral Air Force Station

NASA Technical Reports Server (NTRS)

Bauman, William H., III

2010-01-01

The 12-km resolution North American Mesoscale (NAM) model (MesoNAM) is used by the 45th Weather Squadron (45 WS) Launch Weather Officers at Kennedy Space Center (KSC) and Cape Canaveral Air Force Station (CCAFS) to support space launch weather operations. The 45 WS tasked the Applied Meteorology Unit to conduct an objective statistics-based analysis of MesoNAM output compared to wind tower mesonet observations and then develop a an operational tool to display the results. The National Centers for Environmental Prediction began running the current version of the MesoNAM in mid-August 2006. The period of record for the dataset was 1 September 2006 - 31 January 2010. The AMU evaluated MesoNAM hourly forecasts from 0 to 84 hours based on model initialization times of 00, 06, 12 and 18 UTC. The MesoNAM forecast winds, temperature and dew point were compared to the observed values of these parameters from the sensors in the KSC/CCAFS wind tower network. The data sets were stratified by model initialization time, month and onshore/offshore flow for each wind tower. Statistics computed included bias (mean difference), standard deviation of the bias, root mean square error (RMSE) and a hypothesis test for bias = O. Twelve wind towers located in close proximity to key launch complexes were used for the statistical analysis with the sensors on the towers positioned at varying heights to include 6 ft, 30 ft, 54 ft, 60 ft, 90 ft, 162 ft, 204 ft and 230 ft depending on the launch vehicle and associated weather launch commit criteria being evaluated. These twelve wind towers support activities for the Space Shuttle (launch and landing), Delta IV, Atlas V and Falcon 9 launch vehicles. For all twelve towers, the results indicate a diurnal signal in the bias of temperature (T) and weaker but discernable diurnal signal in the bias of dewpoint temperature (T(sub d)) in the MesoNAM forecasts. Also, the standard deviation of the bias and RMSE of T, T(sub d), wind speed and wind direction indicated the model error increased with the forecast period all four parameters. The hypothesis testing uses statistics to determine the probability that a given hypothesis is true. The goal of using the hypothesis test was to determine if the model bias of any of the parameters assessed throughout the model forecast period was statistically zero. For th is dataset, if this test produced a value >= -1 .96 or <= 1.96 for a data point, then the bias at that point was effectively zero and the model forecast for that point was considered to have no error. A graphical user interface (GUI) was developed so the 45 WS would have an operational tool at their disposal that would be easy to navigate among the multiple stratifications of information to include tower locations, month, model initialization times, sensor heights and onshore/offshore flow. The AMU developed the GUI using HyperText Markup Language (HTML) so the tool could be used in most popular web browsers with computers running different operating systems such as Microsoft Windows and Linux.
Prediction of morbidity and mortality in patients with type 2 diabetes.

PubMed

Wells, Brian J; Roth, Rachel; Nowacki, Amy S; Arrigain, Susana; Yu, Changhong; Rosenkrans, Wayne A; Kattan, Michael W

2013-01-01

Introduction. The objective of this study was to create a tool that accurately predicts the risk of morbidity and mortality in patients with type 2 diabetes according to an oral hypoglycemic agent. Materials and Methods. The model was based on a cohort of 33,067 patients with type 2 diabetes who were prescribed a single oral hypoglycemic agent at the Cleveland Clinic between 1998 and 2006. Competing risk regression models were created for coronary heart disease (CHD), heart failure, and stroke, while a Cox regression model was created for mortality. Propensity scores were used to account for possible treatment bias. A prediction tool was created and internally validated using tenfold cross-validation. The results were compared to a Framingham model and a model based on the United Kingdom Prospective Diabetes Study (UKPDS) for CHD and stroke, respectively. Results and Discussion. Median follow-up for the mortality outcome was 769 days. The numbers of patients experiencing events were as follows: CHD (3062), heart failure (1408), stroke (1451), and mortality (3661). The prediction tools demonstrated the following concordance indices (c-statistics) for the specific outcomes: CHD (0.730), heart failure (0.753), stroke (0.688), and mortality (0.719). The prediction tool was superior to the Framingham model at predicting CHD and was at least as accurate as the UKPDS model at predicting stroke. Conclusions. We created an accurate tool for predicting the risk of stroke, coronary heart disease, heart failure, and death in patients with type 2 diabetes. The calculator is available online at http://rcalc.ccf.org under the heading "Type 2 Diabetes" and entitled, "Predicting 5-Year Morbidity and Mortality." This may be a valuable tool to aid the clinician's choice of an oral hypoglycemic, to better inform patients, and to motivate dialogue between physician and patient.
The epistemological status of general circulation models

NASA Astrophysics Data System (ADS)

Loehle, Craig

2018-03-01

Forecasts of both likely anthropogenic effects on climate and consequent effects on nature and society are based on large, complex software tools called general circulation models (GCMs). Forecasts generated by GCMs have been used extensively in policy decisions related to climate change. However, the relation between underlying physical theories and results produced by GCMs is unclear. In the case of GCMs, many discretizations and approximations are made, and simulating Earth system processes is far from simple and currently leads to some results with unknown energy balance implications. Statistical testing of GCM forecasts for degree of agreement with data would facilitate assessment of fitness for use. If model results need to be put on an anomaly basis due to model bias, then both visual and quantitative measures of model fit depend strongly on the reference period used for normalization, making testing problematic. Epistemology is here applied to problems of statistical inference during testing, the relationship between the underlying physics and the models, the epistemic meaning of ensemble statistics, problems of spatial and temporal scale, the existence or not of an unforced null for climate fluctuations, the meaning of existing uncertainty estimates, and other issues. Rigorous reasoning entails carefully quantifying levels of uncertainty.
Habitat classification modeling with incomplete data: Pushing the habitat envelope

USGS Publications Warehouse

Zarnetske, P.L.; Edwards, T.C.; Moisen, Gretchen G.

2007-01-01

Habitat classification models (HCMs) are invaluable tools for species conservation, land-use planning, reserve design, and metapopulation assessments, particularly at broad spatial scales. However, species occurrence data are often lacking and typically limited to presence points at broad scales. This lack of absence data precludes the use of many statistical techniques for HCMs. One option is to generate pseudo-absence points so that the many available statistical modeling tools can be used. Traditional techniques generate pseudoabsence points at random across broadly defined species ranges, often failing to include biological knowledge concerning the species-habitat relationship. We incorporated biological knowledge of the species-habitat relationship into pseudo-absence points by creating habitat envelopes that constrain the region from which points were randomly selected. We define a habitat envelope as an ecological representation of a species, or species feature's (e.g., nest) observed distribution (i.e., realized niche) based on a single attribute, or the spatial intersection of multiple attributes. We created HCMs for Northern Goshawk (Accipiter gentilis atricapillus) nest habitat during the breeding season across Utah forests with extant nest presence points and ecologically based pseudo-absence points using logistic regression. Predictor variables were derived from 30-m USDA Landfire and 250-m Forest Inventory and Analysis (FIA) map products. These habitat-envelope-based models were then compared to null envelope models which use traditional practices for generating pseudo-absences. Models were assessed for fit and predictive capability using metrics such as kappa, thresholdindependent receiver operating characteristic (ROC) plots, adjusted deviance (Dadj2), and cross-validation, and were also assessed for ecological relevance. For all cases, habitat envelope-based models outperformed null envelope models and were more ecologically relevant, suggesting that incorporating biological knowledge into pseudo-absence point generation is a powerful tool for species habitat assessments. Furthermore, given some a priori knowledge of the species-habitat relationship, ecologically based pseudo-absence points can be applied to any species, ecosystem, data resolution, and spatial extent. ?? 2007 by the Ecological Society of America.
A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

NASA Astrophysics Data System (ADS)

Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em

2017-09-01

Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited accuracy but relatively costless auxiliary simulator we can effectively fill-in the missing spatial data at the required times by a statistical learning technique - multi-level Gaussian process regression, on the fly; this has been demonstrated in previous work [1]. Based on the previous work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, that detects computational redundancy in time and hence accelerate the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.

The relationship between particulate pollution levels in Australian cities, meteorology, and landscape fire activity detected from MODIS hotspots.

PubMed

Price, Owen F; Williamson, Grant J; Henderson, Sarah B; Johnston, Fay; Bowman, David M J S

2012-01-01

Smoke from bushfires is an emerging issue for fire managers because of increasing evidence for its public health effects. Development of forecasting models to predict future pollution levels based on the relationship between bushfire activity and current pollution levels would be a useful management tool. As a first step, we use daily thermal anomalies detected by the MODIS Active Fire Product (referred to as "hotspots"), pollution concentrations, and meteorological data for the years 2002 to 2008, to examine the statistical relationship between fire activity in the landscapes and pollution levels around Perth and Sydney, two large Australian cities. Resultant models were statistically significant, but differed in their goodness of fit and the distance at which the strength of the relationship was strongest. For Sydney, a univariate model for hotspot activity within 100 km explained 24% of variation in pollution levels, and the best model including atmospheric variables explained 56% of variation. For Perth, the best radius was 400 km, explaining only 7% of variation, while the model including atmospheric variables explained 31% of the variation. Pollution was higher when the atmosphere was more stable and in the presence of on-shore winds, whereas there was no effect of wind blowing from the fires toward the pollution monitors. Our analysis shows there is a good prospect for developing region-specific forecasting tools combining hotspot fire activity with meteorological data.
Role of Statistical Random-Effects Linear Models in Personalized Medicine.

PubMed

Diaz, Francisco J; Yeh, Hung-Wen; de Leon, Jose

2012-03-01

Some empirical studies and recent developments in pharmacokinetic theory suggest that statistical random-effects linear models are valuable tools that allow describing simultaneously patient populations as a whole and patients as individuals. This remarkable characteristic indicates that these models may be useful in the development of personalized medicine, which aims at finding treatment regimes that are appropriate for particular patients, not just appropriate for the average patient. In fact, published developments show that random-effects linear models may provide a solid theoretical framework for drug dosage individualization in chronic diseases. In particular, individualized dosages computed with these models by means of an empirical Bayesian approach may produce better results than dosages computed with some methods routinely used in therapeutic drug monitoring. This is further supported by published empirical and theoretical findings that show that random effects linear models may provide accurate representations of phase III and IV steady-state pharmacokinetic data, and may be useful for dosage computations. These models have applications in the design of clinical algorithms for drug dosage individualization in chronic diseases; in the computation of dose correction factors; computation of the minimum number of blood samples from a patient that are necessary for calculating an optimal individualized drug dosage in therapeutic drug monitoring; measure of the clinical importance of clinical, demographic, environmental or genetic covariates; study of drug-drug interactions in clinical settings; the implementation of computational tools for web-site-based evidence farming; design of pharmacogenomic studies; and in the development of a pharmacological theory of dosage individualization.
Eutrophication risk assessment in coastal embayments using simple statistical models.

PubMed

Arhonditsis, G; Eleftheriadou, M; Karydis, M; Tsirtsis, G

2003-09-01

A statistical methodology is proposed for assessing the risk of eutrophication in marine coastal embayments. The procedure followed was the development of regression models relating the levels of chlorophyll a (Chl) with the concentration of the limiting nutrient--usually nitrogen--and the renewal rate of the systems. The method was applied in the Gulf of Gera, Island of Lesvos, Aegean Sea and a surrogate for renewal rate was created using the Canberra metric as a measure of the resemblance between the Gulf and the oligotrophic waters of the open sea in terms of their physical, chemical and biological properties. The Chl-total dissolved nitrogen-renewal rate regression model was the most significant, accounting for 60% of the variation observed in Chl. Predicted distributions of Chl for various combinations of the independent variables, based on Bayesian analysis of the models, enabled comparison of the outcomes of specific scenarios of interest as well as further analysis of the system dynamics. The present statistical approach can be used as a methodological tool for testing the resilience of coastal ecosystems under alternative managerial schemes and levels of exogenous nutrient loading.
Re-evaluating causal modeling with mantel tests in landscape genetics

Treesearch

Samuel A. Cushman; Tzeidle N. Wasserman; Erin L. Landguth; Andrew J. Shirk

2013-01-01

The predominant analytical approach to associate landscape patterns with gene flow processes is based on the association of cost distances with genetic distances between individuals. Mantel and partial Mantel tests have been the dominant statistical tools used to correlate cost distances and genetic distances in landscape genetics. However, the inherent high...
TESF Methodology for Statistics Education Improvement

ERIC Educational Resources Information Center

Barone, Stefano; Lo Franco, Eva

2010-01-01

The need for universities to achieve excellence in the services they provide has been the subject of research for several decades. The idea of involving students and recognizing the importance of their opinions has led to the creation of various models and tools. This paper focuses on teaching, a central service from which improvement actions of…
Data mining in child welfare.

PubMed

Schoech, D; Quinn, A; Rycraft, J R

2000-01-01

Data mining is the sifting through of voluminous data to extract knowledge for decision making. This article illustrates the context, concepts, processes, techniques, and tools of data mining, using statistical and neural network analyses on a dataset concerning employee turnover. The resulting models and their predictive capability, advantages and disadvantages, and implications for decision support are highlighted.
A Computational Study of the Energy Dissipation Through an Acrylic Target Impacted by Various Size FSP

DTIC Science & Technology

2009-06-01

data, and then returns an array that describes the line. This function, when compared to the LOGEST statistical function of the Microsoft Excel, which...threats continues to grow, the ability to predict materials performances using advanced modeling tools increases. The current paper has demonstrated
Overview Of Recent Enhancements To The Bumper-II Meteoroid and Orbital Debris Risk Assessment Tool

NASA Technical Reports Server (NTRS)

Hyde, James L.; Christiansen, Eric L.; Lear, Dana M.; Prior, Thomas G.

2006-01-01

Discussion includes recent enhancements to the BUMPER-II program and input files in support of Shuttle Return to Flight. Improvements to the mesh definitions of the finite element input model will be presented. A BUMPER-II analysis process that was used to estimate statistical uncertainty is introduced.
Experimental investigation and statistical modeling of temperature rise in rotary ultrasonic bone drilling.

PubMed

Gupta, Vishal; Pandey, Pulak M

2016-11-01

Thermal necrosis is one of the major problems associated with the bone drilling process in orthopedic/trauma surgical operations. To overcome this problem a new bone drilling method has been introduced recently. Studies have been carried out with rotary ultrasonic drilling (RUD) on pig bones using diamond coated abrasive hollow tools. In the present work, influence of process parameters (rotational speed, feed rate, drill diameter and vibrational amplitude) on change in the temperature was studied using design of experiment technique i.e., response surface methodology (RSM) and data analysis was carried out using analysis of variance (ANOVA). Temperature was recorded and measured by using embedded thermocouple technique at a distance of 0.5mm, 1.0mm, 1.5mm and 2.0mm from the drill site. Statistical model was developed to predict the maximum temperature at the drill tool and bone interface. It was observed that temperature increased with increase in the rotational speed, feed rate and drill diameter and decreased with increase in the vibrational amplitude. Copyright © 2016 IPEM. Published by Elsevier Ltd. All rights reserved.
Genetic Simulation Resources: a website for the registration and discovery of genetic data simulators

PubMed Central

Peng, Bo; Chen, Huann-Sheng; Mechanic, Leah E.; Racine, Ben; Clarke, John; Clarke, Lauren; Gillanders, Elizabeth; Feuer, Eric J.

2013-01-01

Summary: Many simulation methods and programs have been developed to simulate genetic data of the human genome. These data have been widely used, for example, to predict properties of populations retrospectively or prospectively according to mathematically intractable genetic models, and to assist the validation, statistical inference and power analysis of a variety of statistical models. However, owing to the differences in type of genetic data of interest, simulation methods, evolutionary features, input and output formats, terminologies and assumptions for different applications, choosing the right tool for a particular study can be a resource-intensive process that usually involves searching, downloading and testing many different simulation programs. Genetic Simulation Resources (GSR) is a website provided by the National Cancer Institute (NCI) that aims to help researchers compare and choose the appropriate simulation tools for their studies. This website allows authors of simulation software to register their applications and describe them with well-defined attributes, thus allowing site users to search and compare simulators according to specified features. Availability: http://popmodels.cancercontrol.cancer.gov/gsr. Contact: gsr@mail.nih.gov PMID:23435068
SOCR: Statistics Online Computational Resource

PubMed Central

Dinov, Ivo D.

2011-01-01

The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for: interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build student’s intuition and enhance their learning. PMID:21451741
Statistical analysis of effective singular values in matrix rank determination

NASA Technical Reports Server (NTRS)

Konstantinides, Konstantinos; Yao, Kung

1988-01-01

A major problem in using SVD (singular-value decomposition) as a tool in determining the effective rank of a perturbed matrix is that of distinguishing between significantly small and significantly large singular values to the end, conference regions are derived for the perturbed singular values of matrices with noisy observation data. The analysis is based on the theories of perturbations of singular values and statistical significance test. Threshold bounds for perturbation due to finite-precision and i.i.d. random models are evaluated. In random models, the threshold bounds depend on the dimension of the matrix, the noisy variance, and predefined statistical level of significance. Results applied to the problem of determining the effective order of a linear autoregressive system from the approximate rank of a sample autocorrelation matrix are considered. Various numerical examples illustrating the usefulness of these bounds and comparisons to other previously known approaches are given.
Geoscience in the Big Data Era: Are models obsolete?

NASA Astrophysics Data System (ADS)

Yuen, D. A.; Zheng, L.; Stark, P. B.; Morra, G.; Knepley, M.; Wang, X.

2016-12-01

In last few decades, the velocity, volume, and variety of geophysical data have increased, while the development of the Internet and distributed computing has led to the emergence of "data science." Fitting and running numerical models, especially based on PDEs, is the main consumer of flops in geoscience. Can large amounts of diverse data supplant modeling? Without the ability to conduct randomized, controlled experiments, causal inference requires understanding the physics. It is sometimes possible to predict well without understanding the system—if (1) the system is predictable, (2) data on "important" variables are available, and (3) the system changes slowly enough. And sometimes even a crude model can help the data "speak for themselves" much more clearly. For example, Shearer (1991) used a 1-dimensional velocity model to stack long-period seismograms, revealing upper mantle discontinuities. This was a "big data" approach: the main use of computing was in the data processing, rather than in modeling, yet the "signal" became clear. In contrast, modelers tend to use all available computing power to fit even more complex models, resulting in a cycle where uncertainty quantification (UQ) is never possible: even if realistic UQ required only 1,000 model evaluations, it is never in reach. To make more reliable inferences requires better data analysis and statistics, not more complex models. Geoscientists need to learn new skills and tools: sound software engineering practices; open programming languages suitable for big data; parallel and distributed computing; data visualization; and basic nonparametric, computationally based statistical inference, such as permutation tests. They should work reproducibly, scripting all analyses and avoiding point-and-click tools.
iCFD: Interpreted Computational Fluid Dynamics - Degeneration of CFD to one-dimensional advection-dispersion models using statistical experimental design - The secondary clarifier.

PubMed

Guyonvarch, Estelle; Ramin, Elham; Kulahci, Murat; Plósz, Benedek Gy

2015-10-15

The present study aims at using statistically designed computational fluid dynamics (CFD) simulations as numerical experiments for the identification of one-dimensional (1-D) advection-dispersion models - computationally light tools, used e.g., as sub-models in systems analysis. The objective is to develop a new 1-D framework, referred to as interpreted CFD (iCFD) models, in which statistical meta-models are used to calculate the pseudo-dispersion coefficient (D) as a function of design and flow boundary conditions. The method - presented in a straightforward and transparent way - is illustrated using the example of a circular secondary settling tank (SST). First, the significant design and flow factors are screened out by applying the statistical method of two-level fractional factorial design of experiments. Second, based on the number of significant factors identified through the factor screening study and system understanding, 50 different sets of design and flow conditions are selected using Latin Hypercube Sampling (LHS). The boundary condition sets are imposed on a 2-D axi-symmetrical CFD simulation model of the SST. In the framework, to degenerate the 2-D model structure, CFD model outputs are approximated by the 1-D model through the calibration of three different model structures for D. Correlation equations for the D parameter then are identified as a function of the selected design and flow boundary conditions (meta-models), and their accuracy is evaluated against D values estimated in each numerical experiment. The evaluation and validation of the iCFD model structure is carried out using scenario simulation results obtained with parameters sampled from the corners of the LHS experimental region. For the studied SST, additional iCFD model development was carried out in terms of (i) assessing different density current sub-models; (ii) implementation of a combined flocculation, hindered, transient and compression settling velocity function; and (iii) assessment of modelling the onset of transient and compression settling. Furthermore, the optimal level of model discretization both in 2-D and 1-D was undertaken. Results suggest that the iCFD model developed for the SST through the proposed methodology is able to predict solid distribution with high accuracy - taking a reasonable computational effort - when compared to multi-dimensional numerical experiments, under a wide range of flow and design conditions. iCFD tools could play a crucial role in reliably predicting systems' performance under normal and shock events. Copyright © 2015 Elsevier Ltd. All rights reserved.
Modelling and interpreting spectral energy distributions of galaxies with BEAGLE

NASA Astrophysics Data System (ADS)

Chevallard, Jacopo; Charlot, Stéphane

2016-10-01

We present a new-generation tool to model and interpret spectral energy distributions (SEDs) of galaxies, which incorporates in a consistent way the production of radiation and its transfer through the interstellar and intergalactic media. This flexible tool, named BEAGLE (for BayEsian Analysis of GaLaxy sEds), allows one to build mock galaxy catalogues as well as to interpret any combination of photometric and spectroscopic galaxy observations in terms of physical parameters. The current version of the tool includes versatile modelling of the emission from stars and photoionized gas, attenuation by dust and accounting for different instrumental effects, such as spectroscopic flux calibration and line spread function. We show a first application of the BEAGLE tool to the interpretation of broad-band SEDs of a published sample of ˜ 10^4 galaxies at redshifts 0.1 ≲ z ≲ 8. We find that the constraints derived on photometric redshifts using this multipurpose tool are comparable to those obtained using public, dedicated photometric-redshift codes and quantify this result in a rigorous statistical way. We also show how the post-processing of BEAGLE output data with the PYTHON extension PYP-BEAGLE allows the characterization of systematic deviations between models and observations, in particular through posterior predictive checks. The modular design of the BEAGLE tool allows easy extensions to incorporate, for example, the absorption by neutral galactic and circumgalactic gas, and the emission from an active galactic nucleus, dust and shock-ionized gas. Information about public releases of the BEAGLE tool will be maintained on http://www.jacopochevallard.org/beagle.
New statistical potential for quality assessment of protein models and a survey of energy functions

PubMed Central

2010-01-01

Background Scoring functions, such as molecular mechanic forcefields and statistical potentials are fundamentally important tools in protein structure modeling and quality assessment. Results The performances of a number of publicly available scoring functions are compared with a statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility, on torsion angles, accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. Atom- and residue-level statistical potentials and Linux executables to calculate the energy of a given protein proposed in this work can be downloaded from http://www.fiserlab.org/potentials. Conclusions Among the most influential terms we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanical potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasible long energy relaxation before energy scores started to correlate with model quality. PMID:20226048
The road map towards providing a robust Raman spectroscopy-based cancer diagnostic platform and integration into clinic

NASA Astrophysics Data System (ADS)

Lau, Katherine; Isabelle, Martin; Lloyd, Gavin R.; Old, Oliver; Shepherd, Neil; Bell, Ian M.; Dorney, Jennifer; Lewis, Aaran; Gaifulina, Riana; Rodriguez-Justo, Manuel; Kendall, Catherine; Stone, Nicolas; Thomas, Geraint; Reece, David

2016-03-01

Despite the demonstrated potential as an accurate cancer diagnostic tool, Raman spectroscopy (RS) is yet to be adopted by the clinic for histopathology reviews. The Stratified Medicine through Advanced Raman Technologies (SMART) consortium has begun to address some of the hurdles in its adoption for cancer diagnosis. These hurdles include awareness and acceptance of the technology, practicality of integration into the histopathology workflow, data reproducibility and availability of transferrable models. We have formed a consortium, in joint efforts, to develop optimised protocols for tissue sample preparation, data collection and analysis. These protocols will be supported by provision of suitable hardware and software tools to allow statistically sound classification models to be built and transferred for use on different systems. In addition, we are building a validated gastrointestinal (GI) cancers model, which can be trialled as part of the histopathology workflow at hospitals, and a classification tool. At the end of the project, we aim to deliver a robust Raman based diagnostic platform to enable clinical researchers to stage cancer, define tumour margin, build cancer diagnostic models and discover novel disease bio markers.
CalFitter: a web server for analysis of protein thermal denaturation data.

PubMed

Mazurenko, Stanislav; Stourac, Jan; Kunka, Antonin; Nedeljkovic, Sava; Bednar, David; Prokop, Zbynek; Damborsky, Jiri

2018-05-14

Despite significant advances in the understanding of protein structure-function relationships, revealing protein folding pathways still poses a challenge due to a limited number of relevant experimental tools. Widely-used experimental techniques, such as calorimetry or spectroscopy, critically depend on a proper data analysis. Currently, there are only separate data analysis tools available for each type of experiment with a limited model selection. To address this problem, we have developed the CalFitter web server to be a unified platform for comprehensive data fitting and analysis of protein thermal denaturation data. The server allows simultaneous global data fitting using any combination of input data types and offers 12 protein unfolding pathway models for selection, including irreversible transitions often missing from other tools. The data fitting produces optimal parameter values, their confidence intervals, and statistical information to define unfolding pathways. The server provides an interactive and easy-to-use interface that allows users to directly analyse input datasets and simulate modelled output based on the model parameters. CalFitter web server is available free at https://loschmidt.chemi.muni.cz/calfitter/.
A κ-generalized statistical mechanics approach to income analysis

NASA Astrophysics Data System (ADS)

Clementi, F.; Gallegati, M.; Kaniadakis, G.

2009-02-01

This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.
Application of linear regression analysis in accuracy assessment of rolling force calculations

NASA Astrophysics Data System (ADS)

Poliak, E. I.; Shim, M. K.; Kim, G. S.; Choo, W. Y.

1998-10-01

Efficient operation of the computational models employed in process control systems require periodical assessment of the accuracy of their predictions. Linear regression is proposed as a tool which allows separate systematic and random prediction errors from those related to measurements. A quantitative characteristic of the model predictive ability is introduced in addition to standard statistical tests for model adequacy. Rolling force calculations are considered as an example for the application. However, the outlined approach can be used to assess the performance of any computational model.

Probing dark energy using convergence power spectrum and bi-spectrum

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dinda, Bikash R., E-mail: bikash@ctp-jamia.res.in

Weak lensing convergence statistics is a powerful tool to probe dark energy. Dark energy plays an important role to the structure formation and the effects can be detected through the convergence power spectrum, bi-spectrum etc. One of the most promising and simplest dark energy model is the ΛCDM . However, it is worth investigating different dark energy models with evolving equation of state of the dark energy. In this work, detectability of different dark energy models from ΛCDM model has been explored through convergence power spectrum and bi-spectrum.
Satellite Systems Design/Simulation Environment: A Systems Approach to Pre-Phase A Design

NASA Technical Reports Server (NTRS)

Ferebee, Melvin J., Jr.; Troutman, Patrick A.; Monell, Donald W.

1997-01-01

A toolset for the rapid development of small satellite systems has been created. The objective of this tool is to support the definition of spacecraft mission concepts to satisfy a given set of mission and instrument requirements. The objective of this report is to provide an introduction to understanding and using the SMALLSAT Model. SMALLSAT is a computer-aided Phase A design and technology evaluation tool for small satellites. SMALLSAT enables satellite designers, mission planners, and technology program managers to observe the likely consequences of their decisions in terms of satellite configuration, non-recurring and recurring cost, and mission life cycle costs and availability statistics. It was developed by Princeton Synergetic, Inc. and User Systems, Inc. as a revision of the previous TECHSAT Phase A design tool, which modeled medium-sized Earth observation satellites. Both TECHSAT and SMALLSAT were developed for NASA.
Statistical learning and selective inference.

PubMed

Taylor, Jonathan; Tibshirani, Robert J

2015-06-23

We describe the problem of "selective inference." This addresses the following challenge: Having mined a set of data to find potential associations, how do we properly assess the strength of these associations? The fact that we have "cherry-picked"--searched for the strongest associations--means that we must set a higher bar for declaring significant the associations that we see. This challenge becomes more important in the era of big data and complex statistical modeling. The cherry tree (dataset) can be very large and the tools for cherry picking (statistical learning methods) are now very sophisticated. We describe some recent new developments in selective inference and illustrate their use in forward stepwise regression, the lasso, and principal components analysis.
Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling

NASA Technical Reports Server (NTRS)

Mog, Robert A.

1997-01-01

Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and 91 improved statistical quantification of SAIL productivity interests.
Sensitivity analysis of navy aviation readiness based sparing model

DTIC Science & Technology

2017-09-01

variability. (See Figure 4.) Figure 4. Research design flowchart 18 Figure 4 lays out the four steps of the methodology , starting in the upper left-hand...as a function of changes in key inputs. We develop NAVARM Experimental Designs (NED), a computational tool created by applying a state-of-the-art...experimental design to the NAVARM model. Statistical analysis of the resulting data identifies the most influential cost factors. Those are, in order of
Soft Sensors: Chemoinformatic Model for Efficient Control and Operation in Chemical Plants.

PubMed

Funatsu, Kimito

2016-12-01

Soft sensor is statistical model as an essential tool for controlling pharmaceutical, chemical and industrial plants. I introduce soft sensor, the roles, the applications, the problems and the research examples such as adaptive soft sensor, database monitoring and efficient process control. The use of soft sensor enables chemical industrial plants to be operated more effectively and stably. © 2016 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Health Impact Assessment (HIA) Resource and Tool ...

EPA Pesticide Factsheets

Health Impact Assessment (HIA) is a relatively new and rapidly emerging field in the U.S. An inventory of available HIA resources and tools was conducted, with a primary focus on resources developed in the U.S. The resources and tools available to HIA practitioners in the conduct of their work were identified through multiple methods and compiled into a comprehensive list. The compilation includes tools and resources related to the HIA process itself and those that can be used to collect and analyze data, establish a baseline profile, assess potential health impacts, and establish benchmarks and indicators for monitoring and evaluation. These resources include literature and evidence bases, data and statistics, guidelines, benchmarks, decision and economic analysis tools, scientific models, methods, frameworks, indices, mapping, and various data collection tools. Understanding the data, tools, models, methods, and other resources available to perform HIAs will help to advance the HIA community of practice in the U.S., improve the quality and rigor of assessments upon which stakeholder and policy decisions are based, and potentially improve the overall effectiveness of HIA to promote healthy and sustainable communities. The Health Impact Assessment (HIA) Resource and Tool Compilation is a comprehensive list of resources and tools that can be utilized by HIA practitioners with all levels of HIA experience to guide them throughout the HIA process. The HIA Resource
Machinability of titanium metal matrix composites (Ti-MMCs)

NASA Astrophysics Data System (ADS)

Aramesh, Maryam

Titanium metal matrix composites (Ti-MMCs), as a new generation of materials, have various potential applications in aerospace and automotive industries. The presence of ceramic particles enhances the physical and mechanical properties of the alloy matrix. However, the hard and abrasive nature of these particles causes various issues in the field of their machinability. Severe tool wear and short tool life are the most important drawbacks of machining this class of materials. There is very limited work in the literature regarding the machinability of this class of materials especially in the area of tool life estimation and tool wear. By far, polycrystalline diamond (PCD) tools appear to be the best choice for machining MMCs from researchers' point of view. However, due to their high cost, economical alternatives are sought. Cubic boron nitride (CBN) inserts, as the second hardest available tools, show superior characteristics such as great wear resistance, high hardness at elevated temperatures, a low coefficient of friction and a high melting point. Yet, so far CBN tools have not been studied during machining of Ti-MMCs. In this study, a comprehensive study has been performed to explore the tool wear mechanisms of CBN inserts during turning of Ti-MMCs. The unique morphology of the worn faces of the tools was investigated for the first time, which led to new insights in the identification of chemical wear mechanisms during machining of Ti-MMCs. Utilizing the full tool life capacity of cutting tools is also very crucial, due to the considerable costs associated with suboptimal replacement of tools. This strongly motivates development of a reliable model for tool life estimation under any cutting conditions. In this study, a novel model based on the survival analysis methodology is developed to estimate the progressive states of tool wear under any cutting conditions during machining of Ti-MMCs. This statistical model takes into account the machining time in addition to the effect of cutting parameters. Thus, promising results were obtained which showed a very good agreement with the experimental results. Moreover, a more advanced model was constructed, by adding the tool wear as another variable to the previous model. Therefore, a new model was proposed for estimating the remaining life of worn inserts under different cutting conditions, using the current tool wear data as an input. The results of this model were validated with the experimental results. The estimated results were well consistent with the results obtained from the experiments.
Statistical Approaches to Interpretation of Local, Regional, and National Highway-Runoff and Urban-Stormwater Data

USGS Publications Warehouse

Tasker, Gary D.; Granato, Gregory E.

2000-01-01

Decision makers need viable methods for the interpretation of local, regional, and national-highway runoff and urban-stormwater data including flows, concentrations and loads of chemical constituents and sediment, potential effects on receiving waters, and the potential effectiveness of various best management practices (BMPs). Valid (useful for intended purposes), current, and technically defensible stormwater-runoff models are needed to interpret data collected in field studies, to support existing highway and urban-runoffplanning processes, to meet National Pollutant Discharge Elimination System (NPDES) requirements, and to provide methods for computation of Total Maximum Daily Loads (TMDLs) systematically and economically. Historically, conceptual, simulation, empirical, and statistical models of varying levels of detail, complexity, and uncertainty have been used to meet various data-quality objectives in the decision-making processes necessary for the planning, design, construction, and maintenance of highways and for other land-use applications. Water-quality simulation models attempt a detailed representation of the physical processes and mechanisms at a given site. Empirical and statistical regional water-quality assessment models provide a more general picture of water quality or changes in water quality over a region. All these modeling techniques share one common aspect-their predictive ability is poor without suitable site-specific data for calibration. To properly apply the correct model, one must understand the classification of variables, the unique characteristics of water-resources data, and the concept of population structure and analysis. Classifying variables being used to analyze data may determine which statistical methods are appropriate for data analysis. An understanding of the characteristics of water-resources data is necessary to evaluate the applicability of different statistical methods, to interpret the results of these techniques, and to use tools and techniques that account for the unique nature of water-resources data sets. Populations of data on stormwater-runoff quantity and quality are often best modeled as logarithmic transformations. Therefore, these factors need to be considered to form valid, current, and technically defensible stormwater-runoff models. Regression analysis is an accepted method for interpretation of water-resources data and for prediction of current or future conditions at sites that fit the input data model. Regression analysis is designed to provide an estimate of the average response of a system as it relates to variation in one or more known variables. To produce valid models, however, regression analysis should include visual analysis of scatterplots, an examination of the regression equation, evaluation of the method design assumptions, and regression diagnostics. A number of statistical techniques are described in the text and in the appendixes to provide information necessary to interpret data by use of appropriate methods. Uncertainty is an important part of any decisionmaking process. In order to deal with uncertainty problems, the analyst needs to know the severity of the statistical uncertainty of the methods used to predict water quality. Statistical models need to be based on information that is meaningful, representative, complete, precise, accurate, and comparable to be deemed valid, up to date, and technically supportable. To assess uncertainty in the analytical tools, the modeling methods, and the underlying data set, all of these components need be documented and communicated in an accessible format within project publications.
Assimilating the Future for Better Forecasts and Earlier Warnings

NASA Astrophysics Data System (ADS)

Du, H.; Wheatcroft, E.; Smith, L. A.

2016-12-01

Multi-model ensembles have become popular tools to account for some of the uncertainty due to model inadequacy in weather and climate simulation-based predictions. The current multi-model forecasts focus on combining single model ensemble forecasts by means of statistical post-processing. Assuming each model is developed independently or with different primary target variables, each is likely to contain different dynamical strengths and weaknesses. Using statistical post-processing, such information is only carried by the simulations under a single model ensemble: no advantage is taken to influence simulations under the other models. A novel methodology, named Multi-model Cross Pollination in Time, is proposed for multi-model ensemble scheme with the aim of integrating the dynamical information regarding the future from each individual model operationally. The proposed approach generates model states in time via applying data assimilation scheme(s) to yield truly "multi-model trajectories". It is demonstrated to outperform traditional statistical post-processing in the 40-dimensional Lorenz96 flow. Data assimilation approaches are originally designed to improve state estimation from the past to the current time. The aim of this talk is to introduce a framework that uses data assimilation to improve model forecasts at future time (not to argue for any one particular data assimilation scheme). Illustration of applying data assimilation "in the future" to provide early warning of future high-impact events is also presented.
miRNA Temporal Analyzer (mirnaTA): a bioinformatics tool for identifying differentially expressed microRNAs in temporal studies using normal quantile transformation.

PubMed

Cer, Regina Z; Herrera-Galeano, J Enrique; Anderson, Joseph J; Bishop-Lilly, Kimberly A; Mokashi, Vishwesh P

2014-01-01

Understanding the biological roles of microRNAs (miRNAs) is a an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads become available. While miRNA identification and discovery tools are significantly improved, the development of miRNA differential expression analysis tools, especially in temporal studies, remains substantially challenging. Further, the installation of currently available software is non-trivial and steps of testing with example datasets, trying with one's own dataset, and interpreting the results require notable expertise and time. Subsequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and provide intuitive results without having to invest significant efforts. We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited, matrix file containing miRNA name and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate data and remove technical variability, raw data is normalized using Normal Quantile Transformation (NQT), and linear regression model is used to locate any miRNAs which are differentially expressed in a linear pattern. Subsequently, remaining miRNAs which do not fit a linear model are further analyzed in two different non-linear methods 1) cumulative distribution function (CDF) or 2) analysis of variances (ANOVA). After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods. mirnaTA is an open-source, bioinformatics tool to aid scientists in identifying differentially expressed miRNAs which could be further mined for biological significance. It is expected to provide researchers with a means of interpreting raw data to statistical summaries in a fast and intuitive manner.
Basic principles of stability.

PubMed

Egan, William; Schofield, Timothy

2009-11-01

An understanding of the principles of degradation, as well as the statistical tools for measuring product stability, is essential to management of product quality. Key to this is management of vaccine potency. Vaccine shelf life is best managed through determination of a minimum potency release requirement, which helps assure adequate potency throughout expiry. Use of statistical tools such a least squares regression analysis should be employed to model potency decay. The use of such tools provides incentive to properly design vaccine stability studies, while holding stability measurements to specification presents a disincentive for collecting valuable data. The laws of kinetics such as Arrhenius behavior help practitioners design effective accelerated stability programs, which can be utilized to manage stability after a process change. Design of stability studies should be carefully considered, with an eye to minimizing the variability of the stability parameter. In the case of measuring the degradation rate, testing at the beginning and the end of the study improves the precision of this estimate. Additional design considerations such as bracketing and matrixing improve the efficiency of stability evaluation of vaccines.
On improving the communication between models and data.

PubMed

Dietze, Michael C; Lebauer, David S; Kooper, Rob

2013-09-01

The potential for model-data synthesis is growing in importance as we enter an era of 'big data', greater connectivity and faster computation. Realizing this potential requires that the research community broaden its perspective about how and why they interact with models. Models can be viewed as scaffolds that allow data at different scales to inform each other through our understanding of underlying processes. Perceptions of relevance, accessibility and informatics are presented as the primary barriers to broader adoption of models by the community, while an inability to fully utilize the breadth of expertise and data from the community is a primary barrier to model improvement. Overall, we promote a community-based paradigm to model-data synthesis and highlight some of the tools and techniques that facilitate this approach. Scientific workflows address critical informatics issues in transparency, repeatability and automation, while intuitive, flexible web-based interfaces make running and visualizing models more accessible. Bayesian statistics provides powerful tools for assimilating a diversity of data types and for the analysis of uncertainty. Uncertainty analyses enable new measurements to target those processes most limiting our predictive ability. Moving forward, tools for information management and data assimilation need to be improved and made more accessible. © 2013 John Wiley & Sons Ltd.
"Dear Fresher …"--How Online Questionnaires Can Improve Learning and Teaching Statistics

ERIC Educational Resources Information Center

Bebermeier, Sarah; Nussbeck, Fridtjof W.; Ontrup, Greta

2015-01-01

Lecturers teaching statistics are faced with several challenges supporting students' learning in appropriate ways. A variety of methods and tools exist to facilitate students' learning on statistics courses. The online questionnaires presented in this report are a new, slightly different computer-based tool: the central aim was to support students…
Cardiac arrest risk standardization using administrative data compared to registry data.

PubMed

Grossestreuer, Anne V; Gaieski, David F; Donnino, Michael W; Nelson, Joshua I M; Mutter, Eric L; Carr, Brendan G; Abella, Benjamin S; Wiebe, Douglas J

2017-01-01

Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Two risk standardization logistic regression models were developed using 2453 patients treated from 2000-2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the "gold standard" with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876-0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895-0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799-0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788-0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA.
Cardiac arrest risk standardization using administrative data compared to registry data

PubMed Central

Gaieski, David F.; Donnino, Michael W.; Nelson, Joshua I. M.; Mutter, Eric L.; Carr, Brendan G.; Abella, Benjamin S.; Wiebe, Douglas J.

2017-01-01

Background Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Methods and results Two risk standardization logistic regression models were developed using 2453 patients treated from 2000–2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the “gold standard” with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876–0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895–0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799–0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788–0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Conclusions Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA. PMID:28783754
An Object-Based Approach to Evaluation of Climate Variability Projections and Predictions

NASA Astrophysics Data System (ADS)

Ammann, C. M.; Brown, B.; Kalb, C. P.; Bullock, R.

2017-12-01

Evaluations of the performance of earth system model predictions and projections are of critical importance to enhance usefulness of these products. Such evaluations need to address specific concerns depending on the system and decisions of interest; hence, evaluation tools must be tailored to inform about specific issues. Traditional approaches that summarize grid-based comparisons of analyses and models, or between current and future climate, often do not reveal important information about the models' performance (e.g., spatial or temporal displacements; the reason behind a poor score) and are unable to accommodate these specific information needs. For example, summary statistics such as the correlation coefficient or the mean-squared error provide minimal information to developers, users, and decision makers regarding what is "right" and "wrong" with a model. New spatial and temporal-spatial object-based tools from the field of weather forecast verification (where comparisons typically focus on much finer temporal and spatial scales) have been adapted to more completely answer some of the important earth system model evaluation questions. In particular, the Method for Object-based Diagnostic Evaluation (MODE) tool and its temporal (three-dimensional) extension (MODE-TD) have been adapted for these evaluations. More specifically, these tools can be used to address spatial and temporal displacements in projections of El Nino-related precipitation and/or temperature anomalies, ITCZ-associated precipitation areas, atmospheric rivers, seasonal sea-ice extent, and other features of interest. Examples of several applications of these tools in a climate context will be presented, using output of the CESM large ensemble. In general, these tools provide diagnostic information about model performance - accounting for spatial, temporal, and intensity differences - that cannot be achieved using traditional (scalar) model comparison approaches. Thus, they can provide more meaningful information that can be used in decision-making and planning. Future extensions and applications of these tools in a climate context will be considered.
Range of interaction in an opinion evolution model of ideological self-positioning: Contagion, hesitance and polarization

NASA Astrophysics Data System (ADS)

Gimenez, M. Cecilia; Paz García, Ana Pamela; Burgos Paci, Maxi A.; Reinaudi, Luis

2016-04-01

The evolution of public opinion using tools and concepts borrowed from Statistical Physics is an emerging area within the field of Sociophysics. In the present paper, a Statistical Physics model was developed to study the evolution of the ideological self-positioning of an ensemble of agents. The model consists of an array of L components, each one of which represents the ideology of an agent. The proposed mechanism is based on the ;voter model;, in which one agent can adopt the opinion of another one if the difference of their opinions lies within a certain range. The existence of ;undecided; agents (i.e. agents with no definite opinion) was implemented in the model. The possibility of radicalization of an agent's opinion upon interaction with another one was also implemented. The results of our simulations are compared to statistical data taken from the Latinobarómetro databank for the cases of Argentina, Chile, Brazil and Uruguay in the last decade. Among other results, the effect of taking into account the undecided agents is the formation of a single peak at the middle of the ideological spectrum (which corresponds to a centrist ideological position), in agreement with the real cases studied.
DECIDE: a software for computer-assisted evaluation of diagnostic test performance.

PubMed

Chiecchio, A; Bo, A; Manzone, P; Giglioli, F

1993-05-01

The evaluation of the performance of clinical tests is a complex problem involving different steps and many statistical tools, not always structured in an organic and rational system. This paper presents a software which provides an organic system of statistical tools helping evaluation of clinical test performance. The program allows (a) the building and the organization of a working database, (b) the selection of the minimal set of tests with the maximum information content, (c) the search of the model best fitting the distribution of the test values, (d) the selection of optimal diagnostic cut-off value of the test for every positive/negative situation, (e) the evaluation of performance of the combinations of correlated and uncorrelated tests. The uncertainty associated with all the variables involved is evaluated. The program works in a MS-DOS environment with EGA or higher performing graphic card.
Recent advances in mathematical criminology. Comment on "Statistical physics of crime: A review" by M.R. D'Orsogna and M. Perc

NASA Astrophysics Data System (ADS)

Rodríguez, Nancy

2015-03-01

The use of mathematical tools has long proved to be useful in gaining understanding of complex systems in physics [1]. Recently, many researchers have realized that there is an analogy between emerging phenomena in complex social systems and complex physical or biological systems [4,5,12]. This realization has particularly benefited the modeling and understanding of crime, a ubiquitous phenomena that is far from being understood. In fact, when one is interested in the bulk behavior of patterns that emerge from small and seemingly unrelated interactions as well as decisions that occur at the individual level, the mathematical tools that have been developed in statistical physics, game theory, network theory, dynamical systems, and partial differential equations can be useful in shedding light into the dynamics of these patterns [2-4,6,12].

Artificial neural networks in evaluation and optimization of modified release solid dosage forms.

PubMed

Ibrić, Svetlana; Djuriš, Jelena; Parojčić, Jelena; Djurić, Zorica

2012-10-18

Implementation of the Quality by Design (QbD) approach in pharmaceutical development has compelled researchers in the pharmaceutical industry to employ Design of Experiments (DoE) as a statistical tool, in product development. Among all DoE techniques, response surface methodology (RSM) is the one most frequently used. Progress of computer science has had an impact on pharmaceutical development as well. Simultaneous with the implementation of statistical methods, machine learning tools took an important place in drug formulation. Twenty years ago, the first papers describing application of artificial neural networks in optimization of modified release products appeared. Since then, a lot of work has been done towards implementation of new techniques, especially Artificial Neural Networks (ANN) in modeling of production, drug release and drug stability of modified release solid dosage forms. The aim of this paper is to review artificial neural networks in evaluation and optimization of modified release solid dosage forms.
Artificial Neural Networks in Evaluation and Optimization of Modified Release Solid Dosage Forms

PubMed Central

Ibrić, Svetlana; Djuriš, Jelena; Parojčić, Jelena; Djurić, Zorica

2012-01-01

Implementation of the Quality by Design (QbD) approach in pharmaceutical development has compelled researchers in the pharmaceutical industry to employ Design of Experiments (DoE) as a statistical tool, in product development. Among all DoE techniques, response surface methodology (RSM) is the one most frequently used. Progress of computer science has had an impact on pharmaceutical development as well. Simultaneous with the implementation of statistical methods, machine learning tools took an important place in drug formulation. Twenty years ago, the first papers describing application of artificial neural networks in optimization of modified release products appeared. Since then, a lot of work has been done towards implementation of new techniques, especially Artificial Neural Networks (ANN) in modeling of production, drug release and drug stability of modified release solid dosage forms. The aim of this paper is to review artificial neural networks in evaluation and optimization of modified release solid dosage forms. PMID:24300369
Forecasting daily source air quality using multivariate statistical analysis and radial basis function networks.

PubMed

Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A

2008-12-01

It is vital to forecast gas and particle matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relative important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network would result in the reduction of the model structure complexity, minimize computation cost, and eliminate model overfitting problems. The obtained results of RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.
Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development.

PubMed

Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R

2016-12-01

Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Modeling the Effects of Land Use and Climate Change on Streamflow in the Delaware River Basin

NASA Astrophysics Data System (ADS)

Kwon, P. Y. S.; Endreny, T. A.; Kroll, C. N.; Williamson, T. N.

2014-12-01

Forest-cover loss and drinking-water reservoirs in the upper Delaware River Basin of New York may alter summer low streamflows, which could degrade the in-stream habitat for the endangered dwarf wedgemussel. Our project analyzes how flow statistics change with land-cover change for 30-year increments of model-simulated streamflow hydrographs for three watersheds of concern to the National Park Service: the East Branch, West Branch, and main stem of the Delaware River. We use four treatments for land cover ranging from historical high to low forest cover. We subject each land cover to adjusted GCM climate scenarios for 1600, 1900, 1940, and 2040 to isolate land cover from potential climate-change effects. Hydrographs are simulated using the Water Availability Tool for Environmental Resources (WATER), a TOPMODEL-based United States Geological Survey hydrologic decision-support tool, which uses the variable-source-area concept and water budgets to generate streamflow. Model parameters for each watershed change with land-use, and capture differences in soil-physical properties that control how rainfall infiltrates, evaporates, transpires, is stored in the soil, and moves to the stream. Our results analyze flow statistics used as indicators of hydrologic alteration, and access streamflow events below the critical flow needed to provide sustainable habitat for dwarf wedgemussels. These metrics will demonstrate how changes in climate and land use might affect flow statistics. Initial results show that the 1940 WATER simulation outputs generally match observed unregulated low flows from that time period, while performance for regulated flow from the same time period and from 1600, 1900, and 2040 require model input adjustments. Our study will illustrate how increased forest cover could potentially restore in-stream habitat for the endangered dwarf wedgemussel for current and future climate conditions.
Assessment of oximetry-based statistical classifiers as simplified screening tools in the management of childhood obstructive sleep apnea.

PubMed

Crespo, Andrea; Álvarez, Daniel; Kheirandish-Gozal, Leila; Gutiérrez-Tobal, Gonzalo C; Cerezo-Hernández, Ana; Gozal, David; Hornero, Roberto; Del Campo, Félix

2018-02-16

A variety of statistical models based on overnight oximetry has been proposed to simplify the detection of children with suspected obstructive sleep apnea syndrome (OSAS). Despite the usefulness reported, additional thorough comparative analyses are required. This study was aimed at assessing common binary classification models from oximetry for the detection of childhood OSAS. Overnight oximetry recordings from 176 children referred for clinical suspicion of OSAS were acquired during in-lab polysomnography. Several training and test datasets were randomly composed by means of bootstrapping for model optimization and independent validation. For every child, blood oxygen saturation (SpO 2 ) was parameterized by means of 17 features. Fast correlation-based filter (FCBF) was applied to search for the optimum features. The discriminatory power of three statistical pattern recognition algorithms was assessed: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and logistic regression (LR). The performance of each automated model was evaluated for the three common diagnostic polysomnographic cutoffs in pediatric OSAS: 1, 3, and 5 events/h. Best screening performances emerged using the 1 event/h cutoff for mild-to-severe childhood OSAS. LR achieved 84.3% accuracy (95% CI 76.8-91.5%) and 0.89 AUC (95% CI 0.83-0.94), while QDA reached 96.5% PPV (95% CI 90.3-100%) and 0.91 AUC (95% CI 0.85-0.96%). Moreover, LR and QDA reached diagnostic accuracies of 82.7% (95% CI 75.0-89.6%) and 82.1% (95% CI 73.8-89.5%) for a cutoff of 5 events/h, respectively. Automated analysis of overnight oximetry may be used to develop reliable as well as accurate screening tools for childhood OSAS.
Introduction, comparison, and validation of Meta‐Essentials: A free and simple tool for meta‐analysis

PubMed Central

van Rhee, Henk; Hak, Tony

2017-01-01

We present a new tool for meta‐analysis, Meta‐Essentials, which is free of charge and easy to use. In this paper, we introduce the tool and compare its features to other tools for meta‐analysis. We also provide detailed information on the validation of the tool. Although free of charge and simple, Meta‐Essentials automatically calculates effect sizes from a wide range of statistics and can be used for a wide range of meta‐analysis applications, including subgroup analysis, moderator analysis, and publication bias analyses. The confidence interval of the overall effect is automatically based on the Knapp‐Hartung adjustment of the DerSimonian‐Laird estimator. However, more advanced meta‐analysis methods such as meta‐analytical structural equation modelling and meta‐regression with multiple covariates are not available. In summary, Meta‐Essentials may prove a valuable resource for meta‐analysts, including researchers, teachers, and students. PMID:28801932
Potential sources of variability in mesocosm experiments on the response of phytoplankton to ocean acidification

NASA Astrophysics Data System (ADS)

Moreno de Castro, Maria; Schartau, Markus; Wirtz, Kai

2017-04-01

Mesocosm experiments on phytoplankton dynamics under high CO2 concentrations mimic the response of marine primary producers to future ocean acidification. However, potential acidification effects can be hindered by the high standard deviation typically found in the replicates of the same CO2 treatment level. In experiments with multiple unresolved factors and a sub-optimal number of replicates, post-processing statistical inference tools might fail to detect an effect that is present. We propose that in such cases, data-based model analyses might be suitable tools to unearth potential responses to the treatment and identify the uncertainties that could produce the observed variability. As test cases, we used data from two independent mesocosm experiments. Both experiments showed high standard deviations and, according to statistical inference tools, biomass appeared insensitive to changing CO2 conditions. Conversely, our simulations showed earlier and more intense phytoplankton blooms in modeled replicates at high CO2 concentrations and suggested that uncertainties in average cell size, phytoplankton biomass losses, and initial nutrient concentration potentially outweigh acidification effects by triggering strong variability during the bloom phase. We also estimated the thresholds below which uncertainties do not escalate to high variability. This information might help in designing future mesocosm experiments and interpreting controversial results on the effect of acidification or other pressures on ecosystem functions.
Current algebra, statistical mechanics and quantum models

NASA Astrophysics Data System (ADS)

Vilela Mendes, R.

2017-11-01

Results obtained in the past for free boson systems at zero and nonzero temperatures are revisited to clarify the physical meaning of current algebra reducible functionals which are associated to systems with density fluctuations, leading to observable effects on phase transitions. To use current algebra as a tool for the formulation of quantum statistical mechanics amounts to the construction of unitary representations of diffeomorphism groups. Two mathematical equivalent procedures exist for this purpose. One searches for quasi-invariant measures on configuration spaces, the other for a cyclic vector in Hilbert space. Here, one argues that the second approach is closer to the physical intuition when modelling complex systems. An example of application of the current algebra methodology to the pairing phenomenon in two-dimensional fermion systems is discussed.
Desensitized Optimal Filtering and Sensor Fusion Toolkit

NASA Technical Reports Server (NTRS)

Karlgaard, Christopher D.

2015-01-01

Analytical Mechanics Associates, Inc., has developed a software toolkit that filters and processes navigational data from multiple sensor sources. A key component of the toolkit is a trajectory optimization technique that reduces the sensitivity of Kalman filters with respect to model parameter uncertainties. The sensor fusion toolkit also integrates recent advances in adaptive Kalman and sigma-point filters for non-Gaussian problems with error statistics. This Phase II effort provides new filtering and sensor fusion techniques in a convenient package that can be used as a stand-alone application for ground support and/or onboard use. Its modular architecture enables ready integration with existing tools. A suite of sensor models and noise distribution as well as Monte Carlo analysis capability are included to enable statistical performance evaluations.
Diagnostic tools for mixing models of stream water chemistry

USGS Publications Warehouse

Hooper, Richard P.

2003-01-01

Mixing models provide a useful null hypothesis against which to evaluate processes controlling stream water chemical data. Because conservative mixing of end‐members with constant concentration is a linear process, a number of simple mathematical and multivariate statistical methods can be applied to this problem. Although mixing models have been most typically used in the context of mixing soil and groundwater end‐members, an extension of the mathematics of mixing models is presented that assesses the “fit” of a multivariate data set to a lower dimensional mixing subspace without the need for explicitly identified end‐members. Diagnostic tools are developed to determine the approximate rank of the data set and to assess lack of fit of the data. This permits identification of processes that violate the assumptions of the mixing model and can suggest the dominant processes controlling stream water chemical variation. These same diagnostic tools can be used to assess the fit of the chemistry of one site into the mixing subspace of a different site, thereby permitting an assessment of the consistency of controlling end‐members across sites. This technique is applied to a number of sites at the Panola Mountain Research Watershed located near Atlanta, Georgia.
Protein and gene model inference based on statistical modeling in k-partite graphs.

PubMed

Gerster, Sarah; Qeli, Ermir; Ahrens, Christian H; Bühlmann, Peter

2010-07-06

One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides are mapped back to the proteins they were derived from. This process is also known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions to address the problem of protein and gene model inference for shotgun proteomics data. In particular, we are dealing with dependencies among peptides and proteins using a Markovian assumption on k-partite graphs. We are also addressing the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that the results with MIPGEM are competitive with existing tools for protein inference.
Simplified estimation of age-specific reference intervals for skewed data.

PubMed

Wright, E M; Royston, P

1997-12-30

Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that they necessarily produce smooth centile curves, the entire density is estimated and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
Statistical and temporal irradiance fluctuations modeling for a ground-to-geostationary satellite optical link.

PubMed

Camboulives, A-R; Velluet, M-T; Poulenard, S; Saint-Antonin, L; Michau, V

2018-02-01

An optical communication link performance between the ground and a geostationary satellite can be impaired by scintillation, beam wandering, and beam spreading due to its propagation through atmospheric turbulence. These effects on the link performance can be mitigated by tracking and error correction codes coupled with interleaving. Precise numerical tools capable of describing the irradiance fluctuations statistically and of creating an irradiance time series are needed to characterize the benefits of these techniques and optimize them. The wave optics propagation methods have proven their capability of modeling the effects of atmospheric turbulence on a beam, but these are known to be computationally intensive. We present an analytical-numerical model which provides good results on the probability density functions of irradiance fluctuations as well as a time series with an important saving of time and computational resources.
HydroClimATe: hydrologic and climatic analysis toolkit

USGS Publications Warehouse

Dickinson, Jesse; Hanson, Randall T.; Predmore, Steven K.

2014-01-01

The potential consequences of climate variability and climate change have been identified as major issues for the sustainability and availability of the worldwide water resources. Unlike global climate change, climate variability represents deviations from the long-term state of the climate over periods of a few years to several decades. Currently, rich hydrologic time-series data are available, but the combination of data preparation and statistical methods developed by the U.S. Geological Survey as part of the Groundwater Resources Program is relatively unavailable to hydrologists and engineers who could benefit from estimates of climate variability and its effects on periodic recharge and water-resource availability. This report documents HydroClimATe, a computer program for assessing the relations between variable climatic and hydrologic time-series data. HydroClimATe was developed for a Windows operating system. The software includes statistical tools for (1) time-series preprocessing, (2) spectral analysis, (3) spatial and temporal analysis, (4) correlation analysis, and (5) projections. The time-series preprocessing tools include spline fitting, standardization using a normal or gamma distribution, and transformation by a cumulative departure. The spectral analysis tools include discrete Fourier transform, maximum entropy method, and singular spectrum analysis. The spatial and temporal analysis tool is empirical orthogonal function analysis. The correlation analysis tools are linear regression and lag correlation. The projection tools include autoregressive time-series modeling and generation of many realizations. These tools are demonstrated in four examples that use stream-flow discharge data, groundwater-level records, gridded time series of precipitation data, and the Multivariate ENSO Index.
The Population Tracking Model: A Simple, Scalable Statistical Model for Neural Population Data

PubMed Central

O'Donnell, Cian; alves, J. Tiago Gonç; Whiteley, Nick; Portera-Cailliau, Carlos; Sejnowski, Terrence J.

2017-01-01

Our understanding of neural population coding has been limited by a lack of analysis methods to characterize spiking data from large populations. The biggest challenge comes from the fact that the number of possible network activity patterns scales exponentially with the number of neurons recorded (∼2Neurons). Here we introduce a new statistical method for characterizing neural population activity that requires semi-independent fitting of only as many parameters as the square of the number of neurons, requiring drastically smaller data sets and minimal computation time. The model works by matching the population rate (the number of neurons synchronously active) and the probability that each individual neuron fires given the population rate. We found that this model can accurately fit synthetic data from up to 1000 neurons. We also found that the model could rapidly decode visual stimuli from neural population data from macaque primary visual cortex about 65 ms after stimulus onset. Finally, we used the model to estimate the entropy of neural population activity in developing mouse somatosensory cortex and, surprisingly, found that it first increases, and then decreases during development. This statistical model opens new options for interrogating neural population data and can bolster the use of modern large-scale in vivo Ca2+ and voltage imaging tools. PMID:27870612
Parallel computing for automated model calibration

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burke, John S.; Danielson, Gary R.; Schulz, Douglas A.

2002-07-29

Natural resources model calibration is a significant burden on computing and staff resources in modeling efforts. Most assessments must consider multiple calibration objectives (for example magnitude and timing of stream flow peak). An automated calibration process that allows real time updating of data/models, allowing scientists to focus effort on improving models is needed. We are in the process of building a fully featured multi objective calibration tool capable of processing multiple models cheaply and efficiently using null cycle computing. Our parallel processing and calibration software routines have been generically, but our focus has been on natural resources model calibration. Somore » far, the natural resources models have been friendly to parallel calibration efforts in that they require no inter-process communication, only need a small amount of input data and only output a small amount of statistical information for each calibration run. A typical auto calibration run might involve running a model 10,000 times with a variety of input parameters and summary statistical output. In the past model calibration has been done against individual models for each data set. The individual model runs are relatively fast, ranging from seconds to minutes. The process was run on a single computer using a simple iterative process. We have completed two Auto Calibration prototypes and are currently designing a more feature rich tool. Our prototypes have focused on running the calibration in a distributed computing cross platform environment. They allow incorporation of?smart? calibration parameter generation (using artificial intelligence processing techniques). Null cycle computing similar to SETI@Home has also been a focus of our efforts. This paper details the design of the latest prototype and discusses our plans for the next revision of the software.« less
The Lake Tahoe Basin Land Use Simulation Model

USGS Publications Warehouse

Forney, William M.; Oldham, I. Benson

2011-01-01

This U.S. Geological Survey Open-File Report describes the final modeling product for the Tahoe Decision Support System project for the Lake Tahoe Basin funded by the Southern Nevada Public Land Management Act and the U.S. Geological Survey's Geographic Analysis and Monitoring Program. This research was conducted by the U.S. Geological Survey Western Geographic Science Center. The purpose of this report is to describe the basic elements of the novel Lake Tahoe Basin Land Use Simulation Model, publish samples of the data inputs, basic outputs of the model, and the details of the Python code. The results of this report include a basic description of the Land Use Simulation Model, descriptions and summary statistics of model inputs, two figures showing the graphical user interface from the web-based tool, samples of the two input files, seven tables of basic output results from the web-based tool and descriptions of their parameters, and the fully functional Python code.
Application of GIS Rapid Mapping Technology in Disaster Monitoring

NASA Astrophysics Data System (ADS)

Wang, Z.; Tu, J.; Liu, G.; Zhao, Q.

2018-04-01

With the rapid development of GIS and RS technology, especially in recent years, GIS technology and its software functions have been increasingly mature and enhanced. And with the rapid development of mathematical statistical tools for spatial modeling and simulation, has promoted the widespread application and popularization of quantization in the field of geology. Based on the investigation of field disaster and the construction of spatial database, this paper uses remote sensing image, DEM and GIS technology to obtain the data information of disaster vulnerability analysis, and makes use of the information model to carry out disaster risk assessment mapping.Using ArcGIS software and its spatial data modeling method, the basic data information of the disaster risk mapping process was acquired and processed, and the spatial data simulation tool was used to map the disaster rapidly.
Joint Meteorological Statistics of Observing Sites for the Event Horizon Telescope

NASA Astrophysics Data System (ADS)

Lope Córdova Rosado, Rodrigo Eduardo; Doeleman, Sheperd; Paine, Scott; Johnson, Michael; Event Horizon Telescope (EHT)

2018-01-01

The Event Horizon Telescope (EHT) aims to resolve the general relativistic shadow of Sgr A*, the supermassive black hole at the center of our galaxy, via Very Long Baseline Interferometry (VLBI) measurements with a multinational array of radio observatories. In order to optimize the scheduling of future observations, we have developed tools to model the atmospheric opacity at each EHT site using the past 10 years of Global Forecast System (GFS) data describing the atmospheric state. These tools allow us to determine the ideal observing windows for EHT observations and to assess the suitability and impact of new EHT sites. We describe our modeling framework, compare our models to in-situ measurements at EHT sites, and discuss the implications of weather limitations for planned extensions of the EHT to higher frequencies, as well as additional sites and observation windows.

Ranking of Business Process Simulation Software Tools with DEX/QQ Hierarchical Decision Model

PubMed Central

2016-01-01

The omnipresent need for optimisation requires constant improvements of companies’ business processes (BPs). Minimising the risk of inappropriate BP being implemented is usually performed by simulating the newly developed BP under various initial conditions and “what-if” scenarios. An effectual business process simulations software (BPSS) is a prerequisite for accurate analysis of an BP. Characterisation of an BPSS tool is a challenging task due to the complex selection criteria that includes quality of visual aspects, simulation capabilities, statistical facilities, quality reporting etc. Under such circumstances, making an optimal decision is challenging. Therefore, various decision support models are employed aiding the BPSS tool selection. The currently established decision support models are either proprietary or comprise only a limited subset of criteria, which affects their accuracy. Addressing this issue, this paper proposes a new hierarchical decision support model for ranking of BPSS based on their technical characteristics by employing DEX and qualitative to quantitative (QQ) methodology. Consequently, the decision expert feeds the required information in a systematic and user friendly manner. There are three significant contributions of the proposed approach. Firstly, the proposed hierarchical model is easily extendible for adding new criteria in the hierarchical structure. Secondly, a fully operational decision support system (DSS) tool that implements the proposed hierarchical model is presented. Finally, the effectiveness of the proposed hierarchical model is assessed by comparing the resulting rankings of BPSS with respect to currently available results. PMID:26871694
Quantification of Operational Risk Using A Data Mining

NASA Technical Reports Server (NTRS)

Perera, J. Sebastian

1999-01-01

What is Data Mining? - Data Mining is the process of finding actionable information hidden in raw data. - Data Mining helps find hidden patterns, trends, and important relationships often buried in a sea of data - Typically, automated software tools based on advanced statistical analysis and data modeling technology can be utilized to automate the data mining process
Data mining: sophisticated forms of managed care modeling through artificial intelligence.

PubMed

Borok, L S

1997-01-01

Data mining is a recent development in computer science that combines artificial intelligence algorithms and relational databases to discover patterns automatically, without the use of traditional statistical methods. Work with data mining tools in health care is in a developmental stage that holds great promise, given the combination of demographic and diagnostic information.
Required, Practical, or Unnecessary? An Examination and Demonstration of Propensity Score Matching Using Longitudinal Secondary Data

ERIC Educational Resources Information Center

Padgett, Ryan D.; Salisbury, Mark H.; An, Brian P.; Pascarella, Ernest T.

2010-01-01

The sophisticated analytical techniques available to institutional researchers give them an array of procedures to estimate a causal effect using observational data. But as many quantitative researchers have discovered, access to a wider selection of statistical tools does not necessarily ensure construction of a better analytical model. Moreover,…
Survivability Versus Time

NASA Technical Reports Server (NTRS)

Joyner, James J., Sr.

2014-01-01

Develop Survivability vs Time Model as a decision-evaluation tool to assess various emergency egress methods used at Launch Complex 39B (LC 39B) and in the Vehicle Assembly Building (VAB) on NASAs Kennedy Space Center. For each hazard scenario, develop probability distributions to address statistical uncertainty resulting in survivability plots over time and composite survivability plots encompassing multiple hazard scenarios.
Effect Size Measures for Mediation Models: Quantitative Strategies for Communicating Indirect Effects

ERIC Educational Resources Information Center

Preacher, Kristopher J.; Kelley, Ken

2011-01-01

The statistical analysis of mediation effects has become an indispensable tool for helping scientists investigate processes thought to be causal. Yet, in spite of many recent advances in the estimation and testing of mediation effects, little attention has been given to methods for communicating effect size and the practical importance of those…
How to Use Value-Added Analysis to Improve Student Learning: A Field Guide for School and District Leaders

ERIC Educational Resources Information Center

Kennedy, Kate; Peters, Mary; Thomas, Mike

2012-01-01

Value-added analysis is the most robust, statistically significant method available for helping educators quantify student progress over time. This powerful tool also reveals tangible strategies for improving instruction. Built around the work of Battelle for Kids, this book provides a field-tested continuous improvement model for using…
A Content Analysis of Dissertations in the Field of Educational Technology: The Case of Turkey

ERIC Educational Resources Information Center

Durak, Gurhan; Cankaya, Serkan; Yunkul, Eyup; Misirli, Zeynel Abidin

2018-01-01

The present study aimed at conducting content analysis on dissertations carried out so far in the field of Educational Technology in Turkey. A total of 137 dissertations were examined to determine the key words, academic discipline, research areas, theoretical frameworks, research designs and models, statistical analyses, data collection tools,…
The Precision-Power-Gradient Theory for Teaching Basic Research Statistical Tools to Graduate Students.

ERIC Educational Resources Information Center

Cassel, Russell N.

This paper relates educational and psychological statistics to certain "Research Statistical Tools" (RSTs) necessary to accomplish and understand general research in the behavioral sciences. Emphasis is placed on acquiring an effective understanding of the RSTs and to this end they are are ordered to a continuum scale in terms of individual…
Syndromic surveillance of influenza activity in Sweden: an evaluation of three tools.

PubMed

Ma, T; Englund, H; Bjelkmar, P; Wallensten, A; Hulth, A

2015-08-01

An evaluation was conducted to determine which syndromic surveillance tools complement traditional surveillance by serving as earlier indicators of influenza activity in Sweden. Web queries, medical hotline statistics, and school absenteeism data were evaluated against two traditional surveillance tools. Cross-correlation calculations utilized aggregated weekly data for all-age, nationwide activity for four influenza seasons, from 2009/2010 to 2012/2013. The surveillance tool indicative of earlier influenza activity, by way of statistical and visual evidence, was identified. The web query algorithm and medical hotline statistics performed equally well as each other and to the traditional surveillance tools. School absenteeism data were not reliable resources for influenza surveillance. Overall, the syndromic surveillance tools did not perform with enough consistency in season lead nor in earlier timing of the peak week to be considered as early indicators. They do, however, capture incident cases before they have formally entered the primary healthcare system.
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.

Code of Federal Regulations, 2011 CFR

2011-10-01

... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.

Code of Federal Regulations, 2012 CFR

2012-10-01

... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.

Code of Federal Regulations, 2013 CFR

2013-10-01

... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
48 CFR 1852.223-76 - Federal Automotive Statistical Tool Reporting.

Code of Federal Regulations, 2014 CFR

2014-10-01

... data describing vehicle usage required by the Federal Automotive Statistical Tool (FAST) by October 15 of each year. FAST is accessed through http://fastweb.inel.gov/. (End of clause) [68 FR 43334, July...
Statistical inference for noisy nonlinear ecological dynamic systems.

PubMed

Wood, Simon N

2010-08-26

Chaotic ecological dynamic systems defy conventional statistical analysis. Systems with near-chaotic dynamics are little better. Such systems are almost invariably driven by endogenous dynamic processes plus demographic and environmental process noise, and are only observable with error. Their sensitivity to history means that minute changes in the driving noise realization, or the system parameters, will cause drastic changes in the system trajectory. This sensitivity is inherited and amplified by the joint probability density of the observable data and the process noise, rendering it useless as the basis for obtaining measures of statistical fit. Because the joint density is the basis for the fit measures used by all conventional statistical methods, this is a major theoretical shortcoming. The inability to make well-founded statistical inferences about biological dynamic models in the chaotic and near-chaotic regimes, other than on an ad hoc basis, leaves dynamic theory without the methods of quantitative validation that are essential tools in the rest of biological science. Here I show that this impasse can be resolved in a simple and general manner, using a method that requires only the ability to simulate the observed data on a system from the dynamic model about which inferences are required. The raw data series are reduced to phase-insensitive summary statistics, quantifying local dynamic structure and the distribution of observations. Simulation is used to obtain the mean and the covariance matrix of the statistics, given model parameters, allowing the construction of a 'synthetic likelihood' that assesses model fit. This likelihood can be explored using a straightforward Markov chain Monte Carlo sampler, but one further post-processing step returns pure likelihood-based inference. I apply the method to establish the dynamic nature of the fluctuations in Nicholson's classic blowfly experiments.
Statistical metrology—measurement and modeling of variation for advanced process development and design rule generation

NASA Astrophysics Data System (ADS)

Boning, Duane S.; Chung, James E.

1998-11-01

Advanced process technology will require more detailed understanding and tighter control of variation in devices and interconnects. The purpose of statistical metrology is to provide methods to measure and characterize variation, to model systematic and random components of that variation, and to understand the impact of variation on both yield and performance of advanced circuits. Of particular concern are spatial or pattern-dependencies within individual chips; such systematic variation within the chip can have a much larger impact on performance than wafer-level random variation. Statistical metrology methods will play an important role in the creation of design rules for advanced technologies. For example, a key issue in multilayer interconnect is the uniformity of interlevel dielectric (ILD) thickness within the chip. For the case of ILD thickness, we describe phases of statistical metrology development and application to understanding and modeling thickness variation arising from chemical-mechanical polishing (CMP). These phases include screening experiments including design of test structures and test masks to gather electrical or optical data, techniques for statistical decomposition and analysis of the data, and approaches to calibrating empirical and physical variation models. These models can be integrated with circuit CAD tools to evaluate different process integration or design rule strategies. One focus for the generation of interconnect design rules are guidelines for the use of "dummy fill" or "metal fill" to improve the uniformity of underlying metal density and thus improve the uniformity of oxide thickness within the die. Trade-offs that can be evaluated via statistical metrology include the improvements to uniformity possible versus the effect of increased capacitance due to additional metal.
3Drefine: an interactive web server for efficient protein structure refinement

PubMed Central

Bhattacharya, Debswapna; Nowotny, Jackson; Cao, Renzhi; Cheng, Jianlin

2016-01-01

3Drefine is an interactive web server for consistent and computationally efficient protein structure refinement with the capability to perform web-based statistical and visual analysis. The 3Drefine refinement protocol utilizes iterative optimization of hydrogen bonding network combined with atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force fields for efficient protein structure refinement. The method has been extensively evaluated on blind CASP experiments as well as on large-scale and diverse benchmark datasets and exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine web server allows for convenient protein structure refinement through a text or file input submission, email notification, provided example submission and is freely available without any registration requirement. The server also provides comprehensive analysis of submissions through various energy and statistical feedback and interactive visualization of multiple refined models through the JSmol applet that is equipped with numerous protein model analysis tools. The web server has been extensively tested and used by many users. As a result, the 3Drefine web server conveniently provides a useful tool easily accessible to the community. The 3Drefine web server has been made publicly available at the URL: http://sysbio.rnet.missouri.edu/3Drefine/. PMID:27131371
Illustrating the practice of statistics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hamada, Christina A; Hamada, Michael S

2009-01-01

The practice of statistics involves analyzing data and planning data collection schemes to answer scientific questions. Issues often arise with the data that must be dealt with and can lead to new procedures. In analyzing data, these issues can sometimes be addressed through the statistical models that are developed. Simulation can also be helpful in evaluating a new procedure. Moreover, simulation coupled with optimization can be used to plan a data collection scheme. The practice of statistics as just described is much more than just using a statistical package. In analyzing the data, it involves understanding the scientific problem andmore » incorporating the scientist's knowledge. In modeling the data, it involves understanding how the data were collected and accounting for limitations of the data where possible. Moreover, the modeling is likely to be iterative by considering a series of models and evaluating the fit of these models. Designing a data collection scheme involves understanding the scientist's goal and staying within hislher budget in terms of time and the available resources. Consequently, a practicing statistician is faced with such tasks and requires skills and tools to do them quickly. We have written this article for students to provide a glimpse of the practice of statistics. To illustrate the practice of statistics, we consider a problem motivated by some precipitation data that our relative, Masaru Hamada, collected some years ago. We describe his rain gauge observational study in Section 2. We describe modeling and an initial analysis of the precipitation data in Section 3. In Section 4, we consider alternative analyses that address potential issues with the precipitation data. In Section 5, we consider the impact of incorporating additional infonnation. We design a data collection scheme to illustrate the use of simulation and optimization in Section 6. We conclude this article in Section 7 with a discussion.« less
A study on the application of topic models to motif finding algorithms.

PubMed

Basha Gutierrez, Josep; Nakai, Kenta

2016-12-22

Topic models are statistical algorithms which try to discover the structure of a set of documents according to the abstract topics contained in them. Here we try to apply this approach to the discovery of the structure of the transcription factor binding sites (TFBS) contained in a set of biological sequences, which is a fundamental problem in molecular biology research for the understanding of transcriptional regulation. Here we present two methods that make use of topic models for motif finding. First, we developed an algorithm in which first a set of biological sequences are treated as text documents, and the k-mers contained in them as words, to then build a correlated topic model (CTM) and iteratively reduce its perplexity. We also used the perplexity measurement of CTMs to improve our previous algorithm based on a genetic algorithm and several statistical coefficients. The algorithms were tested with 56 data sets from four different species and compared to 14 other methods by the use of several coefficients both at nucleotide and site level. The results of our first approach showed a performance comparable to the other methods studied, especially at site level and in sensitivity scores, in which it scored better than any of the 14 existing tools. In the case of our previous algorithm, the new approach with the addition of the perplexity measurement clearly outperformed all of the other methods in sensitivity, both at nucleotide and site level, and in overall performance at site level. The statistics obtained show that the performance of a motif finding method based on the use of a CTM is satisfying enough to conclude that the application of topic models is a valid method for developing motif finding algorithms. Moreover, the addition of topic models to a previously developed method dramatically increased its performance, suggesting that this combined algorithm can be a useful tool to successfully predict motifs in different kinds of sets of DNA sequences.
Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

NASA Astrophysics Data System (ADS)

Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

Geographic information system (GIS) is a computer application system, which possesses the ability of manipulating spatial information and has been used in many fields related with the spatial information management. Many methods and models have been established for analyzing animal diseases distribution models and temporal-spatial transmission models. Great benefits have been gained from the application of GIS in animal disease epidemiology. GIS is now a very important tool in animal disease epidemiological research. Spatial analysis function of GIS can be widened and strengthened by using spatial statistical analysis, allowing for the deeper exploration, analysis, manipulation and interpretation of spatial pattern and spatial correlation of the animal disease. In this paper, we analyzed the cow anthrax spatial distribution characteristics in the target district A (due to the secret of epidemic data we call it district A) based on the established GIS of the cow anthrax in this district in combination of spatial statistical analysis and GIS. The Cow anthrax is biogeochemical disease, and its geographical distribution is related closely to the environmental factors of habitats and has some spatial characteristics, and therefore the correct analysis of the spatial distribution of anthrax cow for monitoring and the prevention and control of anthrax has a very important role. However, the application of classic statistical methods in some areas is very difficult because of the pastoral nomadic context. The high mobility of livestock and the lack of enough suitable sampling for the some of the difficulties in monitoring currently make it nearly impossible to apply rigorous random sampling methods. It is thus necessary to develop an alternative sampling method, which could overcome the lack of sampling and meet the requirements for randomness. The GIS computer application software ArcGIS9.1 was used to overcome the lack of data of sampling sites.Using ArcGIS 9.1 and GEODA to analyze the cow anthrax spatial distribution of district A. we gained some conclusions about cow anthrax' density: (1) there is a spatial clustering model. (2) there is an intensely spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristic of the density of cow anthrax. Comparing with the true distribution, the prediction model has a well coincidence and is feasible to the application. The method using a GIS tool facilitates can be implemented significantly in the cow anthrax monitoring and investigation, and the space statistics - related prediction model provides a fundamental use for other study on space-related animal diseases.

A Framework for Assessing High School Students' Statistical Reasoning.

PubMed

Chan, Shiau Wei; Ismail, Zaleha; Sumintono, Bambang

2016-01-01

Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students' statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students' statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework's cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments.
A Framework for Assessing High School Students' Statistical Reasoning

PubMed Central

2016-01-01

Based on a synthesis of literature, earlier studies, analyses and observations on high school students, this study developed an initial framework for assessing students’ statistical reasoning about descriptive statistics. Framework descriptors were established across five levels of statistical reasoning and four key constructs. The former consisted of idiosyncratic reasoning, verbal reasoning, transitional reasoning, procedural reasoning, and integrated process reasoning. The latter include describing data, organizing and reducing data, representing data, and analyzing and interpreting data. In contrast to earlier studies, this initial framework formulated a complete and coherent statistical reasoning framework. A statistical reasoning assessment tool was then constructed from this initial framework. The tool was administered to 10 tenth-grade students in a task-based interview. The initial framework was refined, and the statistical reasoning assessment tool was revised. The ten students then participated in the second task-based interview, and the data obtained were used to validate the framework. The findings showed that the students’ statistical reasoning levels were consistent across the four constructs, and this result confirmed the framework’s cohesion. Developed to contribute to statistics education, this newly developed statistical reasoning framework provides a guide for planning learning goals and designing instruction and assessments. PMID:27812091
Regional Earthquake Likelihood Models: A realm on shaky grounds?

NASA Astrophysics Data System (ADS)

Kossobokov, V.

2005-12-01

Seismology is juvenile and its appropriate statistical tools to-date may have a "medievil flavor" for those who hurry up to apply a fuzzy language of a highly developed probability theory. To become "quantitatively probabilistic" earthquake forecasts/predictions must be defined with a scientific accuracy. Following the most popular objectivists' viewpoint on probability, we cannot claim "probabilities" adequate without a long series of "yes/no" forecast/prediction outcomes. Without "antiquated binary language" of "yes/no" certainty we cannot judge an outcome ("success/failure"), and, therefore, quantify objectively a forecast/prediction method performance. Likelihood scoring is one of the delicate tools of Statistics, which could be worthless or even misleading when inappropriate probability models are used. This is a basic loophole for a misuse of likelihood as well as other statistical methods on practice. The flaw could be avoided by an accurate verification of generic probability models on the empirical data. It is not an easy task in the frames of the Regional Earthquake Likelihood Models (RELM) methodology, which neither defines the forecast precision nor allows a means to judge the ultimate success or failure in specific cases. Hopefully, the RELM group realizes the problem and its members do their best to close the hole with an adequate, data supported choice. Regretfully, this is not the case with the erroneous choice of Gerstenberger et al., who started the public web site with forecasts of expected ground shaking for `tomorrow' (Nature 435, 19 May 2005). Gerstenberger et al. have inverted the critical evidence of their study, i.e., the 15 years of recent seismic record accumulated just in one figure, which suggests rejecting with confidence above 97% "the generic California clustering model" used in automatic calculations. As a result, since the date of publication in Nature the United States Geological Survey website delivers to the public, emergency planners and the media, a forecast product, which is based on wrong assumptions that violate the best-documented earthquake statistics in California, which accuracy was not investigated, and which forecasts were not tested in a rigorous way.
Statistics, Adjusted Statistics, and Maladjusted Statistics.

PubMed

Kaufman, Jay S

2017-05-01

Statistical adjustment is a ubiquitous practice in all quantitative fields that is meant to correct for improprieties or limitations in observed data, to remove the influence of nuisance variables or to turn observed correlations into causal inferences. These adjustments proceed by reporting not what was observed in the real world, but instead modeling what would have been observed in an imaginary world in which specific nuisances and improprieties are absent. These techniques are powerful and useful inferential tools, but their application can be hazardous or deleterious if consumers of the adjusted results mistake the imaginary world of models for the real world of data. Adjustments require decisions about which factors are of primary interest and which are imagined away, and yet many adjusted results are presented without any explanation or justification for these decisions. Adjustments can be harmful if poorly motivated, and are frequently misinterpreted in the media's reporting of scientific studies. Adjustment procedures have become so routinized that many scientists and readers lose the habit of relating the reported findings back to the real world in which we live.
Empirical Reference Distributions for Networks of Different Size

PubMed Central

Smith, Anna; Calder, Catherine A.; Browning, Christopher R.

2016-01-01

Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity

PubMed Central

Agami, Sarit

2017-01-01

Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301
Development of a New Data Tool for Computing Launch and Landing Availability with Respect to Surface Weather

NASA Technical Reports Server (NTRS)

Burns, K. Lee; Altino, Karen

2008-01-01

The Marshall Space Flight Center Natural Environments Branch has a long history of expertise in the modeling and computation of statistical launch availabilities with respect to weather conditions. Their existing data analysis product, the Atmospheric Parametric Risk Assessment (APRA) tool, computes launch availability given an input set of vehicle hardware and/or operational weather constraints by calculating the climatological probability of exceeding the specified constraint limits, APRA has been used extensively to provide the Space Shuttle program the ability to estimate impacts that various proposed design modifications would have to overall launch availability. The model accounts for both seasonal and diurnal variability at a single geographic location and provides output probabilities for a single arbitrary launch attempt. Recently, the Shuttle program has shown interest in having additional capabilities added to the APRA model, including analysis of humidity parameters, inclusion of landing site weather to produce landing availability, and concurrent analysis of multiple sites, to assist in operational landing site selection. In addition, the Constellation program has also expressed interest in the APRA tool, and has requested several additional capabilities to address some Constellation-specific issues, both in the specification and verification of design requirements and in the development of operations concepts. The combined scope of the requested capability enhancements suggests an evolution of the model beyond a simple revision process. Development has begun for a new data analysis tool that will satisfy the requests of both programs. This new tool, Probabilities of Atmospheric Conditions and Environmental Risk (PACER), will provide greater flexibility and significantly enhanced functionality compared to the currently existing tool.
The Global Modeling and Assimilation Office (GMAO) 4d-Var and its Adjoint-based Tools

NASA Technical Reports Server (NTRS)

Todling, Ricardo; Tremolet, Yannick

2008-01-01

The fifth generation of the Goddard Earth Observing System (GEOS-5) Data Assimilation System (DAS) is a 3d-var system that uses the Grid-point Statistical Interpolation (GSI) system developed in collaboration with NCEP, and a general circulation model developed at Goddard, that includes the finite-volume hydrodynamics of GEOS-4 wrapped in the Earth System Modeling Framework and physical packages tuned to provide a reliable hydrological cycle for the integration of the Modern Era Retrospective-analysis for Research and Applications (MERRA). This MERRA system is essentially complete and the next generation GEOS is under intense development. A prototype next generation system is now complete and has been producing preliminary results. This prototype system replaces the GSI-based Incremental Analysis Update procedure with a GSI-based 4d-var which uses the adjoint of the finite-volume hydrodynamics of GEOS-4 together with a vertical diffusing scheme for simplified physics. As part of this development we have kept the GEOS-5 IAU procedure as an option and have added the capability to experiment with a First Guess at the Appropriate Time (FGAT) procedure, thus allowing for at least three modes of running the data assimilation experiments. The prototype system is a large extension of GEOS-5 as it also includes various adjoint-based tools, namely, a forecast sensitivity tool, a singular vector tool, and an observation impact tool, that combines the model sensitivity tool with a GSI-based adjoint tool. These features bring the global data assimilation effort at Goddard up to date with technologies used in data assimilation systems at major meteorological centers elsewhere. Various aspects of the next generation GEOS will be discussed during the presentation at the Workshop, and preliminary results will illustrate the discussion.
Statistical models for causation: what inferential leverage do they provide?

PubMed

Freedman, David A

2006-12-01

Experiments offer more reliable evidence on causation than observational studies, which is not to gainsay the contribution to knowledge from observation. Experiments should be analyzed as experiments, not as observational studies. A simple comparison of rates might be just the right tool, with little value added by "sophisticated" models. This article discusses current models for causation, as applied to experimental and observational data. The intention-to-treat principle and the effect of treatment on the treated will also be discussed. Flaws in per-protocol and treatment-received estimates will be demonstrated.
NIRS-SPM: statistical parametric mapping for near infrared spectroscopy

NASA Astrophysics Data System (ADS)

Tak, Sungho; Jang, Kwang Eun; Jung, Jinwook; Jang, Jaeduck; Jeong, Yong; Ye, Jong Chul

2008-02-01

Even though there exists a powerful statistical parametric mapping (SPM) tool for fMRI, similar public domain tools are not available for near infrared spectroscopy (NIRS). In this paper, we describe a new public domain statistical toolbox called NIRS-SPM for quantitative analysis of NIRS signals. Specifically, NIRS-SPM statistically analyzes the NIRS data using GLM and makes inference as the excursion probability which comes from the random field that are interpolated from the sparse measurement. In order to obtain correct inference, NIRS-SPM offers the pre-coloring and pre-whitening method for temporal correlation estimation. For simultaneous recording NIRS signal with fMRI, the spatial mapping between fMRI image and real coordinate in 3-D digitizer is estimated using Horn's algorithm. These powerful tools allows us the super-resolution localization of the brain activation which is not possible using the conventional NIRS analysis tools.
Role of Statistical Random-Effects Linear Models in Personalized Medicine

PubMed Central

Diaz, Francisco J; Yeh, Hung-Wen; de Leon, Jose

2012-01-01

Some empirical studies and recent developments in pharmacokinetic theory suggest that statistical random-effects linear models are valuable tools that allow describing simultaneously patient populations as a whole and patients as individuals. This remarkable characteristic indicates that these models may be useful in the development of personalized medicine, which aims at finding treatment regimes that are appropriate for particular patients, not just appropriate for the average patient. In fact, published developments show that random-effects linear models may provide a solid theoretical framework for drug dosage individualization in chronic diseases. In particular, individualized dosages computed with these models by means of an empirical Bayesian approach may produce better results than dosages computed with some methods routinely used in therapeutic drug monitoring. This is further supported by published empirical and theoretical findings that show that random effects linear models may provide accurate representations of phase III and IV steady-state pharmacokinetic data, and may be useful for dosage computations. These models have applications in the design of clinical algorithms for drug dosage individualization in chronic diseases; in the computation of dose correction factors; computation of the minimum number of blood samples from a patient that are necessary for calculating an optimal individualized drug dosage in therapeutic drug monitoring; measure of the clinical importance of clinical, demographic, environmental or genetic covariates; study of drug-drug interactions in clinical settings; the implementation of computational tools for web-site-based evidence farming; design of pharmacogenomic studies; and in the development of a pharmacological theory of dosage individualization. PMID:23467392
Origin of Pareto-like spatial distributions in ecosystems.

PubMed

Manor, Alon; Shnerb, Nadav M

2008-12-31

Recent studies of cluster distribution in various ecosystems revealed Pareto statistics for the size of spatial colonies. These results were supported by cellular automata simulations that yield robust criticality for endogenous pattern formation based on positive feedback. We show that this patch statistics is a manifestation of the law of proportionate effect. Mapping the stochastic model to a Markov birth-death process, the transition rates are shown to scale linearly with cluster size. This mapping provides a connection between patch statistics and the dynamics of the ecosystem; the "first passage time" for different colonies emerges as a powerful tool that discriminates between endogenous and exogenous clustering mechanisms. Imminent catastrophic shifts (such as desertification) manifest themselves in a drastic change of the stability properties of spatial colonies.
High-Throughput Assay Optimization and Statistical Interpolation of Rubella-Specific Neutralizing Antibody Titers

PubMed Central

Lambert, Nathaniel D.; Pankratz, V. Shane; Larrabee, Beth R.; Ogee-Nwankwo, Adaeze; Chen, Min-hsin; Icenogle, Joseph P.

2014-01-01

Rubella remains a social and economic burden due to the high incidence of congenital rubella syndrome (CRS) in some countries. For this reason, an accurate and efficient high-throughput measure of antibody response to vaccination is an important tool. In order to measure rubella-specific neutralizing antibodies in a large cohort of vaccinated individuals, a high-throughput immunocolorimetric system was developed. Statistical interpolation models were applied to the resulting titers to refine quantitative estimates of neutralizing antibody titers relative to the assayed neutralizing antibody dilutions. This assay, including the statistical methods developed, can be used to assess the neutralizing humoral immune response to rubella virus and may be adaptable for assessing the response to other viral vaccines and infectious agents. PMID:24391140
Statistical Tolerance and Clearance Analysis for Assembly

NASA Technical Reports Server (NTRS)

Lee, S.; Yi, C.

1996-01-01

Tolerance is inevitable because manufacturing exactly equal parts is known to be impossible. Furthermore, the specification of tolerances is an integral part of product design since tolerances directly affect the assemblability, functionality, manufacturability, and cost effectiveness of a product. In this paper, we present statistical tolerance and clearance analysis for the assembly. Our proposed work is expected to make the following contributions: (i) to help the designers to evaluate products for assemblability, (ii) to provide a new perspective to tolerance problems, and (iii) to provide a tolerance analysis tool which can be incorporated into a CAD or solid modeling system.
Characterizing and Addressing the Need for Statistical Adjustment of Global Climate Model Data

NASA Astrophysics Data System (ADS)

White, K. D.; Baker, B.; Mueller, C.; Villarini, G.; Foley, P.; Friedman, D.

2017-12-01

As part of its mission to research and measure the effects of the changing climate, the U. S. Army Corps of Engineers (USACE) regularly uses the World Climate Research Programme's Coupled Model Intercomparison Project Phase 5 (CMIP5) multi-model dataset. However, these data are generated at a global level and are not fine-tuned for specific watersheds. This often causes CMIP5 output to vary from locally observed patterns in the climate. Several downscaling methods have been developed to increase the resolution of the CMIP5 data and decrease systemic differences to support decision-makers as they evaluate results at the watershed scale. Evaluating preliminary comparisons of observed and projected flow frequency curves over the US revealed a simple framework for water resources decision makers to plan and design water resources management measures under changing conditions using standard tools. Using this framework as a basis, USACE has begun to explore to use of statistical adjustment to alter global climate model data to better match the locally observed patterns while preserving the general structure and behavior of the model data. When paired with careful measurement and hypothesis testing, statistical adjustment can be particularly effective at navigating the compromise between the locally observed patterns and the global climate model structures for decision makers.
The Relationship between Particulate Pollution Levels in Australian Cities, Meteorology, and Landscape Fire Activity Detected from MODIS Hotspots

PubMed Central

Price, Owen F.; Williamson, Grant J.; Henderson, Sarah B.; Johnston, Fay; Bowman, David M. J. S.

2012-01-01

Smoke from bushfires is an emerging issue for fire managers because of increasing evidence for its public health effects. Development of forecasting models to predict future pollution levels based on the relationship between bushfire activity and current pollution levels would be a useful management tool. As a first step, we use daily thermal anomalies detected by the MODIS Active Fire Product (referred to as “hotspots”), pollution concentrations, and meteorological data for the years 2002 to 2008, to examine the statistical relationship between fire activity in the landscapes and pollution levels around Perth and Sydney, two large Australian cities. Resultant models were statistically significant, but differed in their goodness of fit and the distance at which the strength of the relationship was strongest. For Sydney, a univariate model for hotspot activity within 100 km explained 24% of variation in pollution levels, and the best model including atmospheric variables explained 56% of variation. For Perth, the best radius was 400 km, explaining only 7% of variation, while the model including atmospheric variables explained 31% of the variation. Pollution was higher when the atmosphere was more stable and in the presence of on-shore winds, whereas there was no effect of wind blowing from the fires toward the pollution monitors. Our analysis shows there is a good prospect for developing region-specific forecasting tools combining hotspot fire activity with meteorological data. PMID:23071788
Utility of existing diabetes risk prediction tools for young black and white adults: Evidence from the Bogalusa Heart Study.

PubMed

Pollock, Benjamin D; Hu, Tian; Chen, Wei; Harville, Emily W; Li, Shengxu; Webber, Larry S; Fonseca, Vivian; Bazzano, Lydia A

2017-01-01

To evaluate several adult diabetes risk calculation tools for predicting the development of incident diabetes and pre-diabetes in a bi-racial, young adult population. Surveys beginning in young adulthood (baseline age ≥18) and continuing across multiple decades for 2122 participants of the Bogalusa Heart Study were used to test the associations of five well-known adult diabetes risk scores with incident diabetes and pre-diabetes using separate Cox models for each risk score. Racial differences were tested within each model. Predictive utility and discrimination were determined for each risk score using the Net Reclassification Index (NRI) and Harrell's c-statistic. All risk scores were strongly associated (p<.0001) with incident diabetes and pre-diabetes. The Wilson model indicated greater risk of diabetes for blacks versus whites with equivalent risk scores (HR=1.59; 95% CI 1.11-2.28; p=.01). C-statistics for the diabetes risk models ranged from 0.79 to 0.83. Non-event NRIs indicated high specificity (non-event NRIs: 76%-88%), but poor sensitivity (event NRIs: -23% to -3%). Five diabetes risk scores established in middle-aged, racially homogenous adult populations are generally applicable to younger adults with good specificity but poor sensitivity. The addition of race to these models did not result in greater predictive capabilities. A more sensitive risk score to predict diabetes in younger adults is needed. Copyright © 2017 Elsevier Inc. All rights reserved.
Algorithms for the Computation of Debris Risk

NASA Technical Reports Server (NTRS)

Matney, Mark J.

2017-01-01

Determining the risks from space debris involve a number of statistical calculations. These calculations inevitably involve assumptions about geometry - including the physical geometry of orbits and the geometry of satellites. A number of tools have been developed in NASA’s Orbital Debris Program Office to handle these calculations; many of which have never been published before. These include algorithms that are used in NASA’s Orbital Debris Engineering Model ORDEM 3.0, as well as other tools useful for computing orbital collision rates and ground casualty risks. This paper presents an introduction to these algorithms and the assumptions upon which they are based.
Algorithms for the Computation of Debris Risks

NASA Technical Reports Server (NTRS)

Matney, Mark

2017-01-01

Determining the risks from space debris involve a number of statistical calculations. These calculations inevitably involve assumptions about geometry - including the physical geometry of orbits and the geometry of non-spherical satellites. A number of tools have been developed in NASA's Orbital Debris Program Office to handle these calculations; many of which have never been published before. These include algorithms that are used in NASA's Orbital Debris Engineering Model ORDEM 3.0, as well as other tools useful for computing orbital collision rates and ground casualty risks. This paper will present an introduction to these algorithms and the assumptions upon which they are based.
Recent Progress in the Development of Metabolome Databases for Plant Systems Biology

PubMed Central

Fukushima, Atsushi; Kusano, Miyako

2013-01-01

Metabolomics has grown greatly as a functional genomics tool, and has become an invaluable diagnostic tool for biochemical phenotyping of biological systems. Over the past decades, a number of databases involving information related to mass spectra, compound names and structures, statistical/mathematical models and metabolic pathways, and metabolite profile data have been developed. Such databases complement each other and support efficient growth in this area, although the data resources remain scattered across the World Wide Web. Here, we review available metabolome databases and summarize the present status of development of related tools, particularly focusing on the plant metabolome. Data sharing discussed here will pave way for the robust interpretation of metabolomic data and advances in plant systems biology. PMID:23577015

Tracing the source of numerical climate model uncertainties in precipitation simulations using a feature-oriented statistical model

NASA Astrophysics Data System (ADS)

Xu, Y.; Jones, A. D.; Rhoades, A.

2017-12-01

Precipitation is a key component in hydrologic cycles, and changing precipitation regimes contribute to more intense and frequent drought and flood events around the world. Numerical climate modeling is a powerful tool to study climatology and to predict future changes. Despite the continuous improvement in numerical models, long-term precipitation prediction remains a challenge especially at regional scales. To improve numerical simulations of precipitation, it is important to find out where the uncertainty in precipitation simulations comes from. There are two types of uncertainty in numerical model predictions. One is related to uncertainty in the input data, such as model's boundary and initial conditions. These uncertainties would propagate to the final model outcomes even if the numerical model has exactly replicated the true world. But a numerical model cannot exactly replicate the true world. Therefore, the other type of model uncertainty is related the errors in the model physics, such as the parameterization of sub-grid scale processes, i.e., given precise input conditions, how much error could be generated by the in-precise model. Here, we build two statistical models based on a neural network algorithm to predict long-term variation of precipitation over California: one uses "true world" information derived from observations, and the other uses "modeled world" information using model inputs and outputs from the North America Coordinated Regional Downscaling Project (NA CORDEX). We derive multiple climate feature metrics as the predictors for the statistical model to represent the impact of global climate on local hydrology, and include topography as a predictor to represent the local control. We first compare the predictors between the true world and the modeled world to determine the errors contained in the input data. By perturbing the predictors in the statistical model, we estimate how much uncertainty in the model's final outcomes is accounted for by each predictor. By comparing the statistical model derived from true world information and modeled world information, we assess the errors lying in the physics of the numerical models. This work provides a unique insight to assess the performance of numerical climate models, and can be used to guide improvement of precipitation prediction.
Sequence History Update Tool

NASA Technical Reports Server (NTRS)

Khanampompan, Teerapat; Gladden, Roy; Fisher, Forest; DelGuercio, Chris

2008-01-01

The Sequence History Update Tool performs Web-based sequence statistics archiving for Mars Reconnaissance Orbiter (MRO). Using a single UNIX command, the software takes advantage of sequencing conventions to automatically extract the needed statistics from multiple files. This information is then used to populate a PHP database, which is then seamlessly formatted into a dynamic Web page. This tool replaces a previous tedious and error-prone process of manually editing HTML code to construct a Web-based table. Because the tool manages all of the statistics gathering and file delivery to and from multiple data sources spread across multiple servers, there is also a considerable time and effort savings. With the use of The Sequence History Update Tool what previously took minutes is now done in less than 30 seconds, and now provides a more accurate archival record of the sequence commanding for MRO.
Creating, generating and comparing random network models with NetworkRandomizer.

PubMed

Tosadori, Gabriele; Bestvina, Ivan; Spoto, Fausto; Laudanna, Carlo; Scardoni, Giovanni

2016-01-01

Biological networks are becoming a fundamental tool for the investigation of high-throughput data in several fields of biology and biotechnology. With the increasing amount of information, network-based models are gaining more and more interest and new techniques are required in order to mine the information and to validate the results. To fill the validation gap we present an app, for the Cytoscape platform, which aims at creating randomised networks and randomising existing, real networks. Since there is a lack of tools that allow performing such operations, our app aims at enabling researchers to exploit different, well known random network models that could be used as a benchmark for validating real, biological datasets. We also propose a novel methodology for creating random weighted networks, i.e. the multiplication algorithm, starting from real, quantitative data. Finally, the app provides a statistical tool that compares real versus randomly computed attributes, in order to validate the numerical findings. In summary, our app aims at creating a standardised methodology for the validation of the results in the context of the Cytoscape platform.
Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

PubMed

Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

2017-02-01

We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.
Multivariate model of female black bear habitat use for a Geographic Information System

USGS Publications Warehouse

Clark, Joseph D.; Dunn, James E.; Smith, Kimberly G.

1993-01-01

Simple univariate statistical techniques may not adequately assess the multidimensional nature of habitats used by wildlife. Thus, we developed a multivariate method to model habitat-use potential using a set of female black bear (Ursus americanus) radio locations and habitat data consisting of forest cover type, elevation, slope, aspect, distance to roads, distance to streams, and forest cover type diversity score in the Ozark Mountains of Arkansas. The model is based on the Mahalanobis distance statistic coupled with Geographic Information System (GIS) technology. That statistic is a measure of dissimilarity and represents a standardized squared distance between a set of sample variates and an ideal based on the mean of variates associated with animal observations. Calculations were made with the GIS to produce a map containing Mahalanobis distance values within each cell on a 60- × 60-m grid. The model identified areas of high habitat use potential that could not otherwise be identified by independent perusal of any single map layer. This technique avoids many pitfalls that commonly affect typical multivariate analyses of habitat use and is a useful tool for habitat manipulation or mitigation to favor terrestrial vertebrates that use habitats on a landscape scale.
A meta-analysis and statistical modelling of nitrates in groundwater at the African scale

NASA Astrophysics Data System (ADS)

Ouedraogo, Issoufou; Vanclooster, Marnik

2016-06-01

Contamination of groundwater with nitrate poses a major health risk to millions of people around Africa. Assessing the space-time distribution of this contamination, as well as understanding the factors that explain this contamination, is important for managing sustainable drinking water at the regional scale. This study aims to assess the variables that contribute to nitrate pollution in groundwater at the African scale by statistical modelling. We compiled a literature database of nitrate concentration in groundwater (around 250 studies) and combined it with digital maps of physical attributes such as soil, geology, climate, hydrogeology, and anthropogenic data for statistical model development. The maximum, medium, and minimum observed nitrate concentrations were analysed. In total, 13 explanatory variables were screened to explain observed nitrate pollution in groundwater. For the mean nitrate concentration, four variables are retained in the statistical explanatory model: (1) depth to groundwater (shallow groundwater, typically < 50 m); (2) recharge rate; (3) aquifer type; and (4) population density. The first three variables represent intrinsic vulnerability of groundwater systems to pollution, while the latter variable is a proxy for anthropogenic pollution pressure. The model explains 65 % of the variation of mean nitrate contamination in groundwater at the African scale. Using the same proxy information, we could develop a statistical model for the maximum nitrate concentrations that explains 42 % of the nitrate variation. For the maximum concentrations, other environmental attributes such as soil type, slope, rainfall, climate class, and region type improve the prediction of maximum nitrate concentrations at the African scale. As to minimal nitrate concentrations, in the absence of normal distribution assumptions of the data set, we do not develop a statistical model for these data. The data-based statistical model presented here represents an important step towards developing tools that will allow us to accurately predict nitrate distribution at the African scale and thus may support groundwater monitoring and water management that aims to protect groundwater systems. Yet they should be further refined and validated when more detailed and harmonized data become available and/or combined with more conceptual descriptions of the fate of nutrients in the hydrosystem.
Improved Analysis of Earth System Models and Observations using Simple Climate Models

NASA Astrophysics Data System (ADS)

Nadiga, B. T.; Urban, N. M.

2016-12-01

Earth system models (ESM) are the most comprehensive tools we have to study climate change and develop climate projections. However, the computational infrastructure required and the cost incurred in running such ESMs precludes direct use of such models in conjunction with a wide variety of tools that can further our understanding of climate. Here we are referring to tools that range from dynamical systems tools that give insight into underlying flow structure and topology to tools that come from various applied mathematical and statistical techniques and are central to quantifying stability, sensitivity, uncertainty and predictability to machine learning tools that are now being rapidly developed or improved. Our approach to facilitate the use of such models is to analyze output of ESM experiments (cf. CMIP) using a range of simpler models that consider integral balances of important quantities such as mass and/or energy in a Bayesian framework.We highlight the use of this approach in the context of the uptake of heat by the world oceans in the ongoing global warming. Indeed, since in excess of 90% of the anomalous radiative forcing due greenhouse gas emissions is sequestered in the world oceans, the nature of ocean heat uptake crucially determines the surface warming that is realized (cf. climate sensitivity). Nevertheless, ESMs themselves are never run long enough to directly assess climate sensitivity. So, we consider a range of models based on integral balances--balances that have to be realized in all first-principles based models of the climate system including the most detailed state-of-the art climate simulations. The models range from simple models of energy balance to those that consider dynamically important ocean processes such as the conveyor-belt circulation (Meridional Overturning Circulation, MOC), North Atlantic Deep Water (NADW) formation, Antarctic Circumpolar Current (ACC) and eddy mixing. Results from Bayesian analysis of such models using both ESM experiments and actual observations are presented. One such result points to the importance of direct sequestration of heat below 700 m, a process that is not allowed for in the simple models that have been traditionally used to deduce climate sensitivity.
Sampling and sensitivity analyses tools (SaSAT) for computational modelling

PubMed Central

Hoare, Alexander; Regan, David G; Wilson, David P

2008-01-01

SaSAT (Sampling and Sensitivity Analysis Tools) is a user-friendly software package for applying uncertainty and sensitivity analyses to mathematical and computational models of arbitrary complexity and context. The toolbox is built in Matlab®, a numerical mathematical software package, and utilises algorithms contained in the Matlab® Statistics Toolbox. However, Matlab® is not required to use SaSAT as the software package is provided as an executable file with all the necessary supplementary files. The SaSAT package is also designed to work seamlessly with Microsoft Excel but no functionality is forfeited if that software is not available. A comprehensive suite of tools is provided to enable the following tasks to be easily performed: efficient and equitable sampling of parameter space by various methodologies; calculation of correlation coefficients; regression analysis; factor prioritisation; and graphical output of results, including response surfaces, tornado plots, and scatterplots. Use of SaSAT is exemplified by application to a simple epidemic model. To our knowledge, a number of the methods available in SaSAT for performing sensitivity analyses have not previously been used in epidemiological modelling and their usefulness in this context is demonstrated. PMID:18304361
A Prototype of Pilot Knowledge Evaluation by an Intelligent CAI (Computer -Aided Instruction) System Using a Bayesian Diagnostic Model.

DTIC Science & Technology

1987-06-01

to a field of research called Computer-Aided Instruction (CAI). CAI is a powerful methodology for enhancing the overall quaiity and effectiveness of...provides a very powerful tool for statistical inference, especially when pooling informations from different source is appropriate. Thus. prior...04 , 2 ’ .. ."k, + ++ ,,;-+-,..,,..v ->’,0,,.’ I The power of the model lies in its ability to adapt a diagnostic session to the level of knowledge
Characterization of Cloud Water-Content Distribution

NASA Technical Reports Server (NTRS)

Lee, Seungwon

2010-01-01

The development of realistic cloud parameterizations for climate models requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, a software tool was developed to characterize cloud water-content distributions in climate-model sub-grid scales. This software characterizes distributions of cloud water content with respect to cloud phase, cloud type, precipitation occurrence, and geo-location using CloudSat radar measurements. It uses a statistical method called maximum likelihood estimation to estimate the probability density function of the cloud water content.
Generalized Linear Models of Home Activity for Automatic Detection of Mild Cognitive Impairment in Older Adults*

PubMed Central

Akl, Ahmad; Snoek, Jasper; Mihailidis, Alex

2015-01-01

With a globally aging population, the burden of care of cognitively impaired older adults is becoming increasingly concerning. Instances of Alzheimer’s disease and other forms of dementia are becoming ever more frequent. Earlier detection of cognitive impairment offers significant benefits, but remains difficult to do in practice. In this paper, we develop statistical models of the behavior of older adults within their homes using sensor data in order to detect the early onset of cognitive decline. Specifically, we use inhomogenous Poisson processes to model the presence of subjects within different rooms throughout the day in the home using unobtrusive sensing technologies. We compare the distributions learned from cognitively intact and impaired subjects using information theoretic tools and observe statistical differences between the two populations which we believe can be used to help detect the onset of cognitive decline. PMID:25570050
Generalized Linear Models of home activity for automatic detection of mild cognitive impairment in older adults.

PubMed

Akl, Ahmad; Snoek, Jasper; Mihailidis, Alex

2014-01-01

With a globally aging population, the burden of care of cognitively impaired older adults is becoming increasingly concerning. Instances of Alzheimer's disease and other forms of dementia are becoming ever more frequent. Earlier detection of cognitive impairment offers significant benefits, but remains difficult to do in practice. In this paper, we develop statistical models of the behavior of older adults within their homes using sensor data in order to detect the early onset of cognitive decline. Specifically, we use inhomogenous Poisson processes to model the presence of subjects within different rooms throughout the day in the home using unobtrusive sensing technologies. We compare the distributions learned from cognitively intact and impaired subjects using information theoretic tools and observe statistical differences between the two populations which we believe can be used to help detect the onset of cognitive decline.
Introducing SONS, a tool for operational taxonomic unit-based comparisons of microbial community memberships and structures.

PubMed

Schloss, Patrick D; Handelsman, Jo

2006-10-01

The recent advent of tools enabling statistical inferences to be drawn from comparisons of microbial communities has enabled the focus of microbial ecology to move from characterizing biodiversity to describing the distribution of that biodiversity. Although statistical tools have been developed to compare community structures across a phylogenetic tree, we lack tools to compare the memberships and structures of two communities at a particular operational taxonomic unit (OTU) definition. Furthermore, current tests of community structure do not indicate the similarity of the communities but only report the probability of a statistical hypothesis. Here we present a computer program, SONS, which implements nonparametric estimators for the fraction and richness of OTUs shared between two communities.
Software Used to Generate Cancer Statistics - SEER Cancer Statistics

Cancer.gov

Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.
Polypropylene Production Optimization in Fluidized Bed Catalytic Reactor (FBCR): Statistical Modeling and Pilot Scale Experimental Validation

PubMed Central

Khan, Mohammad Jakir Hossain; Hussain, Mohd Azlan; Mujtaba, Iqbal Mohammed

2014-01-01

Propylene is one type of plastic that is widely used in our everyday life. This study focuses on the identification and justification of the optimum process parameters for polypropylene production in a novel pilot plant based fluidized bed reactor. This first-of-its-kind statistical modeling with experimental validation for the process parameters of polypropylene production was conducted by applying ANNOVA (Analysis of variance) method to Response Surface Methodology (RSM). Three important process variables i.e., reaction temperature, system pressure and hydrogen percentage were considered as the important input factors for the polypropylene production in the analysis performed. In order to examine the effect of process parameters and their interactions, the ANOVA method was utilized among a range of other statistical diagnostic tools such as the correlation between actual and predicted values, the residuals and predicted response, outlier t plot, 3D response surface and contour analysis plots. The statistical analysis showed that the proposed quadratic model had a good fit with the experimental results. At optimum conditions with temperature of 75°C, system pressure of 25 bar and hydrogen percentage of 2%, the highest polypropylene production obtained is 5.82% per pass. Hence it is concluded that the developed experimental design and proposed model can be successfully employed with over a 95% confidence level for optimum polypropylene production in a fluidized bed catalytic reactor (FBCR). PMID:28788576
Prediction of N-nitrosodimethylamine (NDMA) formation as a disinfection by-product.

PubMed

Kim, Jongo; Clevenger, Thomas E

2007-06-25

This study investigated the possibility of a statistical model application for the prediction of N-nitrosodimethylamine (NDMA) formation. The NDMA formation was studied as a function of monochloramine concentration (0.001-5mM) at fixed dimethylamine (DMA) concentrations of 0.01mM or 0.05mM. Excellent linear correlations were observed between the molar ratio of monochloramine to DMA and the NDMA formation on a log scale at pH 7 and 8. When a developed prediction equation was applied to a previously reported study, a good result was obtained. The statistical model appears to predict adequately NDMA concentrations if other NDMA precursors are excluded. Using the predictive tool, a simple and approximate calculation of NDMA formation can be obtained in drinking water systems.
Synchrony and motor mimicking in chimpanzee observational learning

PubMed Central

Fuhrmann, Delia; Ravignani, Andrea; Marshall-Pescini, Sarah; Whiten, Andrew

2014-01-01

Cumulative tool-based culture underwrote our species' evolutionary success, and tool-based nut-cracking is one of the strongest candidates for cultural transmission in our closest relatives, chimpanzees. However the social learning processes that may explain both the similarities and differences between the species remain unclear. A previous study of nut-cracking by initially naïve chimpanzees suggested that a learning chimpanzee holding no hammer nevertheless replicated hammering actions it witnessed. This observation has potentially important implications for the nature of the social learning processes and underlying motor coding involved. In the present study, model and observer actions were quantified frame-by-frame and analysed with stringent statistical methods, demonstrating synchrony between the observer's and model's movements, cross-correlation of these movements above chance level and a unidirectional transmission process from model to observer. These results provide the first quantitative evidence for motor mimicking underlain by motor coding in apes, with implications for mirror neuron function. PMID:24923651
Rasch analysis of Stamps's Index of Work Satisfaction in nursing population.

PubMed

Ahmad, Nora; Oranye, Nelson Ositadimma; Danilov, Alyona

2017-01-01

One of the most commonly used tools for measuring job satisfaction in nursing is the Stamps Index of Work Satisfaction. Several studies have reported on the reliability of the Stamps' tool based on traditional statistical model. The aim of this study was to apply the Rasch model to examine the adequacy of Stamps's Index of Work Satisfaction for measuring nurses' job satisfaction cross-culturally and to determine the validity and reliability of the instrument using the Rasch criteria. A secondary data analysis was conducted on a sample of 556 registered nurses from two countries. The RUMM 2030 software was used to analyse the psychometric properties of the Index of Work Satisfaction. The persons mean location of -0.018 approximated the items mean of 0.00, suggesting a good alignment of the measure and the traits being measured. However, at the items level, some items were misfiting to the Rasch model.
Empirical flow parameters : a tool for hydraulic model validity

USGS Publications Warehouse

Asquith, William H.; Burley, Thomas E.; Cleveland, Theodore G.

2013-01-01

The objectives of this project were (1) To determine and present from existing data in Texas, relations between observed stream flow, topographic slope, mean section velocity, and other hydraulic factors, to produce charts such as Figure 1 and to produce empirical distributions of the various flow parameters to provide a methodology to "check if model results are way off!"; (2) To produce a statistical regional tool to estimate mean velocity or other selected parameters for storm flows or other conditional discharges at ungauged locations (most bridge crossings) in Texas to provide a secondary way to compare such values to a conventional hydraulic modeling approach. (3.) To present ancillary values such as Froude number, stream power, Rosgen channel classification, sinuosity, and other selected characteristics (readily determinable from existing data) to provide additional information to engineers concerned with the hydraulic-soil-foundation component of transportation infrastructure.
Synchrony and motor mimicking in chimpanzee observational learning.

PubMed

Fuhrmann, Delia; Ravignani, Andrea; Marshall-Pescini, Sarah; Whiten, Andrew

2014-06-13

Cumulative tool-based culture underwrote our species' evolutionary success, and tool-based nut-cracking is one of the strongest candidates for cultural transmission in our closest relatives, chimpanzees. However the social learning processes that may explain both the similarities and differences between the species remain unclear. A previous study of nut-cracking by initially naïve chimpanzees suggested that a learning chimpanzee holding no hammer nevertheless replicated hammering actions it witnessed. This observation has potentially important implications for the nature of the social learning processes and underlying motor coding involved. In the present study, model and observer actions were quantified frame-by-frame and analysed with stringent statistical methods, demonstrating synchrony between the observer's and model's movements, cross-correlation of these movements above chance level and a unidirectional transmission process from model to observer. These results provide the first quantitative evidence for motor mimicking underlain by motor coding in apes, with implications for mirror neuron function.

A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

NASA Astrophysics Data System (ADS)

Guillen, George; Rainey, Gail; Morin, Michelle

2004-04-01

Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complimentary methods that strengthen the overall assessment of spill risks.
Modelling of electronic excitation and radiation in the Direct Simulation Monte Carlo Macroscopic Chemistry Method

NASA Astrophysics Data System (ADS)

Goldsworthy, M. J.

2012-10-01

One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.
Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population

PubMed Central

2013-01-01

Background The present study aimed to develop an artificial neural network (ANN) based prediction model for cardiovascular autonomic (CA) dysfunction in the general population. Methods We analyzed a previous dataset based on a population sample consisted of 2,092 individuals aged 30–80 years. The prediction models were derived from an exploratory set using ANN analysis. Performances of these prediction models were evaluated in the validation set. Results Univariate analysis indicated that 14 risk factors showed statistically significant association with CA dysfunction (P < 0.05). The mean area under the receiver-operating curve was 0.762 (95% CI 0.732–0.793) for prediction model developed using ANN analysis. The mean sensitivity, specificity, positive and negative predictive values were similar in the prediction models was 0.751, 0.665, 0.330 and 0.924, respectively. All HL statistics were less than 15.0. Conclusion ANN is an effective tool for developing prediction models with high value for predicting CA dysfunction among the general population. PMID:23902963
Statistical approaches to account for missing values in accelerometer data: Applications to modeling physical activity.

PubMed

Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki

2018-04-01

Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created psuedo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple easy-to-implement statistical tools be used to improve analysis of accelerometer data.
A Flexible Approach for the Statistical Visualization of Ensemble Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Potter, K.; Wilson, A.; Bremer, P.

2009-09-29

Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methodsmore » that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.« less
Quantifying variation in speciation and extinction rates with clade data.

PubMed

Paradis, Emmanuel; Tedesco, Pablo A; Hugueny, Bernard

2013-12-01

High-level phylogenies are very common in evolutionary analyses, although they are often treated as incomplete data. Here, we provide statistical tools to analyze what we name "clade data," which are the ages of clades together with their numbers of species. We develop a general approach for the statistical modeling of variation in speciation and extinction rates, including temporal variation, unknown variation, and linear and nonlinear modeling. We show how this approach can be generalized to a wide range of situations, including testing the effects of life-history traits and environmental variables on diversification rates. We report the results of an extensive simulation study to assess the performance of some statistical tests presented here as well as of the estimators of speciation and extinction rates. These latter results suggest the possibility to estimate correctly extinction rate in the absence of fossils. An example with data on fish is presented. © 2013 The Author(s). Evolution © 2013 The Society for the Study of Evolution.
Fully Bayesian inference for structural MRI: application to segmentation and statistical analysis of T2-hypointensities.

PubMed

Schmidt, Paul; Schmid, Volker J; Gaser, Christian; Buck, Dorothea; Bührlen, Susanne; Förschler, Annette; Mühlau, Mark

2013-01-01

Aiming at iron-related T2-hypointensity, which is related to normal aging and neurodegenerative processes, we here present two practicable approaches, based on Bayesian inference, for preprocessing and statistical analysis of a complex set of structural MRI data. In particular, Markov Chain Monte Carlo methods were used to simulate posterior distributions. First, we rendered a segmentation algorithm that uses outlier detection based on model checking techniques within a Bayesian mixture model. Second, we rendered an analytical tool comprising a Bayesian regression model with smoothness priors (in the form of Gaussian Markov random fields) mitigating the necessity to smooth data prior to statistical analysis. For validation, we used simulated data and MRI data of 27 healthy controls (age: [Formula: see text]; range, [Formula: see text]). We first observed robust segmentation of both simulated T2-hypointensities and gray-matter regions known to be T2-hypointense. Second, simulated data and images of segmented T2-hypointensity were analyzed. We found not only robust identification of simulated effects but also a biologically plausible age-related increase of T2-hypointensity primarily within the dentate nucleus but also within the globus pallidus, substantia nigra, and red nucleus. Our results indicate that fully Bayesian inference can successfully be applied for preprocessing and statistical analysis of structural MRI data.
Use of statistical and pharmacokinetic-pharmacodynamic modeling and simulation to improve decision-making: A section summary report of the trends and innovations in clinical trial statistics conference.

PubMed

Kimko, Holly; Berry, Seth; O'Kelly, Michael; Mehrotra, Nitin; Hutmacher, Matthew; Sethuraman, Venkat

2017-01-01

The application of modeling and simulation (M&S) methods to improve decision-making was discussed during the Trends & Innovations in Clinical Trial Statistics Conference held in Durham, North Carolina, USA on May 1-4, 2016. Uses of both pharmacometric and statistical M&S were presented during the conference, highlighting the diversity of the methods employed by pharmacometricians and statisticians to address a broad range of quantitative issues in drug development. Five presentations are summarized herein, which cover the development strategy of employing M&S to drive decision-making; European initiatives on best practice in M&S; case studies of pharmacokinetic/pharmacodynamics modeling in regulatory decisions; estimation of exposure-response relationships in the presence of confounding; and the utility of estimating the probability of a correct decision for dose selection when prior information is limited. While M&S has been widely used during the last few decades, it is expected to play an essential role as more quantitative assessments are employed in the decision-making process. By integrating M&S as a tool to compile the totality of evidence collected throughout the drug development program, more informed decisions will be made.
Statistical tools for transgene copy number estimation based on real-time PCR.

PubMed

Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

2007-11-01

As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite the recent progresses in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of trangene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
Kinetic Analysis of Dynamic Positron Emission Tomography Data using Open-Source Image Processing and Statistical Inference Tools.

PubMed

Hawe, David; Hernández Fernández, Francisco R; O'Suilleabháin, Liam; Huang, Jian; Wolsztynski, Eric; O'Sullivan, Finbarr

2012-05-01

In dynamic mode, positron emission tomography (PET) can be used to track the evolution of injected radio-labelled molecules in living tissue. This is a powerful diagnostic imaging technique that provides a unique opportunity to probe the status of healthy and pathological tissue by examining how it processes substrates. The spatial aspect of PET is well established in the computational statistics literature. This article focuses on its temporal aspect. The interpretation of PET time-course data is complicated because the measured signal is a combination of vascular delivery and tissue retention effects. If the arterial time-course is known, the tissue time-course can typically be expressed in terms of a linear convolution between the arterial time-course and the tissue residue. In statistical terms, the residue function is essentially a survival function - a familiar life-time data construct. Kinetic analysis of PET data is concerned with estimation of the residue and associated functionals such as flow, flux, volume of distribution and transit time summaries. This review emphasises a nonparametric approach to the estimation of the residue based on a piecewise linear form. Rapid implementation of this by quadratic programming is described. The approach provides a reference for statistical assessment of widely used one- and two-compartmental model forms. We illustrate the method with data from two of the most well-established PET radiotracers, (15)O-H(2)O and (18)F-fluorodeoxyglucose, used for assessment of blood perfusion and glucose metabolism respectively. The presentation illustrates the use of two open-source tools, AMIDE and R, for PET scan manipulation and model inference.
Development and Evaluation of the American College of Surgeons NSQIP Pediatric Surgical Risk Calculator.

PubMed

Kraemer, Kari; Cohen, Mark E; Liu, Yaoming; Barnhart, Douglas C; Rangel, Shawn J; Saito, Jacqueline M; Bilimoria, Karl Y; Ko, Clifford Y; Hall, Bruce L

2016-11-01

There is an increased desire among patients and families to be involved in the surgical decision-making process. A surgeon's ability to provide patients and families with patient-specific estimates of postoperative complications is critical for shared decision making and informed consent. Surgeons can also use patient-specific risk estimates to decide whether or not to operate and what options to offer patients. Our objective was to develop and evaluate a publicly available risk estimation tool that would cover many common pediatric surgical procedures across all specialties. American College of Surgeons NSQIP Pediatric standardized data from 67 hospitals were used to develop a risk estimation tool. Surgeons enter 18 preoperative variables (demographics, comorbidities, procedure) that are used in a logistic regression model to predict 9 postoperative outcomes. A surgeon adjustment score is also incorporated to adjust for any additional risk not accounted for in the 18 risk factors. A pediatric surgical risk calculator was developed based on 181,353 cases covering 382 CPT codes across all specialties. It had excellent discrimination for mortality (c-statistic = 0.98), morbidity (c-statistic = 0.81), and 7 additional complications (c-statistic > 0.77). The Hosmer-Lemeshow statistic and graphic representations also showed excellent calibration. The ACS NSQIP Pediatric Surgical Risk Calculator was developed using standardized and audited multi-institutional data from the ACS NSQIP Pediatric, and it provides empirically derived, patient-specific postoperative risks. It can be used as a tool in the shared decision-making process by providing clinicians, families, and patients with useful information for many of the most common operations performed on pediatric patients in the US. Copyright © 2016 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
A New Approach to Monte Carlo Simulations in Statistical Physics

NASA Astrophysics Data System (ADS)

Landau, David P.

2002-08-01

Monte Carlo simulations [1] have become a powerful tool for the study of diverse problems in statistical/condensed matter physics. Standard methods sample the probability distribution for the states of the system, most often in the canonical ensemble, and over the past several decades enormous improvements have been made in performance. Nonetheless, difficulties arise near phase transitions-due to critical slowing down near 2nd order transitions and to metastability near 1st order transitions, and these complications limit the applicability of the method. We shall describe a new Monte Carlo approach [2] that uses a random walk in energy space to determine the density of states directly. Once the density of states is known, all thermodynamic properties can be calculated. This approach can be extended to multi-dimensional parameter spaces and should be effective for systems with complex energy landscapes, e.g., spin glasses, protein folding models, etc. Generalizations should produce a broadly applicable optimization tool. 1. A Guide to Monte Carlo Simulations in Statistical Physics, D. P. Landau and K. Binder (Cambridge U. Press, Cambridge, 2000). 2. Fugao Wang and D. P. Landau, Phys. Rev. Lett. 86, 2050 (2001); Phys. Rev. E64, 056101-1 (2001).
Machine learning patterns for neuroimaging-genetic studies in the cloud.

PubMed

Da Mota, Benoit; Tudoran, Radu; Costan, Alexandru; Varoquaux, Gaël; Brasche, Goetz; Conrod, Patricia; Lemaitre, Herve; Paus, Tomas; Rietschel, Marcella; Frouin, Vincent; Poline, Jean-Baptiste; Antoniu, Gabriel; Thirion, Bertrand

2014-01-01

Brain imaging is a natural intermediate phenotype to understand the link between genetic information and behavior or brain pathologies risk factors. Massive efforts have been made in the last few years to acquire high-dimensional neuroimaging and genetic data on large cohorts of subjects. The statistical analysis of such data is carried out with increasingly sophisticated techniques and represents a great computational challenge. Fortunately, increasing computational power in distributed architectures can be harnessed, if new neuroinformatics infrastructures are designed and training to use these new tools is provided. Combining a MapReduce framework (TomusBLOB) with machine learning algorithms (Scikit-learn library), we design a scalable analysis tool that can deal with non-parametric statistics on high-dimensional data. End-users describe the statistical procedure to perform and can then test the model on their own computers before running the very same code in the cloud at a larger scale. We illustrate the potential of our approach on real data with an experiment showing how the functional signal in subcortical brain regions can be significantly fit with genome-wide genotypes. This experiment demonstrates the scalability and the reliability of our framework in the cloud with a 2 weeks deployment on hundreds of virtual machines.
Statistical Issues for Calculating Reentry Hazards

NASA Technical Reports Server (NTRS)

Matney, Mark; Bacon, John

2016-01-01

A number of statistical tools have been developed over the years for assessing the risk of reentering object to human populations. These tools make use of the characteristics (e.g., mass, shape, size) of debris that are predicted by aerothermal models to survive reentry. This information, combined with information on the expected ground path of the reentry, is used to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of this analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. This inevitably involves assumptions that simplify the problem and make it tractable, but it is often difficult to test the accuracy and applicability of these assumptions. This paper builds on previous IAASS work to re-examine many of these theoretical assumptions, including the mathematical basis for the hazard calculations, and outlining the conditions under which the simplifying assumptions hold. This study also employs empirical and theoretical information to test these assumptions, and makes recommendations how to improve the accuracy of these calculations in the future.
Statistical Issues for Calculating Reentry Hazards

NASA Technical Reports Server (NTRS)

Bacon, John B.; Matney, Mark

2016-01-01

A number of statistical tools have been developed over the years for assessing the risk of reentering object to human populations. These tools make use of the characteristics (e.g., mass, shape, size) of debris that are predicted by aerothermal models to survive reentry. This information, combined with information on the expected ground path of the reentry, is used to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of this analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. This inevitably involves assumptions that simplify the problem and make it tractable, but it is often difficult to test the accuracy and applicability of these assumptions. This paper builds on previous IAASS work to re-examine one of these theoretical assumptions.. This study employs empirical and theoretical information to test the assumption of a fully random decay along the argument of latitude of the final orbit, and makes recommendations how to improve the accuracy of this calculation in the future.
Evolution of Western Mediterranean Sea Surface Temperature between 1985 and 2005: a complementary study in situ, satellite and modelling approaches

NASA Astrophysics Data System (ADS)

Troupin, C.; Lenartz, F.; Sirjacobs, D.; Alvera-Azcárate, A.; Barth, A.; Ouberdous, M.; Beckers, J.-M.

2009-04-01

In order to evaluate the variability of the sea surface temperature (SST) in the Western Mediterranean Sea between 1985 and 2005, an integrated approach combining geostatistical tools and modelling techniques has been set up. The objectives are: underline the capability of each tool to capture characteristic phenomena, compare and assess the quality of their outputs, infer an interannual trend from the results. Diva (Data Interpolating Variationnal Analysis, Brasseur et al. (1996) Deep-Sea Res.) was applied on a collection of in situ data gathered from various sources (World Ocean Database 2005, Hydrobase2, Coriolis and MedAtlas2), from which duplicates and suspect values were removed. This provided monthly gridded fields in the region of interest. Heterogeneous time data coverage was taken into account by computing and removing the annual trend, provided by Diva detrending tool. Heterogeneous correlation length was applied through an advection constraint. Statistical technique DINEOF (Data Interpolation with Empirical Orthogonal Functions, Alvera-Azc
Vector Autoregression, Structural Equation Modeling, and Their Synthesis in Neuroimaging Data Analysis

PubMed Central

Chen, Gang; Glen, Daniel R.; Saad, Ziad S.; Hamilton, J. Paul; Thomason, Moriah E.; Gotlib, Ian H.; Cox, Robert W.

2011-01-01

Vector autoregression (VAR) and structural equation modeling (SEM) are two popular brain-network modeling tools. VAR, which is a data-driven approach, assumes that connected regions exert time-lagged influences on one another. In contrast, the hypothesis-driven SEM is used to validate an existing connectivity model where connected regions have contemporaneous interactions among them. We present the two models in detail and discuss their applicability to FMRI data, and interpretational limits. We also propose a unified approach that models both lagged and contemporaneous effects. The unifying model, structural vector autoregression (SVAR), may improve statistical and explanatory power, and avoids some prevalent pitfalls that can occur when VAR and SEM are utilized separately. PMID:21975109
GAPIT version 2: an enhanced integrated tool for genomic association and prediction

USDA-ARS?s Scientific Manuscript database

Most human diseases and agriculturally important traits are complex. Dissecting their genetic architecture requires continued development of innovative and powerful statistical methods. Corresponding advances in computing tools are critical to efficiently use these statistical innovations and to enh...
Objective Lightning Forecasting at Kennedy Space Center and Cape Canaveral Air Force Station using Cloud-to-Ground Lightning Surveillance System Data

NASA Technical Reports Server (NTRS)

Lambert, Winfred; Wheeler, Mark; Roeder, William

2005-01-01

The 45th Weather Squadron (45 WS) at Cape Canaveral Air-Force Station (CCAFS)ln Florida issues a probability of lightning occurrence in their daily 24-hour and weekly planning forecasts. This information is used for general planning of operations at CCAFS and Kennedy Space Center (KSC). These facilities are located in east-central Florida at the east end of a corridor known as 'Lightning Alley', an indication that lightning has a large impact on space-lift operations. Much of the current lightning probability forecast is based on a subjective analysis of model and observational data and an objective forecast tool developed over 30 years ago. The 45 WS requested that a new lightning probability forecast tool based on statistical analysis of more recent historical warm season (May-September) data be developed in order to increase the objectivity of the daily thunderstorm probability forecast. The resulting tool is a set of statistical lightning forecast equations, one for each month of the warm season, that provide a lightning occurrence probability for the day by 1100 UTC (0700 EDT) during the warm season.
Application of statistical classification methods for predicting the acceptability of well-water quality

NASA Astrophysics Data System (ADS)

Cameron, Enrico; Pilla, Giorgio; Stella, Fabio A.

2018-06-01

The application of statistical classification methods is investigated—in comparison also to spatial interpolation methods—for predicting the acceptability of well-water quality in a situation where an effective quantitative model of the hydrogeological system under consideration cannot be developed. In the example area in northern Italy, in particular, the aquifer is locally affected by saline water and the concentration of chloride is the main indicator of both saltwater occurrence and groundwater quality. The goal is to predict if the chloride concentration in a water well will exceed the allowable concentration so that the water is unfit for the intended use. A statistical classification algorithm achieved the best predictive performances and the results of the study show that statistical classification methods provide further tools for dealing with groundwater quality problems concerning hydrogeological systems that are too difficult to describe analytically or to simulate effectively.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.