NASA Astrophysics Data System (ADS)
Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram
2017-09-01
We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.
Physics-based statistical model and simulation method of RF propagation in urban environments
Pao, Hsueh-Yuan; Dvorak, Steven L.
2010-09-14
A physics-based statistical model and simulation/modeling method and system of electromagnetic wave propagation (wireless communication) in urban environments. In particular, the model is a computationally efficient closed-form parametric model of RF propagation in an urban environment which is extracted from a physics-based statistical wireless channel simulation method and system. The simulation divides the complex urban environment into a network of interconnected urban canyon waveguides which can be analyzed individually; calculates spectral coefficients of modal fields in the waveguides excited by the propagation using a database of statistical impedance boundary conditions which incorporates the complexity of building walls in the propagation model; determines statistical parameters of the calculated modal fields; and determines a parametric propagation model based on the statistical parameters of the calculated modal fields from which predictions of communications capability may be made.
Security of statistical data bases: invasion of privacy through attribute correlational modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palley, M.A.
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
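As a rough illustration of the idea in the abstract above (not the authors' implementation), the sketch below scores circular scanning windows by a standardized Wilcoxon rank-sum statistic and calibrates the most extreme window by Monte Carlo permutation; the window construction and all data are assumptions made for the example.

```python
# Hypothetical sketch of a Wilcoxon-based spatial scan; illustrative only.
import numpy as np
from scipy.stats import rankdata

def wilcoxon_scan(coords, values, radii, n_perm=199, rng=None):
    """Most extreme standardized rank-sum over circular windows, with a
    Monte Carlo p-value from permuting the observed values."""
    rng = np.random.default_rng(rng)

    def max_stat(vals):
        ranks = rankdata(vals)
        n = len(vals)
        best = 0.0
        for c in coords:                      # each location as a window center
            d = np.linalg.norm(coords - c, axis=1)
            for r in radii:
                inside = d <= r
                m = int(inside.sum())
                if m == 0 or m == n:
                    continue
                w = ranks[inside].sum()       # rank-sum inside the window
                mu = m * (n + 1) / 2.0        # null mean of the rank-sum
                sigma = np.sqrt(m * (n - m) * (n + 1) / 12.0)
                best = max(best, abs(w - mu) / sigma)
        return best

    obs = max_stat(values)
    perm = [max_stat(rng.permutation(values)) for _ in range(n_perm)]
    p = (1 + sum(t >= obs for t in perm)) / (n_perm + 1)
    return obs, p

# Toy usage: 50 random sites with an elevated cluster near (2, 2).
coords = np.random.default_rng(0).uniform(0, 10, size=(50, 2))
values = np.random.default_rng(1).normal(0, 1, 50)
values[np.linalg.norm(coords - np.array([2.0, 2.0]), axis=1) < 2] += 2.0
print(wilcoxon_scan(coords, values, radii=[1.0, 2.0, 3.0]))
```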
Online Statistical Modeling (Regression Analysis) for Independent Responses
NASA Astrophysics Data System (ADS)
Made Tirta, I.; Anggraeni, Dian; Pandutama, Martinus
2017-06-01
Regression analysis (statistical modelling) is among the statistical methods most frequently needed in analyzing quantitative data, especially to model the relationship between response and explanatory variables. Statistical models have now been developed in various directions to handle diverse types of data and complex relationships. A rich variety of advanced and recent statistical modelling tools is available in open source software (one of them is R). However, these advanced statistical modelling tools are not very friendly to novice R users, since they are based on programming scripts or a command line interface. Our research aims to develop a web interface (based on R and Shiny) so that the most recent and advanced statistical modelling is readily available, accessible and applicable on the web. We previously built interfaces in the form of e-tutorials for several modern and advanced statistical models in R, especially for independent responses (including linear models/LM, generalized linear models/GLM, generalized additive models/GAM and generalized additive models for location, scale and shape/GAMLSS). In this research we unified them in the form of data analysis, including models using computer-intensive statistics (bootstrap and Markov chain Monte Carlo/MCMC). All are readily accessible in our online Virtual Statistics Laboratory. The web interface makes statistical modelling easier to apply and easier to compare models in order to find the most appropriate one for the data.
NASA Astrophysics Data System (ADS)
Müller, M. F.; Thompson, S. E.
2016-02-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by frequent wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are favored over statistical models.
Biologically-based pharmacokinetic models are being increasingly used in the risk assessment of environmental chemicals. These models are based on biological, mathematical, statistical and engineering principles. Their potential uses in risk assessment include extrapolation betwe...
Some Statistics for Assessing Person-Fit Based on Continuous-Response Models
ERIC Educational Resources Information Center
Ferrando, Pere Joan
2010-01-01
This article proposes several statistics for assessing individual fit based on two unidimensional models for continuous responses: linear factor analysis and Samejima's continuous response model. Both models are approached using a common framework based on underlying response variables and are formulated at the individual level as fixed regression…
NASA Astrophysics Data System (ADS)
Müller, M. F.; Thompson, S. E.
2015-09-01
The prediction of flow duration curves (FDCs) in ungauged basins remains an important task for hydrologists given the practical relevance of FDCs for water management and infrastructure design. Predicting FDCs in ungauged basins typically requires spatial interpolation of statistical or model parameters. This task is complicated if climate becomes non-stationary, as the prediction challenge now also requires extrapolation through time. In this context, process-based models for FDCs that mechanistically link the streamflow distribution to climate and landscape factors may have an advantage over purely statistical methods to predict FDCs. This study compares a stochastic (process-based) and statistical method for FDC prediction in both stationary and non-stationary contexts, using Nepal as a case study. Under contemporary conditions, both models perform well in predicting FDCs, with Nash-Sutcliffe coefficients above 0.80 in 75 % of the tested catchments. The main drivers of uncertainty differ between the models: parameter interpolation was the main source of error for the statistical model, while violations of the assumptions of the process-based model represented the main source of its error. The process-based approach performed better than the statistical approach in numerical simulations with non-stationary climate drivers. The predictions of the statistical method under non-stationary rainfall conditions were poor if (i) local runoff coefficients were not accurately determined from the gauge network, or (ii) streamflow variability was strongly affected by changes in rainfall. A Monte Carlo analysis shows that the streamflow regimes in catchments characterized by a strong wet-season runoff and a rapid, strongly non-linear hydrologic response are particularly sensitive to changes in rainfall statistics. In these cases, process-based prediction approaches are strongly favored over statistical models.
Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models
Snijders, Tom A.B.; Steglich, Christian E.G.
2014-01-01
Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based model. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of generalized linear statistical models they aim to be realistic detailed representations of network dynamics in empirical data sets. Statistical parallels to micro-macro considerations can be found in the estimation of parameters determining local actor behavior from empirical data, and the assessment of goodness of fit from the correspondence with network-level descriptives. This article studies several network-level consequences of dynamic actor-based models applied to represent cross-sectional network data. Two examples illustrate how network-level characteristics can be obtained as emergent features implied by micro-specifications of actor-based models. PMID:25960578
Naidu, Sailen G; Kriegshauser, J Scott; Paden, Robert G; He, Miao; Wu, Qing; Hara, Amy K
2014-12-01
An ultra-low-dose radiation protocol reconstructed with model-based iterative reconstruction was compared with our standard-dose protocol. This prospective study evaluated 20 men undergoing surveillance-enhanced computed tomography after endovascular aneurysm repair. All patients underwent standard-dose and ultra-low-dose venous phase imaging; images were compared after reconstruction with filtered back projection, adaptive statistical iterative reconstruction, and model-based iterative reconstruction. Objective measures of aortic contrast attenuation and image noise were averaged. Images were subjectively assessed (1 = worst, 5 = best) for diagnostic confidence, image noise, and vessel sharpness. Aneurysm sac diameter and endoleak detection were compared. Quantitative image noise was 26% less with ultra-low-dose model-based iterative reconstruction than with standard-dose adaptive statistical iterative reconstruction and 58% less than with ultra-low-dose adaptive statistical iterative reconstruction. Average subjective noise scores were not different between ultra-low-dose model-based iterative reconstruction and standard-dose adaptive statistical iterative reconstruction (3.8 vs. 4.0, P = .25). Subjective scores for diagnostic confidence were better with standard-dose adaptive statistical iterative reconstruction than with ultra-low-dose model-based iterative reconstruction (4.4 vs. 4.0, P = .002). Vessel sharpness was decreased with ultra-low-dose model-based iterative reconstruction compared with standard-dose adaptive statistical iterative reconstruction (3.3 vs. 4.1, P < .0001). Ultra-low-dose model-based iterative reconstruction and standard-dose adaptive statistical iterative reconstruction aneurysm sac diameters were not significantly different (4.9 vs. 4.9 cm); concordance for the presence of endoleak was 100% (P < .001). Compared with a standard-dose technique, an ultra-low-dose model-based iterative reconstruction protocol provides comparable image quality and diagnostic assessment at a 73% lower radiation dose.
Different Manhattan project: automatic statistical model generation
NASA Astrophysics Data System (ADS)
Yap, Chee Keng; Biermann, Henning; Hertzmann, Aaron; Li, Chen; Meyer, Jon; Pao, Hsing-Kuo; Paxia, Salvatore
2002-03-01
We address the automatic generation of large geometric models. This is important in visualization for several reasons. First, many applications need access to large but interesting data models. Second, we often need such data sets with particular characteristics (e.g., urban models, park and recreation landscape). Thus we need the ability to generate models with different parameters. We propose a new approach for generating such models. It is based on a top-down propagation of statistical parameters. We illustrate the method in the generation of a statistical model of Manhattan. But the method is generally applicable in the generation of models of large geographical regions. Our work is related to the literature on generating complex natural scenes (smoke, forests, etc) based on procedural descriptions. The difference in our approach stems from three characteristics: modeling with statistical parameters, integration of ground truth (actual map data), and a library-based approach for texture mapping.
A Model of Statistics Performance Based on Achievement Goal Theory.
ERIC Educational Resources Information Center
Bandalos, Deborah L.; Finney, Sara J.; Geske, Jenenne A.
2003-01-01
Tests a model of statistics performance based on achievement goal theory. Both learning and performance goals affected achievement indirectly through study strategies, self-efficacy, and test anxiety. Implications of these findings for teaching and learning statistics are discussed. (Contains 47 references, 3 tables, 3 figures, and 1 appendix.)…
Hart, Carl R; Reznicek, Nathan J; Wilson, D Keith; Pettit, Chris L; Nykaza, Edward T
2016-05-01
Many outdoor sound propagation models exist, ranging from highly complex physics-based simulations to simplified engineering calculations, and more recently, highly flexible statistical learning methods. Several engineering and statistical learning models are evaluated by using a particular physics-based model, namely, a Crank-Nicholson parabolic equation (CNPE), as a benchmark. Narrowband transmission loss values predicted with the CNPE, based upon a simulated data set of meteorological, boundary, and source conditions, act as simulated observations. In the simulated data set sound propagation conditions span from downward refracting to upward refracting, for acoustically hard and soft boundaries, and low frequencies. Engineering models used in the comparisons include the ISO 9613-2 method, Harmonoise, and Nord2000 propagation models. Statistical learning methods used in the comparisons include bagged decision tree regression, random forest regression, boosting regression, and artificial neural network models. Computed skill scores are relative to sound propagation in a homogeneous atmosphere over a rigid ground. Overall skill scores for the engineering noise models are 0.6%, -7.1%, and 83.8% for the ISO 9613-2, Harmonoise, and Nord2000 models, respectively. Overall skill scores for the statistical learning models are 99.5%, 99.5%, 99.6%, and 99.6% for bagged decision tree, random forest, boosting, and artificial neural network regression models, respectively.
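The exact skill-score definition used in the study is not reproduced here; the snippet below assumes a conventional mean-squared-error skill score relative to the reference calculation (homogeneous atmosphere over rigid ground), purely to illustrate how such a relative score is computed.

```python
# Illustrative MSE-based skill score relative to a reference prediction
# (the paper's exact skill definition is not reproduced here).
import numpy as np

def skill_score(tl_observed, tl_model, tl_reference):
    """Skill (in percent) of a propagation model relative to a reference,
    e.g., homogeneous atmosphere over rigid ground."""
    mse_model = np.mean((tl_observed - tl_model) ** 2)
    mse_ref = np.mean((tl_observed - tl_reference) ** 2)
    return 100.0 * (1.0 - mse_model / mse_ref)

# Toy transmission-loss values in dB.
obs = np.array([40.0, 55.0, 62.0, 70.0])
print(skill_score(obs, obs + np.array([1.0, -2.0, 1.0, 0.0]), np.full(4, 50.0)))
```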
Interpolative modeling of GaAs FET S-parameter data bases for use in Monte Carlo simulations
NASA Technical Reports Server (NTRS)
Campbell, L.; Purviance, J.
1992-01-01
A statistical interpolation technique is presented for modeling GaAs FET S-parameter measurements for use in the statistical analysis and design of circuits. This is accomplished by interpolating among the measurements in a GaAs FET S-parameter data base in a statistically valid manner.
Moment-Based Physical Models of Broadband Clutter due to Aggregations of Fish
2013-09-30
statistical models for signal-processing algorithm development. These in turn will help to develop a capability to statistically forecast the impact of...aggregations of fish based on higher-order statistical measures describable in terms of physical and system parameters. Environmentally, these models...processing. In this experiment, we had good ground truth on (1) and (2), and had control over (3) and (4) except for environmentally-imposed restrictions
John, Majnu; Lencz, Todd; Malhotra, Anil K; Correll, Christoph U; Zhang, Jian-Ping
2018-06-01
Meta-analysis of genetic association studies is being increasingly used to assess phenotypic differences between genotype groups. When the underlying genetic model is assumed to be dominant or recessive, assessing the phenotype differences based on summary statistics, reported for individual studies in a meta-analysis, is a valid strategy. However, when the genetic model is additive, a similar strategy based on summary statistics will lead to biased results. This fact about the additive model is one of the things that we establish in this paper, using simulations. The main goal of this paper is to present an alternate strategy for the additive model based on simulating data for the individual studies. We show that the alternate strategy is far superior to the strategy based on summary statistics.
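A hedged sketch of the simulation-based strategy for the additive model described above: individual-level data are regenerated from each study's reported per-genotype means, standard deviations, and counts, the per-allele effect is estimated by regression on allele dose, and study estimates are pooled with inverse-variance weights. The study values and helper names are invented for illustration and are not the authors' code.

```python
# Illustrative simulation-based meta-analysis under an additive genetic model.
import numpy as np

def simulate_study(means, sds, ns, rng):
    """means, sds, ns are per-genotype (0, 1, 2 risk alleles)."""
    dose, y = [], []
    for g in (0, 1, 2):
        dose.append(np.full(ns[g], g))
        y.append(rng.normal(means[g], sds[g], ns[g]))
    return np.concatenate(dose), np.concatenate(y)

def per_allele_effect(dose, y):
    X = np.column_stack([np.ones_like(dose, dtype=float), dose])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = res[0] / (len(y) - 2)                 # residual variance
    var_beta = sigma2 * np.linalg.inv(X.T @ X)[1, 1]
    return beta[1], var_beta

rng = np.random.default_rng(42)
studies = [((10.0, 10.5, 11.0), (2.0, 2.0, 2.0), (120, 150, 60)),
           ((9.8, 10.4, 10.9), (1.8, 1.9, 2.1), (200, 180, 70))]
effects, variances = zip(*[per_allele_effect(*simulate_study(m, s, n, rng))
                           for m, s, n in studies])
w = 1.0 / np.array(variances)                      # inverse-variance weights
pooled = np.sum(w * np.array(effects)) / np.sum(w)
print("pooled per-allele effect:", round(pooled, 3))
```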
Testing the Predictive Power of Coulomb Stress on Aftershock Sequences
NASA Astrophysics Data System (ADS)
Woessner, J.; Lombardi, A.; Werner, M. J.; Marzocchi, W.
2009-12-01
Empirical and statistical models of clustered seismicity are usually strongly stochastic and perceived to be uninformative in their forecasts, since only marginal distributions are used, such as the Omori-Utsu and Gutenberg-Richter laws. In contrast, so-called physics-based aftershock models, based on seismic rate changes calculated from Coulomb stress changes and rate-and-state friction, make more specific predictions: anisotropic stress shadows and multiplicative rate changes. We test the predictive power of models based on Coulomb stress changes against statistical models, including the popular Short Term Earthquake Probabilities and Epidemic-Type Aftershock Sequences models: We score and compare retrospective forecasts on the aftershock sequences of the 1992 Landers, USA, the 1997 Colfiorito, Italy, and the 2008 Selfoss, Iceland, earthquakes. To quantify predictability, we use likelihood-based metrics that test the consistency of the forecasts with the data, including modified and existing tests used in prospective forecast experiments within the Collaboratory for the Study of Earthquake Predictability (CSEP). Our results indicate that a statistical model performs best. Moreover, two Coulomb model classes seem unable to compete: Models based on deterministic Coulomb stress changes calculated from a given fault-slip model, and those based on fixed receiver faults. One model of Coulomb stress changes does perform well and sometimes outperforms the statistical models, but its predictive information is diluted, because of uncertainties included in the fault-slip model. Our results suggest that models based on Coulomb stress changes need to incorporate stochastic features that represent model and data uncertainty.
An adaptive state of charge estimation approach for lithium-ion series-connected battery system
NASA Astrophysics Data System (ADS)
Peng, Simin; Zhu, Xuelai; Xing, Yinjiao; Shi, Hongbing; Cai, Xu; Pecht, Michael
2018-07-01
Due to the incorrect or unknown noise statistics of a battery system and its cell-to-cell variations, state of charge (SOC) estimation of a lithium-ion series-connected battery system is usually inaccurate or even divergent using model-based methods, such as extended Kalman filter (EKF) and unscented Kalman filter (UKF). To resolve this problem, an adaptive unscented Kalman filter (AUKF) based on a noise statistics estimator and a model parameter regulator is developed to accurately estimate the SOC of a series-connected battery system. An equivalent circuit model is first built based on the model parameter regulator that illustrates the influence of cell-to-cell variation on the battery system. A noise statistics estimator is then used to adaptively estimate the noise statistics for the AUKF when its prior noise statistics are not accurate or exactly Gaussian. The accuracy and effectiveness of the SOC estimation method are validated by comparing the developed AUKF and UKF when the model and the measurement noise statistics are inaccurate, respectively. Compared with the UKF and EKF, the developed method shows the highest SOC estimation accuracy.
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. A simulation with independent benchmark data indicates that the test statistic based on the Hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
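The following is an illustrative reading of that approach (not the authors' code): each candidate window is scored by the hypergeometric likelihood of its case count under the null, the most extreme likelihood is the test statistic, and significance is assessed by Monte Carlo replication. The window set and the random redistribution of cases are assumptions of this sketch.

```python
# Illustrative hypergeometric scan: smallest null log-likelihood over windows,
# calibrated by Monte Carlo redistribution of cases proportional to population.
import numpy as np
from scipy.stats import hypergeom

def hypergeom_scan(cases, pops, windows, n_sim=199, rng=None):
    rng = np.random.default_rng(rng)
    total_cases, total_pop = cases.sum(), pops.sum()

    def stat(case_vec):
        ll = [hypergeom.logpmf(case_vec[w].sum(), total_pop,
                               total_cases, pops[w].sum())
              for w in windows]
        return min(ll)                      # most extreme window

    obs = stat(cases)
    sims = [stat(rng.multinomial(total_cases, pops / total_pop))
            for _ in range(n_sim)]
    p = (1 + sum(s <= obs for s in sims)) / (n_sim + 1)
    return obs, p

# Toy usage: 6 regions; windows are index sets of neighbouring regions.
cases = np.array([12, 3, 2, 4, 1, 2])
pops = np.array([1000, 900, 1100, 950, 1000, 1050])
windows = [np.array(w) for w in ([0], [0, 1], [2, 3], [3, 4, 5])]
print(hypergeom_scan(cases, pops, windows))
```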
Shuman, William P; Chan, Keith T; Busey, Janet M; Mitsumori, Lee M; Choi, Eunice; Koprowicz, Kent M; Kanal, Kalpana M
2014-12-01
To investigate whether reduced radiation dose liver computed tomography (CT) images reconstructed with model-based iterative reconstruction (MBIR) might compromise depiction of clinically relevant findings or might have decreased image quality when compared with clinical standard radiation dose CT images reconstructed with adaptive statistical iterative reconstruction (ASIR). With institutional review board approval, informed consent, and HIPAA compliance, 50 patients (39 men, 11 women) were prospectively included who underwent liver CT. After a portal venous pass with ASIR images, a 60% reduced radiation dose pass was added with MBIR images. One reviewer scored ASIR image quality and marked findings. Two additional independent reviewers noted whether marked findings were present on MBIR images and assigned scores for relative conspicuity, spatial resolution, image noise, and image quality. Liver and aorta Hounsfield units and image noise were measured. Volume CT dose index and size-specific dose estimate (SSDE) were recorded. Qualitative reviewer scores were summarized. Formal statistical inference for signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), volume CT dose index, and SSDE was made (paired t tests), with Bonferroni adjustment. Two independent reviewers identified all 136 ASIR image findings (n = 272) on MBIR images, scoring them as equal or better for conspicuity, spatial resolution, and image noise in 94.1% (256 of 272), 96.7% (263 of 272), and 99.3% (270 of 272), respectively. In 50 image sets, two reviewers (n = 100) scored overall image quality as sufficient or good with MBIR in 99% (99 of 100). Liver SNR was significantly greater for MBIR (10.8 ± 2.5 [standard deviation] vs 7.7 ± 1.4, P < .001); there was no difference for CNR (2.5 ± 1.4 vs 2.4 ± 1.4, P = .45). For ASIR and MBIR, respectively, volume CT dose index was 15.2 mGy ± 7.6 versus 6.2 mGy ± 3.6; SSDE was 16.4 mGy ± 6.6 versus 6.7 mGy ± 3.1 (P < .001). Liver CT images reconstructed with MBIR may allow up to 59% radiation dose reduction compared with the dose with ASIR, without compromising depiction of findings or image quality. © RSNA, 2014.
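For readers who want to see the kind of quantitative comparison described, the sketch below computes ROI-based SNR and CNR and runs paired t tests with a Bonferroni adjustment on simulated values; the specific ROI definitions (liver and muscle) and all numbers are assumptions, not the study's data.

```python
# Simulated numeric sketch of the quantitative comparison: ROI-based SNR and
# CNR (assumed here as liver HU / noise SD and (liver - muscle) / noise SD;
# the paper's exact ROI choices are not reproduced) compared with paired
# t tests and a Bonferroni adjustment. All numbers are invented.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(11)
n = 50
noise_asir = rng.normal(14, 2, n)     # image noise (HU SD), standard-dose ASIR
noise_mbir = rng.normal(10, 1.5, n)   # image noise (HU SD), low-dose MBIR
liver_hu = rng.normal(110, 8, n)
muscle_hu = rng.normal(55, 6, n)

snr_asir, snr_mbir = liver_hu / noise_asir, liver_hu / noise_mbir
cnr_asir = (liver_hu - muscle_hu) / noise_asir
cnr_mbir = (liver_hu - muscle_hu) / noise_mbir

alpha = 0.05 / 2                      # Bonferroni for two paired comparisons
for name, a, b in [("SNR", snr_asir, snr_mbir), ("CNR", cnr_asir, cnr_mbir)]:
    t, p = ttest_rel(a, b)
    print(f"{name}: {a.mean():.1f} vs {b.mean():.1f}, p={p:.2g}, "
          f"significant={p < alpha}")
```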
Forecasting runout of rock and debris avalanches
Iverson, Richard M.; Evans, S.G.; Mugnozza, G.S.; Strom, A.; Hermanns, R.L.
2006-01-01
Physically based mathematical models and statistically based empirical equations each may provide useful means of forecasting runout of rock and debris avalanches. This paper compares the foundations, strengths, and limitations of a physically based model and a statistically based forecasting method, both of which were developed to predict runout across three-dimensional topography. The chief advantage of the physically based model results from its ties to physical conservation laws and well-tested axioms of soil and rock mechanics, such as the Coulomb friction rule and effective-stress principle. The output of this model provides detailed information about the dynamics of avalanche runout, at the expense of high demands for accurate input data, numerical computation, and experimental testing. In comparison, the statistical method requires relatively modest computation and no input data except identification of prospective avalanche source areas and a range of postulated avalanche volumes. Like the physically based model, the statistical method yields maps of predicted runout, but it provides no information on runout dynamics. Although the two methods differ significantly in their structure and objectives, insights gained from one method can aid refinement of the other.
TOWARDS REFINED USE OF TOXICITY DATA IN STATISTICALLY BASED SAR MODELS FOR DEVELOPMENTAL TOXICITY.
In 2003, an International Life Sciences Institute (ILSI) Working Group examined the potential of statistically based structure-activity relationship (SAR) models for use in screening environmental contaminants for possible developmental toxicants.
Statistical tools for transgene copy number estimation based on real-time PCR.
Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal
2007-11-01
As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
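A minimal sketch of the first design (external standard curve plus a two-group t test) is given below; the dilution series, Ct values, and the single-copy control are invented purely to show the mechanics.

```python
# Invented example of the first design: an external standard curve (Ct vs.
# log10 template amount) fitted by simple linear regression, replicate Ct
# values of a single-copy control event and a putative event compared by a
# two-sample t test, and the copy-number ratio read off the curve.
import numpy as np
from scipy import stats

# Standard curve from a serial dilution series (ng template, Ct).
log_amount = np.log10([10, 2, 0.4, 0.08, 0.016])
ct_standard = np.array([18.1, 20.5, 22.9, 25.2, 27.6])
slope, intercept, r, p, se = stats.linregress(log_amount, ct_standard)
efficiency = 10 ** (-1.0 / slope) - 1            # ideal PCR is ~1.0 (100 %)

# Replicate Ct values for the control (known 1 copy) and the test event.
ct_control = np.array([24.9, 25.1, 25.0])
ct_test = np.array([23.9, 24.1, 24.0])
t_stat, p_val = stats.ttest_ind(ct_control, ct_test)

# Relative quantities from the standard curve, scaled by the control copies.
q_control = 10 ** ((ct_control.mean() - intercept) / slope)
q_test = 10 ** ((ct_test.mean() - intercept) / slope)
copy_estimate = 1 * q_test / q_control
print(f"efficiency={efficiency:.2f}, estimated copies={copy_estimate:.1f}, "
      f"t-test p={p_val:.3g}")
```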
The Effect on the 8th Grade Students' Attitude towards Statistics of Project Based Learning
ERIC Educational Resources Information Center
Koparan, Timur; Güven, Bülent
2014-01-01
This study investigates the effect of the project-based learning approach on 8th grade students' attitude towards statistics. With this aim, an attitude scale towards statistics was developed. A quasi-experimental research model was used in this study. Following this model, in the control group the traditional method was applied to teach statistics…
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).
Cognitive components underpinning the development of model-based learning.
Potter, Tracey C S; Bryce, Nessa V; Hartley, Catherine A
2017-06-01
Reinforcement learning theory distinguishes "model-free" learning, which fosters reflexive repetition of previously rewarded actions, from "model-based" learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9-25, we examined whether the abilities to infer sequential regularities in the environment ("statistical learning"), maintain information in an active state ("working memory") and integrate distant concepts to solve problems ("fluid reasoning") predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
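As a simplified stand-in for the approach (an empirical two-component mixture fitted by EM rather than the authors' full Bayesian model), the sketch below models z-statistics as a null/alternative mixture, computes posterior non-null probabilities, and selects rejections so that the average posterior null probability, a Bayesian FDR, stays below a target.

```python
# Empirical-Bayes style sketch: N(0,1) null vs. Gaussian alternative mixture,
# posterior non-null probabilities, and a Bayesian FDR rejection rule.
import numpy as np
from scipy.stats import norm

def fit_mixture(z, n_iter=200):
    p1, mu, sd = 0.1, 2.0, 1.0                  # crude starting values
    for _ in range(n_iter):                     # EM iterations
        f0 = (1 - p1) * norm.pdf(z, 0, 1)
        f1 = p1 * norm.pdf(z, mu, sd)
        w = f1 / (f0 + f1)                      # P(non-null | z)
        p1 = w.mean()
        mu = np.sum(w * z) / np.sum(w)
        sd = np.sqrt(np.sum(w * (z - mu) ** 2) / np.sum(w))
    return w

def bayes_fdr_reject(post_nonnull, alpha=0.10):
    order = np.argsort(-post_nonnull)           # most likely non-null first
    null_prob = 1 - post_nonnull[order]
    running_fdr = np.cumsum(null_prob) / np.arange(1, len(order) + 1)
    k = np.sum(running_fdr <= alpha)
    rejected = np.zeros(len(order), dtype=bool)
    rejected[order[:k]] = True
    return rejected

rng = np.random.default_rng(7)
z = np.concatenate([rng.normal(0, 1, 900), rng.normal(3, 1, 100)])
post = fit_mixture(z)
print("rejections:", bayes_fdr_reject(post).sum())
```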
Modified Likelihood-Based Item Fit Statistics for the Generalized Graded Unfolding Model
ERIC Educational Resources Information Center
Roberts, James S.
2008-01-01
Orlando and Thissen (2000) developed an item fit statistic for binary item response theory (IRT) models known as S-X². This article generalizes their statistic to polytomous unfolding models. Four alternative formulations of S-X² are developed for the generalized graded unfolding model (GGUM). The GGUM is a…
Comparisons of non-Gaussian statistical models in DNA methylation analysis.
Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-06-16
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.
Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis
Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun
2014-01-01
As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687
NASA Astrophysics Data System (ADS)
Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel
2017-07-01
Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generate a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km2) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.
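To make the statistical side concrete, here is a compact sketch of the information value (IV) scoring used by such models: each class of a predictive factor is weighted by the log ratio of its landslide density to the overall density, and pixel susceptibility is the sum of class IVs. The factor map, inventory, and handling of empty classes are illustrative assumptions; the rules for combining with the infinite slope model are not reproduced.

```python
# Illustrative information value (IV) computation on toy arrays.
import numpy as np

def information_value(factor_map, landslide_mask):
    """Return {class: IV} for one categorical factor map."""
    total_px = factor_map.size
    total_ls = landslide_mask.sum()
    iv = {}
    for c in np.unique(factor_map):
        in_class = factor_map == c
        ls_in_class = landslide_mask[in_class].sum()
        if ls_in_class == 0:
            iv[c] = -np.inf              # no recorded slides in this class
            continue
        density_class = ls_in_class / in_class.sum()
        density_total = total_ls / total_px
        iv[c] = np.log(density_class / density_total)
    return iv

# Toy 5x5 slope-class map and landslide inventory.
slope_class = np.array([[1, 1, 2, 2, 3],
                        [1, 2, 2, 3, 3],
                        [1, 2, 3, 3, 3],
                        [2, 2, 3, 3, 3],
                        [2, 3, 3, 3, 3]])
slides = np.zeros_like(slope_class, dtype=bool)
slides[2:, 1:] = True                    # slides concentrated in classes 2-3
iv = information_value(slope_class, slides)
susceptibility = np.vectorize(iv.get)(slope_class)   # per-pixel score
print(iv)
print(np.round(susceptibility, 2))
```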
Cognitive Components Underpinning the Development of Model-Based Learning
Potter, Tracey C.S.; Bryce, Nessa V.; Hartley, Catherine A.
2016-01-01
Reinforcement learning theory distinguishes “model-free” learning, which fosters reflexive repetition of previously rewarded actions, from “model-based” learning, which recruits a mental model of the environment to flexibly select goal-directed actions. Whereas model-free learning is evident across development, recruitment of model-based learning appears to increase with age. However, the cognitive processes underlying the development of model-based learning remain poorly characterized. Here, we examined whether age-related differences in cognitive processes underlying the construction and flexible recruitment of mental models predict developmental increases in model-based choice. In a cohort of participants aged 9–25, we examined whether the abilities to infer sequential regularities in the environment (“statistical learning”), maintain information in an active state (“working memory”) and integrate distant concepts to solve problems (“fluid reasoning”) predicted age-related improvements in model-based choice. We found that age-related improvements in statistical learning performance did not mediate the relationship between age and model-based choice. Ceiling performance on our working memory assay prevented examination of its contribution to model-based learning. However, age-related improvements in fluid reasoning statistically mediated the developmental increase in the recruitment of a model-based strategy. These findings suggest that gradual development of fluid reasoning may be a critical component process underlying the emergence of model-based learning. PMID:27825732
Development of the AFRL Aircrew Perfomance and Protection Data Bank
2007-12-01
Growth model and statistical model of hypobaric chamber simulations. It offers a quick and readily accessible online DCS risk assessment tool for...are used for the DCS prediction instead of the original model. ADRAC is based on more than 20 years of hypobaric chamber studies using human...prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It
Comparing estimates of climate change impacts from process-based and statistical crop models
NASA Astrophysics Data System (ADS)
Lobell, David B.; Asseng, Senthold
2017-01-01
The potential impacts of climate change on crop productivity are of widespread interest to those concerned with addressing climate change and improving global food security. Two common approaches to assess these impacts are process-based simulation models, which attempt to represent key dynamic processes affecting crop yields, and statistical models, which estimate functional relationships between historical observations of weather and yields. Examples of both approaches are increasingly found in the scientific literature, although often published in different disciplinary journals. Here we compare published sensitivities to changes in temperature, precipitation, carbon dioxide (CO2), and ozone from each approach for the subset of crops, locations, and climate scenarios for which both have been applied. Despite a common perception that statistical models are more pessimistic, we find no systematic differences between the predicted sensitivities to warming from process-based and statistical models up to +2 °C, with limited evidence at higher levels of warming. For precipitation, there are many reasons why estimates could be expected to differ, but few estimates exist to develop robust comparisons, and precipitation changes are rarely the dominant factor for predicting impacts given the prominent role of temperature, CO2, and ozone changes. A common difference between process-based and statistical studies is that the former tend to include the effects of CO2 increases that accompany warming, whereas statistical models typically do not. Major needs moving forward include incorporating CO2 effects into statistical studies, improving both approaches’ treatment of ozone, and increasing the use of both methods within the same study. At the same time, those who fund or use crop model projections should understand that in the short-term, both approaches when done well are likely to provide similar estimates of warming impacts, with statistical models generally requiring fewer resources to produce robust estimates, especially when applied to crops beyond the major grains.
Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).
Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie
2017-01-01
This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.
Developing Statistical Knowledge for Teaching during Design-Based Research
ERIC Educational Resources Information Center
Groth, Randall E.
2017-01-01
Statistical knowledge for teaching is not precisely equivalent to statistics subject matter knowledge. Teachers must know how to make statistics understandable to others as well as understand the subject matter themselves. This dual demand on teachers calls for the development of viable teacher education models. This paper offers one such model,…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Redus, K.S.
2007-07-01
The foundation of statistics deals with (a) how to measure and collect data and (b) how to identify models using estimates of statistical parameters derived from the data. Risk is a term used by the statistical community and those that employ statistics to express the results of a statistically based study. Statistical risk is represented as a probability that, for example, a statistical model is sufficient to describe a data set; but, risk is also interpreted as a measure of worth of one alternative when compared to another. The common thread of any risk-based problem is the combination of (a) the chance an event will occur, with (b) the value of the event. This paper presents an introduction to, and some examples of, statistical risk-based decision making from a quantitative, visual, and linguistic sense. This should help in understanding areas of radioactive waste management that can be suitably expressed using statistical risk and vice-versa. (authors)
Tree injury and mortality in fires: developing process-based models
Bret W. Butler; Matthew B. Dickinson
2010-01-01
Wildland fire managers are often required to predict tree injury and mortality when planning a prescribed burn or when considering wildfire management options; and, currently, statistical models based on post-fire observations are the only tools available for this purpose. Implicit in the derivation of statistical models is the assumption that they are strictly...
Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.
2016-01-01
Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.
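A minimal illustration of the kind of parsimonious real-time statistical model the abstract describes: log-transformed indicator-bacteria concentration regressed on log turbidity. All values below are invented.

```python
# Toy turbidity-based regression for real-time indicator-bacteria prediction.
import numpy as np
from scipy import stats

turbidity = np.array([2, 5, 8, 15, 30, 60, 120], dtype=float)       # NTU
ecoli = np.array([40, 90, 150, 320, 600, 1300, 2400], dtype=float)  # CFU/100 mL

slope, intercept, r, p, se = stats.linregress(np.log10(turbidity),
                                              np.log10(ecoli))

def predict(ntu):
    return 10 ** (intercept + slope * np.log10(ntu))

print(f"log10(E. coli) = {intercept:.2f} + {slope:.2f} log10(turbidity), "
      f"R^2={r**2:.2f}; predicted at 25 NTU: {predict(25.0):.0f} CFU/100 mL")
```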
NASA Technical Reports Server (NTRS)
Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.
1992-01-01
A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.
Monte Carlo based statistical power analysis for mediation models: methods and software.
Zhang, Zhiyong
2014-12-01
The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
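A hedged plain-numpy sketch of the Monte Carlo plus bootstrap idea for a simple mediation model (not the bmem package): each replication simulates a data set, bootstraps the indirect effect a*b, and power is the fraction of replications whose bootstrap interval excludes zero.

```python
# Monte Carlo estimate of power for the bootstrap test of a simple
# mediation effect (X -> M -> Y); illustrative values and sample sizes.
import numpy as np

def indirect_effect(x, m, y):
    a = np.polyfit(x, m, 1)[0]                # slope of M on X
    b = np.linalg.lstsq(np.column_stack([np.ones_like(x), m, x]),
                        y, rcond=None)[0][1]  # slope of Y on M, adjusting X
    return a * b

def mediation_power(n=100, a=0.3, b=0.3, n_rep=100, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        x = rng.normal(size=n)
        m = a * x + rng.normal(size=n)
        y = b * m + 0.1 * x + rng.normal(size=n)
        boots = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, n)       # resample cases with replacement
            boots.append(indirect_effect(x[idx], m[idx], y[idx]))
        lo, hi = np.percentile(boots, [2.5, 97.5])
        hits += (lo > 0) or (hi < 0)          # bootstrap CI excludes zero
    return hits / n_rep

print("estimated power:", mediation_power())
```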
Shih, Shirley L; Zafonte, Ross; Bates, David W; Gerrard, Paul; Goldstein, Richard; Mix, Jacqueline; Niewczyk, Paulette; Greysen, S Ryan; Kazis, Lewis; Ryan, Colleen M; Schneider, Jeffrey C
2016-10-01
Functional status is associated with patient outcomes, but is rarely included in hospital readmission risk models. The objective of this study was to determine whether functional status is a better predictor of 30-day acute care readmission than traditionally investigated variables including demographics and comorbidities. Retrospective database analysis between 2002 and 2011. 1158 US inpatient rehabilitation facilities. 4,199,002 inpatient rehabilitation facility admissions comprising patients from 16 impairment groups within the Uniform Data System for Medical Rehabilitation database. Logistic regression models predicting 30-day readmission were developed based on age, gender, comorbidities (Elixhauser comorbidity index, Deyo-Charlson comorbidity index, and Medicare comorbidity tier system), and functional status [Functional Independence Measure (FIM)]. We hypothesized that (1) function-based models would outperform demographic- and comorbidity-based models and (2) the addition of demographic and comorbidity data would not significantly enhance function-based models. For each impairment group, Function Only Models were compared against Demographic-Comorbidity Models and Function Plus Models (Function-Demographic-Comorbidity Models). The primary outcome was 30-day readmission, and the primary measure of model performance was the c-statistic. All-cause 30-day readmission rate from inpatient rehabilitation facilities to acute care hospitals was 9.87%. C-statistics for the Function Only Models were 0.64 to 0.70. For all 16 impairment groups, the Function Only Model demonstrated better c-statistics than the Demographic-Comorbidity Models (c-statistic difference: 0.03-0.12). The best-performing Function Plus Models exhibited negligible improvements in model performance compared to Function Only Models, with c-statistic improvements of only 0.01 to 0.05. Readmissions are currently used as a marker of hospital performance, with recent financial penalties to hospitals for excessive readmissions. Function-based readmission models outperform models based only on demographics and comorbidities. Readmission risk models would benefit from the inclusion of functional status as a primary predictor. Copyright © 2016 AMDA – The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
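In the spirit of the comparison above, the sketch below fits two logistic models for 30-day readmission, one from a FIM-like functional score and one from demographics only, and compares them by c-statistic (ROC AUC); the simulated cohort and coefficients are assumptions, not Uniform Data System data.

```python
# Simulated comparison of function-only vs. demographics-only logistic models.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 5000
age = rng.normal(70, 10, n)
male = rng.integers(0, 2, n)
fim = rng.normal(80, 15, n)                          # functional status score
logit = -2.2 - 0.04 * (fim - 80) + 0.01 * (age - 70)
readmit = rng.random(n) < 1 / (1 + np.exp(-logit))   # ~10% readmission rate

def c_statistic(X, y):
    model = LogisticRegression(max_iter=1000).fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

print("function only:", round(c_statistic(fim.reshape(-1, 1), readmit), 3))
print("demographics :", round(c_statistic(np.column_stack([age, male]), readmit), 3))
```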
Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David
2015-01-01
Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.
Wang, Mingyu; Han, Lijuan; Liu, Shasha; Zhao, Xuebing; Yang, Jinghua; Loh, Soh Kheang; Sun, Xiaomin; Zhang, Chenxi; Fang, Xu
2015-09-01
Renewable energy from lignocellulosic biomass has been deemed an alternative to depleting fossil fuels. In order to improve this technology, we aim to develop robust mathematical models for the enzymatic lignocellulose degradation process. By analyzing 96 groups of previously published and newly obtained lignocellulose saccharification results and fitting them to the Weibull distribution, we discovered that Weibull statistics can accurately predict lignocellulose saccharification data, regardless of the type of substrates, enzymes and saccharification conditions. A mathematical model for enzymatic lignocellulose degradation was subsequently constructed based on Weibull statistics. Further analysis of the mathematical structure of the model and experimental saccharification data showed the significance of the two parameters in this model. In particular, the λ value, defined as the characteristic time, represents the overall performance of the saccharification system. This suggestion was further supported by statistical analysis of experimental saccharification data and analysis of the glucose production levels when λ and n values change. In conclusion, the constructed Weibull statistics-based model can accurately predict lignocellulose hydrolysis behavior and we can use the λ parameter to assess the overall performance of enzymatic lignocellulose degradation. Advantages and potential applications of the model and the λ value in saccharification performance assessment are discussed. Copyright © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
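The cumulative Weibull form can be fitted directly to a saccharification time course with scipy; the sketch below uses an invented yield series, with Ymax, λ (the characteristic time) and n as the fitted parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_yield(t, ymax, lam, n):
    """Cumulative Weibull form of glucose yield versus hydrolysis time."""
    return ymax * (1.0 - np.exp(-(t / lam) ** n))

# Hypothetical saccharification time course: time (h), glucose yield (%)
t = np.array([2, 6, 12, 24, 48, 72, 96], dtype=float)
y = np.array([8, 21, 35, 52, 68, 75, 78], dtype=float)

(ymax, lam, n), _ = curve_fit(weibull_yield, t, y, p0=[80.0, 24.0, 1.0])
print(f"Ymax = {ymax:.1f}%, lambda = {lam:.1f} h, n = {n:.2f}")
```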
NASA Astrophysics Data System (ADS)
Zhao, Runchen; Ientilucci, Emmett J.
2017-05-01
Hyperspectral remote sensing systems provide spectral data composed of hundreds of narrow spectral bands. Spectral remote sensing systems can be used to identify targets, for example, without physical interaction. Often it is of interest to characterize the spectral variability of targets or objects. The purpose of this paper is to identify and characterize the LWIR spectral variability of targets based on an improved earth observing statistical performance model, known as the Forecasting and Analysis of Spectroradiometric System Performance (FASSP) model. FASSP contains three basic modules: a scene model, a sensor model and a processing model. Instead of using mean surface reflectance only as input to the model, FASSP transfers user-defined statistical characteristics of a scene through the image chain (i.e., from source to sensor). The radiative transfer model, MODTRAN, is used to simulate the radiative transfer based on user-defined atmospheric parameters. To retrieve class emissivity and temperature statistics, or temperature / emissivity separation (TES), a LWIR atmospheric compensation method is necessary. The FASSP model has a method to transform statistics in the visible (i.e., ELM) but currently does not have a LWIR TES algorithm in place. This paper addresses the implementation of such a TES algorithm and its associated transformation of statistics.
Augmenting Latent Dirichlet Allocation and Rank Threshold Detection with Ontologies
2010-03-01
Probabilistic Latent Semantic Indexing (PLSI) is an automated indexing information retrieval model [20]. It is based on a statistical latent class model which is...uses a statistical foundation that is more accurate in finding hidden semantic relationships [20]. The model uses factor analysis of count data, number...principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function [20]. The statistical
Gene-Based Association Analysis for Censored Traits Via Fixed Effect Functional Regressions.
Fan, Ruzong; Wang, Yifan; Yan, Qi; Ding, Ying; Weeks, Daniel E; Lu, Zhaohui; Ren, Haobo; Cook, Richard J; Xiong, Momiao; Swaroop, Anand; Chew, Emily Y; Chen, Wei
2016-02-01
Genetic studies of survival outcomes have been proposed and conducted recently, but statistical methods for identifying genetic variants that affect disease progression are rarely developed. Motivated by our ongoing real studies, here we develop Cox proportional hazard models using functional regression (FR) to perform gene-based association analysis of survival traits while adjusting for covariates. The proposed Cox models are fixed effect models where the genetic effects of multiple genetic variants are assumed to be fixed. We introduce likelihood ratio test (LRT) statistics to test for associations between the survival traits and multiple genetic variants in a genetic region. Extensive simulation studies demonstrate that the proposed Cox FR LRT statistics have well-controlled type I error rates. To evaluate power, we compare the Cox FR LRT with the previously developed burden test (BT) in a Cox model and the sequence kernel association test (SKAT), which is based on mixed effect Cox models. The Cox FR LRT statistics have higher power than or similar power to the Cox SKAT LRT, except when 50%/50% causal variants had negative/positive effects and all causal variants are rare. In addition, the Cox FR LRT statistics have higher power than the Cox BT LRT. The models and related test statistics can be useful in whole-genome and whole-exome association studies. An age-related macular degeneration dataset was analyzed as an example. © 2016 WILEY PERIODICALS, INC.
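The likelihood ratio test itself is easy to sketch with the lifelines package: fit nested Cox models with and without the variant terms and refer twice the log-likelihood difference to a chi-squared distribution. For simplicity the variants enter as plain fixed effects rather than through the paper's functional (basis-expansion) representation, and the survival data below are simulated.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from scipy.stats import chi2

rng = np.random.default_rng(2)
n = 2000
covar = rng.normal(size=n)                       # adjustment covariate
g = rng.binomial(1, 0.3, size=(n, 3))            # 3 variant genotypes in a region
hazard = np.exp(0.3 * covar + g @ np.array([0.2, 0.0, 0.4]))
df = pd.DataFrame(g, columns=["v1", "v2", "v3"])
df["covar"] = covar
df["time"] = rng.exponential(1.0 / hazard)       # event times from the hazard
df["event"] = rng.binomial(1, 0.8, n)            # crude random censoring

full = CoxPHFitter().fit(df, "time", "event")
null = CoxPHFitter().fit(df[["covar", "time", "event"]], "time", "event")

lrt = 2 * (full.log_likelihood_ - null.log_likelihood_)
print(f"LRT = {lrt:.2f}, p = {chi2.sf(lrt, df=3):.3g}")   # df = variant terms
```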
Fukuda, Haruhisa; Kuroki, Manabu
2016-03-01
To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer the ages of demographic events, and that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
NASA Astrophysics Data System (ADS)
Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.
2017-09-01
In this paper, a Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel-level ice concentration is presented for the comparison of CRF-based methods. Furthermore, in order to find the most effective statistical distribution model to integrate into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method can resolve the sea ice edge well in the Marginal Ice Zone (MIZ) and shows a robust distinction between ice and water.
NASA Astrophysics Data System (ADS)
Lafontaine, J.; Hay, L.; Archfield, S. A.; Farmer, W. H.; Kiang, J. E.
2014-12-01
The U.S. Geological Survey (USGS) has developed a National Hydrologic Model (NHM) to support coordinated, comprehensive and consistent hydrologic model development, and facilitate the application of hydrologic simulations within the continental US. The portion of the NHM located within the Gulf Coastal Plains and Ozarks Landscape Conservation Cooperative (GCPO LCC) is being used to test the feasibility of improving streamflow simulations in gaged and ungaged watersheds by linking statistically- and physically-based hydrologic models. The GCPO LCC covers part or all of 12 states and 5 sub-geographies, totaling approximately 726,000 km2, and is centered on the lower Mississippi Alluvial Valley. A total of 346 USGS streamgages in the GCPO LCC region were selected to evaluate the performance of this new calibration methodology for the period 1980 to 2013. Initially, the physically-based models are calibrated to measured streamflow data to provide a baseline for comparison. An enhanced calibration procedure then is used to calibrate the physically-based models in the gaged and ungaged areas of the GCPO LCC using statistically-based estimates of streamflow. For this application, the calibration procedure is adjusted to address the limitations of the statistically generated time series to reproduce measured streamflow in gaged basins, primarily by incorporating error and bias estimates. As part of this effort, estimates of uncertainty in the model simulations are also computed for the gaged and ungaged watersheds.
A statistical parts-based appearance model of inter-subject variability.
Toews, Matthew; Collins, D Louis; Arbel, Tal
2006-01-01
In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.
Shahabi, Himan; Hashim, Mazlan
2015-04-22
This research presents the results of GIS-based statistical models for the generation of landslide susceptibility maps, using geographic information system (GIS) and remote-sensing data for the Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including the analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map, which has a total of 92 landslide locations, was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purposes. The validation results, using the relative landslide density index (R-index) and receiver operating characteristic (ROC), demonstrated that the SMCE model (accuracy of 96%) is better in prediction than the AHP (accuracy of 91%) and WLC (accuracy of 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purposes and regional planning.
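The WLC step in particular reduces to a weighted sum of standardized factor rasters, with the resulting susceptibility index validated against the inventory via ROC. The toy sketch below uses random factor values and invented weights, not those derived in the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n = 5000                                   # hypothetical terrain cells
# Ten factor scores standardized to [0, 1]: slope, aspect, soil, lithology, ...
factors = rng.uniform(0, 1, size=(n, 10))
weights = np.array([0.25, 0.15, 0.12, 0.10, 0.10, 0.08, 0.07, 0.05, 0.05, 0.03])

# Weighted linear combination: one susceptibility index per cell
susceptibility = factors @ weights

# Hypothetical landslide inventory correlated with the index, for validation
landslide = rng.binomial(1, np.clip(susceptibility ** 2, 0, 1))
print("ROC AUC:", round(roc_auc_score(landslide, susceptibility), 3))
```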
ERIC Educational Resources Information Center
Hood, Michelle; Creed, Peter A.; Neumann, David L.
2012-01-01
We tested a model of the relationship between attitudes toward statistics and achievement based on Eccles' Expectancy Value Model (1983). Participants (n = 149; 83% female) were second-year Australian university students in a psychology statistics course (mean age = 23.36 years, SD = 7.94 years). We obtained demographic details, past performance,…
NASA Technical Reports Server (NTRS)
Manning, Robert M.
1990-01-01
A static and dynamic rain-attenuation model is presented which describes the statistics of attenuation on an arbitrarily specified satellite link for any location for which there are long-term rainfall statistics. The model may be used in the design of optimal stochastic control algorithms to mitigate the effects of attenuation and maintain link reliability. A rain-statistics data base is compiled, which makes it possible to apply the model to any location in the continental U.S. with a resolution of 0.5 degrees in latitude and longitude. The model predictions are compared with experimental observations, showing good agreement.
75 FR 72611 - Assessments, Large Bank Pricing
Federal Register 2010, 2011, 2012, 2013, 2014
2010-11-24
... the worst risk ranking and are included in the statistical analysis. Appendix 1 to the NPR describes the statistical analysis in detail. \12\ The percentage approximated by factors is based on the statistical model for that particular year. Actual weights assigned to each scorecard measure are largely based...
Patankar, Ravindra
2003-10-01
Statistical fatigue life of a ductile alloy specimen is traditionally divided into three stages, namely, crack nucleation, small crack growth, and large crack growth. Crack nucleation and small crack growth show a wide variation and hence a big spread on the cycles versus crack length graph. By comparison, large crack growth shows less variation. Therefore, different models are fitted to the different stages of the fatigue evolution process, thus treating different stages as different phenomena. With these independent models, it is impossible to predict one phenomenon based on information available about the other. Experimentally, it is easier to carry out crack length measurements of large cracks compared to nucleating cracks and small cracks. Thus, it is easier to collect statistical data for large crack growth compared to the painstaking effort it would take to collect statistical data for crack nucleation and small crack growth. This article presents a fracture mechanics-based stochastic model of fatigue crack growth in ductile alloys that are commonly encountered in mechanical structures and machine components. The model has been validated by Ray (1998) against various statistical fatigue data for crack propagation. Based on the model, this article proposes a technique to predict statistical information of fatigue crack nucleation and small crack growth properties that uses the statistical properties of large crack growth under constant-amplitude stress excitation. The statistical properties of large crack growth under constant-amplitude stress excitation can be obtained via experiments.
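A minimal Monte Carlo sketch of the idea: propagate a Paris-law crack growth relation with a random (here lognormal) growth-rate coefficient, so that scatter measured on large cracks induces a statistical distribution of lives. This illustrates stochastic crack growth generically; it is not the specific model validated by Ray (1998), and all material constants are invented.

```python
import numpy as np

rng = np.random.default_rng(3)
# Paris law: da/dN = C * (dK)^m with dK = S * Y * sqrt(pi * a)
m, S, Y = 3.0, 100.0, 1.0            # exponent, stress range (MPa), geometry factor
a0, a_fail = 1e-3, 2e-2              # initial and critical crack length (m)

def cycles_to_failure(C, dN=1000):
    a, n = a0, 0
    while a < a_fail:
        dK = S * Y * np.sqrt(np.pi * a)
        a += C * dK ** m * dN        # explicit integration over dN cycles
        n += dN
    return n

# Lognormal scatter in C stands in for the stochastic material behavior
Cs = rng.lognormal(mean=np.log(1e-12), sigma=0.3, size=200)
lives = np.array([cycles_to_failure(C) for C in Cs])
print(f"median life = {np.median(lives):.3g} cycles, "
      f"CoV = {lives.std() / lives.mean():.2f}")
```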
Posada, David
2006-01-01
ModelTest server is a web-based application for the selection of models of nucleotide substitution using the program ModelTest. The server takes as input a text file with likelihood scores for the set of candidate models. Models can be selected with hierarchical likelihood ratio tests, or with the Akaike or Bayesian information criteria. The output includes several statistics for the assessment of model selection uncertainty, for model averaging or to estimate the relative importance of model parameters. The server can be accessed at . PMID:16845102
Variability-aware compact modeling and statistical circuit validation on SRAM test array
NASA Astrophysics Data System (ADS)
Qiao, Ying; Spanos, Costas J.
2016-03-01
Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose a variability-aware compact model characterization methodology based on stepwise parameter selection. Transistor I-V measurements are obtained from a bit-transistor-accessible SRAM test array fabricated using a collaborating foundry's 28nm FDSOI technology. Our in-house customized Monte Carlo simulation bench can incorporate these statistical compact models, and simulation results on SRAM writability performance are very close to measurements in distribution estimation. Our proposed statistical compact model parameter extraction methodology also has the potential of predicting non-Gaussian behavior in statistical circuit performances through mixtures of Gaussian distributions.
Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong
2013-01-01
For genome-wide association data analysis, two genes in any pathway, two SNPs in the two linked gene regions respectively or in the two linked exons respectively within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to the effects not only due to the traditional interaction under nearly independent condition but the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than single SNP-based logistic model, PCA-based logistic model, and other gene-based methods. PMID:23620809
Toward statistical modeling of saccadic eye-movement and visual saliency.
Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming
2014-11-01
In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. These observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGCs using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides simulating human saccadic behavior, we also demonstrated the superior effectiveness and robustness of our approach over the state of the art by carrying out extensive experiments on synthetic patterns and human eye-fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences and the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show the promising potential of statistical approaches for human behavior research.
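A rough sketch of the selection step: FastICA serves here as the projection-pursuit stage (its components tend toward super-Gaussian directions), the most kurtotic component is kept as the SGC, and the simulated "fixation" lands on the patch with maximum response. The image, patch size, and component count are arbitrary, and this is only a loose approximation of the paper's sequential procedure.

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.decomposition import FastICA
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(4)
image = rng.normal(size=(128, 128))
image[40:48, 60:68] += 3.0                        # a structured, salient region

patches = extract_patches_2d(image, (8, 8), max_patches=2000, random_state=0)
X = patches.reshape(len(patches), -1)

# FastICA components approximate super-Gaussian (sparse) projection directions
ica = FastICA(n_components=16, random_state=0, max_iter=500).fit(X)
sgc = max(ica.components_, key=lambda w: kurtosis(X @ w))   # most super-Gaussian

# "Saccade" to the location whose patch responds most strongly to the SGC
all_patches = extract_patches_2d(image, (8, 8)).reshape(-1, 64)
idx = int(np.argmax(np.abs(all_patches @ sgc)))
print("fixation (row, col):", divmod(idx, image.shape[1] - 7))
```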
NASA Astrophysics Data System (ADS)
Rosas, Pedro; Wagemans, Johan; Ernst, Marc O.; Wichmann, Felix A.
2005-05-01
A number of models of depth-cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum-variance unbiased estimator that can be constructed from the available information. Here we test such models by using visual and haptic depth information. Different texture types produce differences in slant-discrimination performance, thus providing a means for testing a reliability-sensitive cue-combination model with texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability but fell short of statistically optimal combination - we find reliability-based reweighting but not statistically optimal cue combination.
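The reliability-sensitive model under test is compact enough to state directly: the combined estimate is a weighted average with weights proportional to each cue's inverse variance, which is the minimum-variance linear combination. A few lines of Python with invented slant estimates:

```python
import numpy as np

def combine_cues(estimates, variances):
    """Minimum-variance linear combination: weight each cue by its
    reliability (inverse variance), normalized to sum to one."""
    w = 1.0 / np.asarray(variances, dtype=float)
    w /= w.sum()
    return float(w @ np.asarray(estimates)), w

# Hypothetical slant estimates (deg) from a texture cue and a haptic cue
slant_hat, weights = combine_cues([32.0, 26.0], [4.0, 9.0])
print(slant_hat, weights)    # texture gets weight 9/13, haptics 4/13
```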
Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure
Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...
Evaluating model accuracy for model-based reasoning
NASA Technical Reports Server (NTRS)
Chien, Steve; Roden, Joseph
1992-01-01
Described here is an approach to automatically assessing the accuracy of various components of a model. In this approach, actual data from the operation of a target system is used to drive statistical measures to evaluate the prediction accuracy of various portions of the model. We describe how these statistical measures of model accuracy can be used in model-based reasoning for monitoring and design. We then describe the application of these techniques to the monitoring and design of the water recovery system of the Environmental Control and Life Support System (ECLSS) of Space Station Freedom.
Linhart, S. Mike; Nania, Jon F.; Christiansen, Daniel E.; Hutchinson, Kasey J.; Sanders, Curtis L.; Archfield, Stacey A.
2013-01-01
A variety of individuals from water resource managers to recreational users need streamflow information for planning and decisionmaking at locations where there are no streamgages. To address this problem, two statistically based methods, the Flow Duration Curve Transfer method and the Flow Anywhere method, were developed for statewide application, whereas the two physically based models, the Precipitation-Runoff Modeling System and the Soil and Water Assessment Tool, were developed only for application in the Cedar River Basin. Observed and estimated streamflows for the two methods and models were compared for goodness of fit at 13 streamgages modeled in the Cedar River Basin by using the Nash-Sutcliffe and the percent-bias efficiency values. Based on median and mean Nash-Sutcliffe values for the 13 streamgages, the Precipitation-Runoff Modeling System and Soil and Water Assessment Tool models appear to have performed similarly and better than the Flow Duration Curve Transfer and Flow Anywhere methods. Based on median and mean percent-bias values, the Soil and Water Assessment Tool model appears to have generally overestimated daily mean streamflows, whereas the Precipitation-Runoff Modeling System model and the statistical methods appear to have underestimated daily mean streamflows. The Flow Duration Curve Transfer method produced the lowest median and mean percent-bias values and appears to perform better than the other models.
NASA Astrophysics Data System (ADS)
Nearing, G. S.
2014-12-01
Statistical models consistently outperform conceptual models in the short term; however, to account for a nonstationary future (or an unobserved past), scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.
The Importance of Statistical Modeling in Data Analysis and Inference
ERIC Educational Resources Information Center
Rollins, Derrick, Sr.
2017-01-01
Statistical inference simply means to draw a conclusion based on information that comes from data. Error bars are the most commonly used tool for data analysis and inference in chemical engineering data studies. This work demonstrates, using common types of data collection studies, the importance of specifying the statistical model for sound…
ERIC Educational Resources Information Center
Romeu, Jorge Luis
2008-01-01
This article discusses our teaching approach in graduate level Engineering Statistics. It is based on the use of modern technology, learning groups, contextual projects, simulation models, and statistical and simulation software to entice student motivation. The use of technology to facilitate group projects and presentations, and to generate,…
Shi, Y; Qi, F; Xue, Z; Chen, L; Ito, K; Matsuo, H; Shen, D
2008-04-01
This paper presents a new deformable model using both population-based and patient-specific shape statistics to segment lung fields from serial chest radiographs. There are two novelties in the proposed deformable model. First, a modified scale-invariant feature transform (SIFT) local descriptor, which is more distinctive than general intensity and gradient features, is used to characterize the image features in the vicinity of each pixel. Second, the deformable contour is constrained by both population-based and patient-specific shape statistics, which yields more robust and accurate segmentation of lung fields for serial chest radiographs. In particular, for segmenting the initial time-point images, the population-based shape statistics is used to constrain the deformable contour; as more subsequent images of the same patient are acquired, the patient-specific shape statistics collected online from the previous segmentation results gradually takes on a greater role. Thus, the patient-specific shape statistics is updated each time a new segmentation result is obtained, and it is further used to refine the segmentation results of all the available time-point images. Experimental results show that the proposed method is more robust and accurate than other active shape models in segmenting the lung fields from serial chest radiographs.
Relevance of the c-statistic when evaluating risk-adjustment models in surgery.
Merkow, Ryan P; Hall, Bruce L; Cohen, Mark E; Dimick, Justin B; Wang, Edward; Chow, Warren B; Ko, Clifford Y; Bilimoria, Karl Y
2012-05-01
The measurement of hospital quality based on outcomes requires risk adjustment. The c-statistic is a popular tool used to judge model performance, but can be limited, particularly when evaluating specific operations in focused populations. Our objectives were to examine the interpretation and relevance of the c-statistic when used in models with increasingly similar case mix and to consider an alternative perspective on model calibration based on a graphical depiction of model fit. From the American College of Surgeons National Surgical Quality Improvement Program (2008-2009), patients were identified who underwent a general surgery procedure, and procedure groups were increasingly restricted: colorectal-all, colorectal-elective cases only, and colorectal-elective cancer cases only. Mortality and serious morbidity outcomes were evaluated using logistic regression-based risk adjustment, and model c-statistics and calibration curves were used to compare model performance. During the study period, 323,427 general, 47,605 colorectal-all, 39,860 colorectal-elective, and 21,680 colorectal cancer patients were studied. Mortality ranged from 1.0% in general surgery to 4.1% in the colorectal-all group, and serious morbidity ranged from 3.9% in general surgery to 12.4% in the colorectal-all procedural group. As case mix was restricted, c-statistics progressively declined from the general to the colorectal cancer surgery cohorts for both mortality and serious morbidity (mortality: 0.949 to 0.866; serious morbidity: 0.861 to 0.668). Calibration was evaluated graphically by examining predicted vs observed number of events over risk deciles. For both mortality and serious morbidity, there was no qualitative difference in calibration identified between the procedure groups. In the present study, we demonstrate how the c-statistic can become less informative and, in certain circumstances, can lead to incorrect model-based conclusions as case mix is restricted and patients become more homogeneous. Although it remains an important tool, caution is advised when the c-statistic is advanced as the sole measure of model performance. Copyright © 2012 American College of Surgeons. All rights reserved.
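Both quantities discussed here are easy to reproduce: the c-statistic is the ROC AUC of the risk-adjustment model, and the calibration check compares predicted with observed event counts across risk deciles. A sketch on simulated data (not NSQIP data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
n = 50000
x = rng.normal(size=(n, 4))                      # hypothetical risk factors
p = 1 / (1 + np.exp(-(-3.5 + x @ np.array([1.2, 0.8, 0.4, 0.2]))))
y = rng.binomial(1, p)                           # outcome, e.g. mortality

model = LogisticRegression().fit(x, y)
pred = model.predict_proba(x)[:, 1]
print("c-statistic:", round(roc_auc_score(y, pred), 3))

# Calibration by risk decile: predicted vs observed number of events
decile = pd.qcut(pred, 10, labels=False)
tab = (pd.DataFrame({"pred": pred, "obs": y, "decile": decile})
         .groupby("decile")
         .agg(predicted=("pred", "sum"), observed=("obs", "sum")))
print(tab.round(1))
```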
Austin, Peter C
2018-01-01
The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.
Pearce, Marcus T
2018-05-11
Music perception depends on internal psychological models derived through exposure to a musical culture. It is hypothesized that this musical enculturation depends on two cognitive processes: (1) statistical learning, in which listeners acquire internal cognitive models of statistical regularities present in the music to which they are exposed; and (2) probabilistic prediction based on these learned models that enables listeners to organize and process their mental representations of music. To corroborate these hypotheses, I review research that uses a computational model of probabilistic prediction based on statistical learning (the information dynamics of music (IDyOM) model) to simulate data from empirical studies of human listeners. The results show that a broad range of psychological processes involved in music perception (expectation, emotion, memory, similarity, segmentation, and meter) can be understood in terms of a single, underlying process of probabilistic prediction using learned statistical models. Furthermore, IDyOM simulations of listeners from different musical cultures demonstrate that statistical learning can plausibly predict causal effects of differential cultural exposure to musical styles, providing a quantitative model of cultural distance. Understanding the neural basis of musical enculturation will benefit from close coordination between empirical neuroimaging and computational modeling of underlying mechanisms, as outlined here. © 2018 The Authors. Annals of the New York Academy of Sciences published by Wiley Periodicals, Inc. on behalf of New York Academy of Sciences.
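The prediction side of such a model can be illustrated with a first-order Markov sketch: transition probabilities learned from a toy corpus, with each continuation scored by its information content, -log2 p. This is a drastic simplification of IDyOM, which uses variable-order models with long- and short-term components; the corpus below is invented.

```python
from collections import Counter, defaultdict
import math

# Toy corpus of melodies as MIDI pitch sequences
corpus = [[60, 62, 64, 65, 67, 65, 64, 62, 60],
          [60, 64, 67, 64, 60, 62, 64, 62, 60]]

counts = defaultdict(Counter)                 # pitch -> next-pitch counts
for melody in corpus:
    for prev, nxt in zip(melody, melody[1:]):
        counts[prev][nxt] += 1

def information_content(prev, nxt):
    """Surprise of hearing `nxt` after `prev`, in bits (-log2 p)."""
    total = sum(counts[prev].values())
    p = (counts[prev][nxt] + 1) / (total + 128)   # Laplace smoothing, MIDI range
    return -math.log2(p)

print(information_content(60, 62))   # familiar continuation: lower surprise
print(information_content(60, 61))   # unheard continuation: higher surprise
```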
Fiori, Simone
2007-01-01
Bivariate statistical modeling from incomplete data is a useful statistical tool that allows one to discover the model underlying two data sets when the data in the two sets correspond neither in size nor in ordering. Such a situation may occur when the sizes of the two data sets do not match (i.e., there are “holes” in the data) or when the data sets have been acquired independently. Also, statistical modeling is useful when the amount of available data is enough to show relevant statistical features of the phenomenon underlying the data. We propose to tackle the problem of statistical modeling via a neural (nonlinear) system that is able to match its input-output statistic to the statistic of the available data sets. A key point of the new implementation proposed here is that it is based on look-up-table (LUT) neural systems, which guarantee a computationally advantageous way of implementing neural systems. A number of numerical experiments, performed on both synthetic and real-world data sets, illustrate the features of the proposed modeling procedure. PMID:18566641
Representing Micro-Macro Linkages by Actor-Based Dynamic Network Models
ERIC Educational Resources Information Center
Snijders, Tom A. B.; Steglich, Christian E. G.
2015-01-01
Stochastic actor-based models for network dynamics have the primary aim of statistical inference about processes of network change, but may be regarded as a kind of agent-based models. Similar to many other agent-based models, they are based on local rules for actor behavior. Different from many other agent-based models, by including elements of…
Saadati, Farzaneh; Ahmad Tarmizi, Rohani; Mohd Ayub, Ahmad Fauzi; Abu Bakar, Kamariah
2015-01-01
Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding the pedagogical characteristics of learning within an e-learning system is 'value added' because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, i-CAM could significantly promote students' problem-solving performance at the end of each phase. In addition, the differences in students' test scores were statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirm the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students.
Interpretation of the results of statistical measurements. [search for basic probability model
NASA Technical Reports Server (NTRS)
Olshevskiy, V. V.
1973-01-01
For random processes, the calculated probability characteristic and the measured statistical estimate are used in a quality functional that defines the difference between the two functions. Based on the assumption that the statistical measurement procedure is organized so that the parameters for a selected model are optimized, it is shown that the interpretation of experimental research is a search for a basic probability model.
Estimating regional plant biodiversity with GIS modelling
Louis R. Iverson; Anantha M. Prasad; Anantha M. Prasad
1998-01-01
We analyzed a statewide species database together with a county-level geographic information system to build a model based on well-surveyed areas to estimate species richness in less surveyed counties. The model involved GIS (Arc/Info) and statistics (S-PLUS), including spatial statistics (S+SpatialStats).
NASA Astrophysics Data System (ADS)
Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.
2017-12-01
Hyperparameterization of statistical models, i.e., automated model scoring and selection using methods such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize statistical preprocessing of the forcing data and improve goodness-of-fit for statistical models (i.e., feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate the selection of soil moisture forecast statistical models for North America.
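The generic pattern, automated scoring and selection over a model's hyperparameter space, can be shown with scikit-learn's randomized search. This is a much simpler stand-in for Elm's NSGA-2 pipeline, and the regression data are synthetic rather than NLDAS forcing.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical forcing-to-soil-moisture regression problem
X, y = make_regression(n_samples=2000, n_features=8, noise=5.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={"n_estimators": [50, 100, 200],
                         "max_depth": [4, 8, 16, None],
                         "max_features": [0.3, 0.6, 1.0]},
    n_iter=10, cv=3, scoring="neg_root_mean_squared_error", random_state=0)
search.fit(X, y)
print(search.best_params_, "RMSE:", round(-search.best_score_, 2))
```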
Nonlinear wave chaos: statistics of second harmonic fields.
Zhou, Min; Ott, Edward; Antonsen, Thomas M; Anlage, Steven M
2017-10-01
Concepts from the field of wave chaos have been shown to successfully predict the statistical properties of linear electromagnetic fields in electrically large enclosures. The Random Coupling Model (RCM) describes these properties by incorporating both universal features described by Random Matrix Theory and the system-specific features of particular system realizations. In an effort to extend this approach to the nonlinear domain, we add an active nonlinear frequency-doubling circuit to an otherwise linear wave chaotic system, and we measure the statistical properties of the resulting second harmonic fields. We develop an RCM-based model of this system as two linear chaotic cavities coupled by means of a nonlinear transfer function. The harmonic field strengths are predicted to be the product of two statistical quantities and the nonlinearity characteristics. Statistical results from measurement-based calculation, RCM-based simulation, and direct experimental measurements are compared and show good agreement over many decades of power.
ERIC Educational Resources Information Center
Brezavšcek, Alenka; Šparl, Petra; Žnidaršic, Anja
2017-01-01
The aim of the paper is to investigate the main factors influencing the adoption and continuous utilization of statistical software among university social sciences students in Slovenia. Based on the Technology Acceptance Model (TAM), a conceptual model was derived where five external variables were taken into account: statistical software…
Shu, Jie; Dolman, G E; Duan, Jiang; Qiu, Guoping; Ilyas, Mohammad
2016-04-27
Colour is the most important feature used in quantitative immunohistochemistry (IHC) image analysis; IHC is used to provide information relating to aetiology and to confirm malignancy. Statistical modelling is a technique widely used for colour detection in computer vision. We have developed a statistical model of colour detection applicable to the detection of stain colour in digital IHC images. The model was first trained on a large set of colour pixels collected semi-automatically. To speed up the training and detection processes, we removed the luminance (Y) channel of the YCbCr colour space and chose 128 histogram bins, which proved to be the optimal number. A maximum likelihood classifier is used to automatically classify pixels in digital slides into positively or negatively stained pixels. The model-based tool was developed within ImageJ to quantify targets identified using IHC and histochemistry. The purpose of the evaluation was to compare the computer model with human evaluation. Several large datasets were prepared and obtained from human oesophageal cancer, colon cancer and liver cirrhosis with different colour stains. Experimental results have demonstrated that the model-based tool achieves more accurate results than colour deconvolution and the CMYK model in the detection of brown colour, and is comparable to colour deconvolution in the detection of pink colour. We have also demonstrated that the proposed model has little inter-dataset variation. A robust and effective statistical model is introduced in this paper. The model-based interactive tool in ImageJ, which can create a visual representation of the statistical model and detect a specified colour automatically, is easy to use and available freely at http://rsb.info.nih.gov/ij/plugins/ihc-toolbox/index.html . Testing of the tool by different users showed only minor inter-observer variations in results.
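The detection step reduces to a maximum likelihood comparison of two chrominance histograms. A compact sketch of that idea, with the Y channel dropped and 128 bins per axis as in the paper; the training pixel distributions are invented placeholders for stained and unstained pixels.

```python
import numpy as np

BINS = 128  # the paper's optimal histogram resolution

def cbcr_histogram(pixels):
    """2-D CbCr histogram normalized to a likelihood (Y channel dropped)."""
    h, _, _ = np.histogram2d(pixels[:, 0], pixels[:, 1],
                             bins=BINS, range=[[0, 256], [0, 256]])
    return (h + 1) / (h.sum() + BINS * BINS)       # Laplace smoothing

rng = np.random.default_rng(6)
# Hypothetical training pixels: positively stained (brown) vs negative
pos = rng.normal([150, 120], 8, size=(5000, 2)).clip(0, 255)
neg = rng.normal([110, 140], 8, size=(5000, 2)).clip(0, 255)
h_pos, h_neg = cbcr_histogram(pos), cbcr_histogram(neg)

def classify(pixels):
    """Maximum likelihood decision: True where p(CbCr|pos) > p(CbCr|neg)."""
    idx = np.clip((pixels / 256 * BINS).astype(int), 0, BINS - 1)
    return h_pos[idx[:, 0], idx[:, 1]] > h_neg[idx[:, 0], idx[:, 1]]

test = rng.normal([148, 122], 10, size=(10, 2)).clip(0, 255)
print(classify(test))
```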
Statistical Methods for Assessments in Simulations and Serious Games. Research Report. ETS RR-14-12
ERIC Educational Resources Information Center
Fu, Jianbin; Zapata, Diego; Mavronikolas, Elia
2014-01-01
Simulation or game-based assessments produce outcome data and process data. In this article, some statistical models that can potentially be used to analyze data from simulation or game-based assessments are introduced. Specifically, cognitive diagnostic models that can be used to estimate latent skills from outcome data so as to scale these…
A hierarchical fire frequency model to simulate temporal patterns of fire regimes in LANDIS
Jian Yang; Hong S. He; Eric J. Gustafson
2004-01-01
Fire disturbance has important ecological effects in many forest landscapes. Existing statistically based approaches can be used to examine the effects of a fire regime on forest landscape dynamics. Most examples of statistically based fire models divide a fire occurrence into two stages--fire ignition and fire initiation. However, the exponential and Weibull fire-...
TOWARDS REFINED USE OF TOXICITY DATA IN ...
In 2003, an International Life Sciences Institute (ILSI) Working Group examined the potential of statistically based structure-activity relationship (SAR) models for use in screening environmental contaminants for possible developmental toxicants.
NASA Astrophysics Data System (ADS)
Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting
2017-04-01
Synthetic Aperture Radar (SAR) is important for polar remote sensing since it can provide continuous observations in all weather, day and night. SAR can be used for extracting surface roughness information characterized by the variance of dielectric properties and different polarization channels, which makes it possible to observe different ice types and surface structure for deformation analysis. In November 2016, the 33rd cruise of the Chinese National Antarctic Research Expedition (CHINARE) set sail into the Antarctic sea ice zone. Accurate knowledge of the spatial distribution of leads in the sea ice zone is essential for the routine planning of ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Fields (CRF) model, and lead characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contextual information and the statistical characteristics of sea ice to improve lead detection in Sentinel-1A dual polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probabilities estimated from the statistical distributions. For mixture statistical distribution parameter estimation, the Method of Logarithmic Cumulants (MoLC) is exploited for single-distribution parameter estimation, and an iterative Expectation Maximization (EM) algorithm is investigated to calculate the parameters of the mixture statistical distribution based CRF model. In the posterior probability inference, a graph-cut energy minimization method is adopted for the initial lead detection. Post-processing procedures, including an aspect-ratio constraint and spatial smoothing, are utilized to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a pixel spacing of 40 meters near the Prydz Bay area, East Antarctica. The main work is as follows: 1) A mixture statistical distribution based CRF algorithm has been developed for lead detection in Sentinel-1A dual polarization images. 2) An assessment of the proposed mixture statistical distribution based CRF method against the single distribution based CRF algorithm has been presented. 3) Preferable parameter sets, including the statistical distributions, the aspect-ratio threshold and the spatial smoothing window size, have been provided. In the future, the proposed algorithm will be developed for operational processing of Sentinel series data sets owing to its low computational cost and high accuracy in lead detection.
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
Assessment of credit risk based on fuzzy relations
NASA Astrophysics Data System (ADS)
Tsabadze, Teimuraz
2017-06-01
The purpose of this paper is to develop a new approach for the assessment of the credit risk of corporate borrowers. There are different models for borrower risk assessment, divided into two groups: statistical and theoretical. When assessing the credit risk of corporate borrowers, a statistical model is unacceptable due to the lack of a sufficiently large history of defaults. At the same time, we cannot use some theoretical models due to the absence of a stock exchange. In those cases where a statistical base does not exist for a particular borrower, the decision-making process is always of an expert nature. The paper describes a new approach that may be used in group decision-making. An example of the application of the proposed approach is given.
Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos
NASA Astrophysics Data System (ADS)
Ganz, Melanie; Nielsen, Mads; Brandt, Sami
We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.
Assessment of corneal properties based on statistical modeling of OCT speckle.
Jesus, Danilo A; Iskander, D Robert
2017-01-01
A new approach to assess the properties of the corneal micro-structure in vivo, based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT), is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from raw OCT images. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of the Akaike Information Criterion, for fitting the OCT corneal speckle. Its parameters showed statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be utilized to model corneal speckle in OCT in vivo, providing complementary quantitative information where the micro-structure of corneal tissue is of essence.
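Model ranking of this kind can be reproduced with scipy's distribution fitting: fit several candidate distributions to the speckle amplitudes and compare AIC values. A sketch on simulated speckle; the parameter values are arbitrary rather than estimates from corneal data.

```python
import numpy as np
from scipy import stats

# Hypothetical OCT speckle amplitudes from a corneal region of interest
speckle = stats.gengamma(a=1.5, c=1.2, scale=40).rvs(5000, random_state=0)

candidates = {"gengamma": stats.gengamma, "gamma": stats.gamma,
              "rayleigh": stats.rayleigh, "lognorm": stats.lognorm}
for name, dist in candidates.items():
    params = dist.fit(speckle, floc=0)           # location fixed at zero
    loglik = dist.logpdf(speckle, *params).sum()
    k = len(params) - 1                          # loc not estimated
    print(f"{name:9s} AIC = {2 * k - 2 * loglik:.1f}")
```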
NASA Astrophysics Data System (ADS)
Torres Irribarra, D.; Freund, R.; Fisher, W.; Wilson, M.
2015-02-01
Computer-based, online assessments modelled, designed, and evaluated for adaptively administered invariant measurement are uniquely suited to defining and maintaining traceability to standardized units in education. An assessment of this kind is embedded in the Assessing Data Modeling and Statistical Reasoning (ADM) middle school mathematics curriculum. Diagnostic information about middle school students' learning of statistics and modeling is provided via computer-based formative assessments for seven constructs that comprise a learning progression for statistics and modeling from late elementary through the middle school grades. The seven constructs are: Data Display, Meta-Representational Competence, Conceptions of Statistics, Chance, Modeling Variability, Theory of Measurement, and Informal Inference. The end product is a web-delivered system built with Ruby on Rails for use by curriculum development teams working with classroom teachers in designing, developing, and delivering formative assessments. The online accessible system allows teachers to accurately diagnose students' unique comprehension and learning needs in a common language of real-time assessment, logging, analysis, feedback, and reporting.
ERIC Educational Resources Information Center
Nevitt, Jonathan; Hancock, Gregory R.
2001-01-01
Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…
Modelling Complexity: Making Sense of Leadership Issues in 14-19 Education
ERIC Educational Resources Information Center
Briggs, Ann R. J.
2008-01-01
Modelling of statistical data is a well established analytical strategy. Statistical data can be modelled to represent, and thereby predict, the forces acting upon a structure or system. For the rapidly changing systems in the world of education, modelling enables the researcher to understand, to predict and to enable decisions to be based upon…
Testing prediction methods: Earthquake clustering versus the Poisson model
Michael, A.J.
1997-01-01
Testing earthquake prediction methods requires statistical techniques that compare observed success to random chance. One technique is to produce simulated earthquake catalogs and measure the relative success of predicting real and simulated earthquakes. The accuracy of these tests depends on the validity of the statistical model used to simulate the earthquakes. This study tests the effect of clustering in the statistical earthquake model on the results. Three simulation models were used to produce significance levels for a VLF earthquake prediction method. As the degree of simulated clustering increases, the statistical significance drops. Hence, the use of a seismicity model with insufficient clustering can lead to overly optimistic results. A successful method must pass the statistical tests with a model that fully replicates the observed clustering. However, a method can be rejected based on tests with a model that contains insufficient clustering. U.S. copyright. Published in 1997 by the American Geophysical Union.
Stochastic Individual-Based Modeling of Bacterial Growth and Division Using Flow Cytometry.
García, Míriam R; Vázquez, José A; Teixeira, Isabel G; Alonso, Antonio A
2017-01-01
A realistic description of the variability in bacterial growth and division is critical to produce reliable predictions of safety risks along the food chain. Individual-based modeling of bacteria provides the theoretical framework to deal with this variability, but it requires information about the individual behavior of bacteria inside populations. In this work, we overcome this problem by estimating the individual behavior of bacteria from population statistics obtained with flow cytometry. For this objective, a stochastic individual-based modeling framework is defined based on standard assumptions about division and exponential growth. The unknown single-cell parameters required for running the individual-based simulations, such as the cell size growth rate, are estimated from the flow cytometry data. Instead of using the individual-based model directly, we make use of a modified Fokker-Planck equation. This single equation simulates the population statistics as a function of the unknown single-cell parameters. We test the validity of the approach by modeling the growth and division of Pediococcus acidilactici within the exponential phase. The estimations reveal the statistics of cell growth and division using only flow cytometry data from a given time. From the relationship between the mother and daughter volumes, we also predict that P. acidilactici divides along two successive parallel planes.
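A minimal sketch of the individual-based ingredient described above: cells grow exponentially in size and divide when a threshold is reached, with noise on the division split. All rates, thresholds and the initial population are illustrative assumptions, not values estimated from the flow cytometry data.

```python
# Toy stochastic individual-based simulation of cell-size growth and division.
import numpy as np

rng = np.random.default_rng(1)
mu, dt, t_end = 0.02, 1.0, 240.0       # growth rate (1/min), step, horizon
v_div, cv_split = 2.0, 0.05            # division volume, split noise

cells = list(rng.uniform(0.9, 1.1, size=50))   # initial cell volumes
t = 0.0
while t < t_end:
    grown = [v * np.exp(mu * dt) for v in cells]
    cells = []
    for v in grown:
        if v >= v_div:                          # divide into two daughters
            f = np.clip(rng.normal(0.5, cv_split), 0.3, 0.7)
            cells.extend([f * v, (1 - f) * v])
        else:
            cells.append(v)
    t += dt

vols = np.array(cells)
print(len(vols), vols.mean(), vols.std())       # population statistics
```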
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
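The core of a Hotelling's T2 detector can be sketched in a few lines: profile each record by its squared Mahalanobis distance from a baseline of normal traffic and flag records above a threshold. The data here are synthetic, and the threshold uses a chi-square quantile as a stand-in for the paper's central-limit-theorem range.

```python
# Hotelling's T^2 scoring of test records against a "normal" baseline.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
normal = rng.normal(size=(1000, 4))                    # baseline features
test = np.vstack([rng.normal(size=(95, 4)),
                  rng.normal(3.0, 1.0, size=(5, 4))])  # 5 injected attacks

mu = normal.mean(axis=0)
S_inv = np.linalg.inv(np.cov(normal, rowvar=False))
d = test - mu
t2 = np.einsum("ij,jk,ik->i", d, S_inv, d)             # T^2 per record

threshold = stats.chi2.ppf(0.99, df=test.shape[1])
print("flagged as attack:", np.where(t2 > threshold)[0])
```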
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposes a Bayesian structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of the SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading and structural parameters of the SEM. In the first approach, the prior distributions were taken as fixed distribution functions with specific parameter values, whereas in the second approach, the prior distributions were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov chain Monte Carlo sampling, in the form of Gibbs sampling, was applied to sample from the posterior distributions. The results revealed that all coefficients of the structural and measurement model parameters are statistically significant under the experts' opinion-based priors, whereas two coefficients are not statistically significant when the fixed priors are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit for work injury, with a high coefficient of determination (0.91) and a smaller mean squared error than traditional SEM.
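To illustrate the Gibbs-sampling machinery in a self-contained way, the sketch below alternates conditional draws of coefficients and error precision in a conjugate normal linear model. This is a toy stand-in for the MCMC step only; the paper's model is a full SEM with expert-elicited priors.

```python
# Gibbs sampler for a conjugate Bayesian linear model y = X beta + e.
import numpy as np

rng = np.random.default_rng(18)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(0, 0.3, n)

b0, B0_inv = np.zeros(2), np.eye(2) * 0.01   # prior mean and precision
a0, c0 = 2.0, 1.0                             # gamma prior on error precision
tau, draws = 1.0, []
for it in range(2000):
    B_post = np.linalg.inv(B0_inv + tau * X.T @ X)
    b_post = B_post @ (B0_inv @ b0 + tau * X.T @ y)
    beta = rng.multivariate_normal(b_post, B_post)   # draw coefficients
    resid = y - X @ beta
    tau = rng.gamma(a0 + n / 2, 1.0 / (c0 + 0.5 * resid @ resid))
    if it >= 500:                                    # discard burn-in
        draws.append(beta)
print("posterior mean:", np.array(draws).mean(axis=0))
```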
Medial-based deformable models in nonconvex shape-spaces for medical image segmentation.
McIntosh, Chris; Hamarneh, Ghassan
2012-01-01
We explore the application of genetic algorithms (GA) to deformable models through the proposition of a novel method for medical image segmentation that combines GA with nonconvex, localized, medial-based shape statistics. We replace the more typical gradient descent optimizer used in deformable models with GA, and the convex, implicit, global shape statistics with nonconvex, explicit, localized ones. Specifically, we propose GA to reduce typical deformable model weaknesses pertaining to model initialization, pose estimation and local minima, through the simultaneous evolution of a large number of models. Furthermore, we constrain the evolution, and thus reduce the size of the search-space, by using statistically-based deformable models whose deformations are intuitive (stretch, bulge, bend) and are driven in terms of localized principal modes of variation, instead of modes of variation across the entire shape that often fail to capture localized shape changes. Although GA are not guaranteed to achieve the global optima, our method compares favorably to the prevalent optimization techniques, convex/nonconvex gradient-based optimizers and to globally optimal graph-theoretic combinatorial optimization techniques, when applied to the task of corpus callosum segmentation in 50 mid-sagittal brain magnetic resonance images.
An order statistics approach to the halo model for galaxies
NASA Astrophysics Data System (ADS)
Paul, Niladri; Paranjape, Aseem; Sheth, Ravi K.
2017-04-01
We use the halo model to explore the implications of assuming that galaxy luminosities in groups are randomly drawn from an underlying luminosity function. We show that even the simplest of such order statistics models - one in which this luminosity function p(L) is universal - naturally produces a number of features associated with previous analyses based on the 'central plus Poisson satellites' hypothesis. These include the monotonic relation of mean central luminosity with halo mass, the lognormal distribution around this mean and the tight relation between the central and satellite mass scales. In stark contrast to observations of galaxy clustering, however, this model predicts no luminosity dependence of large-scale clustering. We then show that an extended version of this model, based on the order statistics of a halo mass dependent luminosity function p(L|m), is in much better agreement with the clustering data as well as satellite luminosities, but systematically underpredicts central luminosities. This brings into focus the idea that central galaxies constitute a distinct population that is affected by different physical processes than are the satellites. We model this physical difference as a statistical brightening of the central luminosities, over and above the order statistics prediction. The magnitude gap between the brightest and second brightest group galaxy is predicted as a by-product, and is also in good agreement with observations. We propose that this order statistics framework provides a useful language in which to compare the halo model for galaxies with more physically motivated galaxy formation models.
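A toy version of the universal-p(L) case makes the order statistics idea concrete: draw each halo's galaxy luminosities from one distribution and call the maximum the central. The occupation numbers and the power-law p(L) below are illustrative assumptions, not the calibrated functions of the paper.

```python
# Order statistics toy model: central = maximum of N draws from p(L).
import numpy as np

rng = np.random.default_rng(3)
n_halos = 10000
n_gal = 1 + rng.poisson(4.0, size=n_halos)       # galaxies per halo

central, gap = np.empty(n_halos), np.empty(n_halos)
for i, n in enumerate(n_gal):
    L = rng.pareto(2.5, size=n) + 1.0            # luminosities from p(L)
    L.sort()
    central[i] = L[-1]                            # brightest = central
    gap[i] = L[-1] / L[-2] if n > 1 else np.nan   # magnitude-gap analogue

print("mean central L:", central.mean())
print("median brightest/second-brightest ratio:", np.nanmedian(gap))
```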
NASA Astrophysics Data System (ADS)
Wang, Yunzhi; Qiu, Yuchen; Thai, Theresa; More, Kathleen; Ding, Kai; Liu, Hong; Zheng, Bin
2016-03-01
How to rationally identify epithelial ovarian cancer (EOC) patients who will benefit from bevacizumab or other antiangiogenic therapies is a critical issue in EOC treatment. The motivation of this study is to quantitatively measure adiposity features from CT images and investigate the feasibility of predicting the potential benefit for EOC patients with or without bevacizumab-based chemotherapy using multivariate statistical models built on quantitative adiposity image features. A dataset of CT images from 59 advanced EOC patients was included. Among them, 32 patients received maintenance bevacizumab after primary chemotherapy and the remaining 27 patients did not. We developed a computer-aided detection (CAD) scheme to automatically segment subcutaneous fat areas (SFA) and visceral fat areas (VFA) and then extracted 7 adiposity-related quantitative features. Three multivariate data analysis models (linear regression, logistic regression and Cox proportional hazards regression) were applied to investigate the potential association between the model-generated prediction results and the patients' progression-free survival (PFS) and overall survival (OS). The results show that, for all 3 statistical models, a statistically significant association was detected between the model-generated results and both clinical outcomes in the group of patients receiving maintenance bevacizumab (p<0.01), while there was no significant association with either PFS or OS in the group of patients not receiving maintenance bevacizumab. This study therefore demonstrated the feasibility of using statistical prediction models based on quantitative adiposity-related CT image features to generate a new clinical marker and predict the clinical outcome of EOC patients receiving maintenance bevacizumab-based chemotherapy.
NASA Technical Reports Server (NTRS)
Ahmed, Kazi Farzan; Wang, Guiling; Silander, John; Wilson, Adam M.; Allen, Jenica M.; Horton, Radley; Anyah, Richard
2013-01-01
Statistical downscaling can be used to efficiently downscale a large number of General Circulation Model (GCM) outputs to a fine temporal and spatial scale. To facilitate regional impact assessments, this study statistically downscales (to 1/8° spatial resolution) and corrects the bias of daily maximum and minimum temperature and daily precipitation data from six GCMs and four Regional Climate Models (RCMs) for the northeast United States (US) using the Statistical Downscaling and Bias Correction (SDBC) approach. Based on these downscaled data from multiple models, five extreme indices were analyzed for the future climate to quantify future changes of climate extremes. For a subset of models and indices, results based on raw and bias corrected model outputs for the present-day climate were compared with observations, which demonstrated that bias correction is important not only for GCM outputs, but also for RCM outputs. For future climate, bias correction led to a higher level of agreements among the models in predicting the magnitude and capturing the spatial pattern of the extreme climate indices. We found that the incorporation of dynamical downscaling as an intermediate step does not lead to considerable differences in the results of statistical downscaling for the study domain.
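One common bias-correction building block is empirical quantile mapping, sketched below on synthetic precipitation data: each raw model value is mapped to the observed value at the same empirical quantile. This is a generic illustration, not the exact SDBC algorithm of the paper.

```python
# Empirical quantile-mapping bias correction.
import numpy as np

rng = np.random.default_rng(4)
obs = rng.gamma(2.0, 3.0, size=3000)      # observed daily precipitation
model = rng.gamma(2.0, 4.0, size=3000)    # biased model output

obs_sorted, model_sorted = np.sort(obs), np.sort(model)

def bias_correct(x):
    # Locate x in the model's empirical CDF and read off the observed value
    # at the matching quantile.
    return np.interp(x, model_sorted, obs_sorted)

future = rng.gamma(2.0, 4.5, size=5)      # new model values to correct
print(bias_correct(future))
```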
Schaid, Daniel J
2010-01-01
Measures of genomic similarity are the basis of many statistical analytic methods. We review the mathematical and statistical basis of similarity methods, particularly based on kernel methods. A kernel function converts information for a pair of subjects to a quantitative value representing either similarity (larger values meaning more similar) or distance (smaller values meaning more similar), with the requirement that it must create a positive semidefinite matrix when applied to all pairs of subjects. This review emphasizes the wide range of statistical methods and software that can be used when similarity is based on kernel methods, such as nonparametric regression, linear mixed models and generalized linear mixed models, hierarchical models, score statistics, and support vector machines. The mathematical rigor for these methods is summarized, as is the mathematical framework for making kernels. This review provides a framework to move from intuitive and heuristic approaches to define genomic similarities to more rigorous methods that can take advantage of powerful statistical modeling and existing software. A companion paper reviews novel approaches to creating kernels that might be useful for genomic analyses, providing insights with examples [1]. Copyright © 2010 S. Karger AG, Basel.
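The positive semidefiniteness requirement is easy to check numerically. The sketch below builds a Gaussian (RBF) similarity kernel over subjects from a genotype-like matrix, with random 0/1/2 allele counts standing in for real genotypes, and inspects the eigenvalues of the resulting matrix.

```python
# Build an RBF kernel over subjects and verify it is (numerically) PSD.
import numpy as np

rng = np.random.default_rng(5)
G = rng.integers(0, 3, size=(30, 200)).astype(float)  # 30 subjects, 200 SNPs

sq = ((G[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)  # squared distances
K = np.exp(-sq / sq.mean())                               # RBF kernel matrix

eigvals = np.linalg.eigvalsh(K)
print("min eigenvalue:", eigvals.min())   # >= ~-1e-10 means PSD in practice
```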
Hofman, Abe D.; Visser, Ingmar; Jansen, Brenda R. J.; van der Maas, Han L. J.
2015-01-01
We propose and test three statistical models for the analysis of children's responses to the balance scale task, a seminal task to study proportional reasoning. We use a latent class modelling approach to formulate a rule-based latent class model (RB LCM) following from a rule-based perspective on proportional reasoning, and a new statistical model, the Weighted Sum Model, following from an information-integration approach. Moreover, a hybrid LCM using item covariates is proposed, combining aspects of both the rule-based and information-integration perspectives. These models are applied to two different datasets: a standard paper-and-pencil test dataset (N = 779), and a dataset collected within an online learning environment that included direct feedback, time pressure, and a reward system (N = 808). For the paper-and-pencil dataset the RB LCM resulted in the best fit, whereas for the online dataset the hybrid LCM provided the best fit. The standard paper-and-pencil dataset yielded more evidence for distinct solution rules than the online dataset, in which quantitative item characteristics are more prominent in determining responses. These results shed new light on the discussion between sequential rule-based and information-integration perspectives on cognitive development. PMID:26505905
The Development of Statistics Textbook Supported with ICT and Portfolio-Based Assessment
NASA Astrophysics Data System (ADS)
Hendikawati, Putriaji; Yuni Arini, Florentina
2016-02-01
This research was development research aimed at producing a model Statistics textbook supported with information and communication technology (ICT) and Portfolio-Based Assessment. The book was designed for mathematics students at the college level to improve their ability in mathematical connection and communication. There were three stages in this research, i.e., define, design, and develop. The textbook consists of 10 chapters, each containing an introduction, core material, examples, and exercises. The development phase began with an initial design of the book (draft 1), which was then validated by experts. Revision of draft 1 produced draft 2, which underwent a limited readability test. Revision of draft 2 in turn produced draft 3, which was trialled on a small sample to produce a valid model textbook. The data were analysed with descriptive statistics. The analysis showed that the Statistics textbook model supported with ICT and Portfolio-Based Assessment is valid and fulfils the criteria of practicality.
10 CFR 431.173 - Requirements applicable to all manufacturers.
Code of Federal Regulations, 2011 CFR
2011-01-01
... COMMERCIAL AND INDUSTRIAL EQUIPMENT Provisions for Commercial Heating, Ventilating, Air-Conditioning and... is based on engineering or statistical analysis, computer simulation or modeling, or other analytic... method or methods used; (B) The mathematical model, the engineering or statistical analysis, computer...
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real-life examples are presented to justify the requirement of suitable robust statistical procedures in place of likelihood-based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article is further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets
NASA Astrophysics Data System (ADS)
Goel, Amit; Montgomery, Michele
2015-08-01
Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamical evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion follows P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion holds on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems as a function of stellar age, to test whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods, such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. We present our statistical analysis of the outliers thus detected.
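A minimal proximity-based scorer of the kind mentioned above can be written directly: score each planet by its mean distance to its k nearest neighbours in a feature space. The feature choice (log period, log mass) and the synthetic sample below are assumptions for illustration.

```python
# k-NN distance outlier scoring in a two-feature space.
import numpy as np

rng = np.random.default_rng(6)
X = np.column_stack([rng.normal(1.0, 0.8, 300),    # log10 orbital period
                     rng.normal(0.0, 0.6, 300)])   # log10 planetary mass
X[:3] += 5.0                                        # plant a few outliers

k = 10
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
np.fill_diagonal(D, np.inf)                         # ignore self-distance
knn_score = np.sort(D, axis=1)[:, :k].mean(axis=1)  # mean k-NN distance

print("top outliers:", np.argsort(knn_score)[-3:])
```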
Comparing the Fit of Item Response Theory and Factor Analysis Models
ERIC Educational Resources Information Center
Maydeu-Olivares, Alberto; Cai, Li; Hernandez, Adolfo
2011-01-01
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be…
USDA-ARS?s Scientific Manuscript database
The resolution of General Circulation Models (GCMs) is too coarse to assess the fine scale or site-specific impacts of climate change. Downscaling approaches including dynamical and statistical downscaling have been developed to meet this requirement. As the resolution of climate model increases, it...
A Modeling Approach to the Development of Students' Informal Inferential Reasoning
ERIC Educational Resources Information Center
Doerr, Helen M.; Delmas, Robert; Makar, Katie
2017-01-01
Teaching from an informal statistical inference perspective can address the challenge of teaching statistics in a coherent way. We argue that activities that promote model-based reasoning address two additional challenges: providing a coherent sequence of topics and promoting the application of knowledge to novel situations. We take a models and…
ASCS online fault detection and isolation based on an improved MPCA
NASA Astrophysics Data System (ADS)
Peng, Jianxin; Liu, Haiou; Hu, Yuhui; Xi, Junqiang; Chen, Huiyan
2014-09-01
Multi-way principal component analysis (MPCA) has received considerable attention and been widely used in process monitoring. A traditional MPCA algorithm unfolds multiple batches of historical data into a two-dimensional matrix and cuts the matrix along the time axis to form subspaces. However, low subspace efficiency and difficult fault isolation are common disadvantages of the principal component model. This paper presents a new subspace construction method based on a kernel density estimation function that can effectively reduce the storage required for the subspace information. The MPCA model and the knowledge base are built on the new subspace. Fault detection and isolation with the squared prediction error (SPE) statistic and Hotelling's T2 statistic are then realized in process monitoring. When a fault occurs, fault isolation based on the SPE statistic is achieved by residual contribution analysis of the different variables. For subspace fault isolation based on the T2 statistic, the relationship between the statistic indicator and the state variables is constructed, and constraint conditions are presented to check the validity of the fault isolation. Then, to improve the robustness of fault isolation against unexpected disturbances, a statistical method is adopted to relate single and multiple subspaces and thereby increase the rate of correct fault isolation. Finally, fault detection and isolation based on the improved MPCA is used to monitor the automatic shift control system (ASCS) to prove the correctness and effectiveness of the algorithm. The research proposes a new subspace construction method that reduces the required storage capacity and improves the robustness of the principal component model, and it relates the state variables to the fault detection indicators for fault isolation.
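The SPE statistic and its residual contributions, the fault-isolation route described above, reduce to a short computation for ordinary PCA, as sketched below on synthetic data; the improved MPCA with kernel-density subspaces is not reproduced here.

```python
# PCA monitoring sketch: SPE statistic with per-variable contributions.
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))                 # normal operating data
X -= X.mean(axis=0)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:2].T                                  # retain 2 principal components

x_new = rng.normal(size=6)
x_new[3] += 4.0                               # fault injected on variable 3
resid = x_new - P @ (P.T @ x_new)             # residual after projection
spe = float(resid @ resid)                    # SPE = squared residual norm
contrib = resid ** 2                          # per-variable SPE contribution

print("SPE:", spe, "most suspect variable:", int(np.argmax(contrib)))
```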
When mechanism matters: Bayesian forecasting using models of ecological diffusion
Hefley, Trevor J.; Hooten, Mevin B.; Russell, Robin E.; Walsh, Daniel P.; Powell, James A.
2017-01-01
Ecological diffusion is a theory that can be used to understand and forecast spatio-temporal processes such as dispersal, invasion, and the spread of disease. Hierarchical Bayesian modelling provides a framework to make statistical inference and probabilistic forecasts, using mechanistic ecological models. To illustrate, we show how hierarchical Bayesian models of ecological diffusion can be implemented for large data sets that are distributed densely across space and time. The hierarchical Bayesian approach is used to understand and forecast the growth and geographic spread in the prevalence of chronic wasting disease in white-tailed deer (Odocoileus virginianus). We compare statistical inference and forecasts from our hierarchical Bayesian model to phenomenological regression-based methods that are commonly used to analyse spatial occurrence data. The mechanistic statistical model based on ecological diffusion led to important ecological insights, obviated a commonly ignored type of collinearity, and was the most accurate method for forecasting.
Improving UWB-Based Localization in IoT Scenarios with Statistical Models of Distance Error.
Monica, Stefania; Ferrari, Gianluigi
2018-05-17
Interest in the Internet of Things (IoT) is rapidly increasing, as the number of connected devices is exponentially growing. One of the application scenarios envisaged for IoT technologies involves indoor localization and context awareness. In this paper, we focus on a localization approach that relies on a particular type of communication technology, namely Ultra Wide Band (UWB). UWB technology is an attractive choice for indoor localization, owing to its high accuracy. Since localization algorithms typically rely on estimated inter-node distances, the goal of this paper is to evaluate the improvement brought by a simple (linear) statistical model of the distance error. On the basis of an extensive experimental measurement campaign, we propose a general analytical framework, based on a Least Square (LS) method, to derive a novel statistical model for the range estimation error between a pair of UWB nodes. The proposed statistical model is then applied to improve the performance of a few illustrative localization algorithms in various realistic scenarios. The obtained experimental results show that the use of the proposed statistical model improves the accuracy of the considered localization algorithms with a reduction of the localization error up to 66%.
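The flavour of such a linear LS error model can be sketched as follows: fit the range error as a linear function of distance and invert the fit to correct new range estimates. The measurement pairs and bias coefficients below are synthetic assumptions, not the campaign data.

```python
# Least-squares linear model of UWB range error, then correction.
import numpy as np

rng = np.random.default_rng(8)
d_true = rng.uniform(1.0, 20.0, size=200)                # metres
ranged = d_true + (0.03 * d_true + 0.10) + rng.normal(0, 0.05, 200)

a, b = np.polyfit(d_true, ranged - d_true, deg=1)        # LS fit of the bias

def corrected(r):
    # Invert r = d + a*d + b  =>  d = (r - b) / (1 + a)
    return (r - b) / (1.0 + a)

print("fitted a, b:", a, b)
print("corrected:", corrected(ranged[:5]), "true:", d_true[:5])
```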
Saadati, Farzaneh; Ahmad Tarmizi, Rohani
2015-01-01
Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding the pedagogical characteristics of learning within an e-learning system is 'value added', as it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, compared to the conventional mathematics learning model, i-CAM significantly promoted students' problem-solving performance at the end of each phase. In addition, the differences in students' test scores were statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirm the considerable value of i-CAM in improving statistics learning for non-specialized postgraduate students. PMID:26132553
NASA Astrophysics Data System (ADS)
Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David
1990-10-01
Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and physical parameterizations of state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
NASA Astrophysics Data System (ADS)
Ionita, M.; Grosfeld, K.; Scholz, P.; Lohmann, G.
2016-12-01
Sea ice in both Polar Regions is an important indicator of global climate change and its polar amplification. Consequently, there is broad interest in information on sea ice: its coverage, variability and long-term change. Knowledge of sea ice requires high-quality data on ice extent, thickness and dynamics. However, its predictability depends on various climate parameters and conditions. In order to provide insights into the potential development of a monthly/seasonal forecast signal, we developed a robust statistical model based on ocean heat content, sea surface temperature and atmospheric variables to calculate an estimate of the September minimum sea ice extent for every year. Although previous statistical attempts at monthly/seasonal forecasts of the September sea ice minimum show relatively reduced skill, here it is shown that more than 97% (r = 0.98) of the September sea ice extent can be predicted three months in advance from previous months' conditions via a multiple linear regression model based on global sea surface temperature (SST), mean sea level pressure (SLP), air temperature at 850 hPa (TT850), surface winds and sea ice extent persistence. The statistical model is based on the identification of regions with stable teleconnections between the predictors (climatological parameters) and the predictand (here sea ice extent). The results based on our statistical model contribute to the sea ice prediction network for the sea ice outlook report (https://www.arcus.org/sipn) and could provide a tool for identifying relevant regions and climate parameters that are important for sea ice development in the Arctic and for detecting sensitive and critical regions in global coupled climate models with a focus on sea ice formation.
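The prediction step itself is ordinary multiple linear regression, as in the sketch below, where random series stand in for the SST, SLP, TT850 and persistence predictors averaged over the stable-teleconnection regions.

```python
# OLS regression of September sea-ice extent on standardized predictors.
import numpy as np

rng = np.random.default_rng(9)
n_years = 35
predictors = rng.normal(size=(n_years, 4))   # SST, SLP, TT850, persistence
beta_true = np.array([-0.8, 0.3, -0.5, 0.9]) # synthetic "true" coefficients
extent = predictors @ beta_true + rng.normal(0, 0.3, n_years)

X = np.column_stack([np.ones(n_years), predictors])   # add intercept
coef, *_ = np.linalg.lstsq(X, extent, rcond=None)
fitted = X @ coef
print("in-sample r:", round(np.corrcoef(fitted, extent)[0, 1], 3))
```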
Statistical representation of multiphase flow
NASA Astrophysics Data System (ADS)
Subramaniam
2000-11-01
The relationship between two common statistical representations of multiphase flow, namely, the single-point Eulerian statistical representation of two-phase flow (D. A. Drew, Ann. Rev. Fluid Mech. (15), 1983) and the Lagrangian statistical representation of a spray using the droplet distribution function (F. A. Williams, Phys. Fluids 1 (6), 1958), is established for spherical dispersed-phase elements. This relationship is based on recent work relating the droplet distribution function to single-droplet pdfs, starting from a Liouville description of a spray (Subramaniam, Phys. Fluids 10 (12), 2000). The Eulerian representation, which is based on a random-field model of the flow, is shown to contain different statistical information from the Lagrangian representation, which is based on a point-process model. The two descriptions are shown to be simply related for spherical, monodisperse elements in statistically homogeneous two-phase flow, whereas such a simple relationship is precluded by the inclusion of polydispersity and statistical inhomogeneity. The common origin of these two representations is traced to a more fundamental statistical representation of multiphase flow, whose concepts derive from a theory for dense sprays recently proposed by Edwards (Atomization and Sprays 10 (3-5), 2000). The issue of what constitutes a minimally complete statistical representation of a multiphase flow is resolved.
The Effect of Project Based Learning on the Statistical Literacy Levels of 8th Grade Students
ERIC Educational Resources Information Center
Koparan, Timur; Güven, Bülent
2014-01-01
This study examines the effect of project based learning on 8th grade students' statistical literacy levels. A performance test was developed for this aim. A quasi-experimental research model was used in this article. In this context, statistics was taught with the traditional method in the control group and it was taught using project based…
Regional climate model downscaling may improve the prediction of alien plant species distributions
NASA Astrophysics Data System (ADS)
Liu, Shuyan; Liang, Xin-Zhong; Gao, Wei; Stohlgren, Thomas J.
2014-12-01
Distributions of invasive species are commonly predicted with species distribution models that build upon the statistical relationships between observed species presence data and climate data. We used field observations, climate station data, and Maximum Entropy species distribution models for 13 invasive plant species in the United States, and then compared the models with inputs from a General Circulation Model (hereafter GCM-based models) and a downscaled Regional Climate Model (hereafter, RCM-based models). We also compared species distributions based on either GCM-based or RCM-based models for the present (1990-1999) to the future (2046-2055). RCM-based species distribution models replicated observed distributions remarkably better than GCM-based models for all invasive species under the current climate. This was shown for the presence locations of the species, and by using four common statistical metrics to compare modeled distributions. For two widespread invasive taxa (Bromus tectorum or cheatgrass, and Tamarix spp. or tamarisk), GCM-based models failed miserably to reproduce observed species distributions. In contrast, RCM-based species distribution models closely matched observations. Future species distributions may be significantly affected by using GCM-based inputs. Because invasive plant species often show high resilience and low rates of local extinction, RCM-based species distribution models may perform better than GCM-based species distribution models for planning containment programs for invasive species.
NASA Astrophysics Data System (ADS)
Baker, Allison H.; Hu, Yong; Hammerling, Dorit M.; Tseng, Yu-heng; Xu, Haiying; Huang, Xiaomeng; Bryan, Frank O.; Yang, Guangwen
2016-07-01
The Parallel Ocean Program (POP), the ocean model component of the Community Earth System Model (CESM), is widely used in climate research. Most current work in CESM-POP focuses on improving the model's efficiency or accuracy, such as improving numerical methods, advancing parameterization, porting to new architectures, or increasing parallelism. Since ocean dynamics are chaotic in nature, achieving bit-for-bit (BFB) identical results in ocean solutions cannot be guaranteed for even tiny code modifications, and determining whether modifications are admissible (i.e., statistically consistent with the original results) is non-trivial. In recent work, an ensemble-based statistical approach was shown to work well for software verification (i.e., quality assurance) on atmospheric model data. The general idea of ensemble-based statistical consistency testing is to use a measurement of the variability of an ensemble of simulations as a metric with which to compare future simulations and make a determination of statistical distinguishability. The capability to determine consistency without BFB results boosts model confidence and provides the flexibility needed, for example, for more aggressive code optimizations and the use of heterogeneous execution environments. Since ocean and atmosphere models have differing characteristics in terms of dynamics, spatial variability, and timescales, we present a new statistical method to evaluate ocean model simulation data that requires the evaluation of ensemble means and deviations in a spatial manner. In particular, the statistical distribution from an ensemble of CESM-POP simulations is used to determine the standard score of any new model solution at each grid point. Then the percentage of points that have scores greater than a specified threshold indicates whether the new model simulation is statistically distinguishable from the ensemble simulations. Both ensemble size and composition are important. Our experiments indicate that the new POP ensemble consistency test (POP-ECT) tool is capable of distinguishing cases that should be statistically consistent with the ensemble and those that should not, as well as providing a simple, objective and systematic way to detect errors in CESM-POP due to the hardware or software stack, positively contributing to quality assurance for the CESM-POP code.
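The grid-point consistency check can be sketched compactly: standardize a new run against the ensemble mean and standard deviation at each point and report the fraction of points exceeding a threshold. The ensemble size, grid and threshold below are illustrative assumptions rather than the POP-ECT settings.

```python
# Ensemble-based consistency check via per-grid-point standard scores.
import numpy as np

rng = np.random.default_rng(10)
ensemble = rng.normal(size=(30, 64, 128))   # 30 members on a lat-lon grid
new_run = rng.normal(size=(64, 128))        # candidate simulation

mu = ensemble.mean(axis=0)
sd = ensemble.std(axis=0, ddof=1)
z = np.abs(new_run - mu) / sd                # standard score per grid point

threshold, max_fail_frac = 3.0, 0.005
fail_frac = (z > threshold).mean()           # fraction of failing points
print("failing fraction:", fail_frac, "pass:", fail_frac <= max_fail_frac)
```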
Statistical wind analysis for near-space applications
NASA Astrophysics Data System (ADS)
Roney, Jason A.
2007-09-01
Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18–30 km) above ground level (AGL) at two locations, Akron, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data, such as the mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated that, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20–22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.
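The percentile-wind computation is a direct use of the fitted Weibull, as in this sketch; the wind sample is synthetic, standing in for observed speeds at one altitude.

```python
# Fit a two-parameter Weibull to wind speeds and read off design percentiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
winds = stats.weibull_min.rvs(2.1, scale=18.0, size=2000, random_state=rng)

shape, loc, scale = stats.weibull_min.fit(winds, floc=0)  # 2-parameter fit
for q in (0.50, 0.95, 0.99):
    w = stats.weibull_min.ppf(q, shape, loc, scale)
    print(f"{q:.0%} wind: {w:.1f}")
```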
Statistical properties of several models of fractional random point processes
NASA Astrophysics Data System (ADS)
Bendjaballah, C.
2011-08-01
Statistical properties of several models of fractional random point processes have been analyzed from the counting and time interval statistics points of view. Based on the criterion of the reduced variance, it is seen that such processes exhibit nonclassical properties. The conditions for these processes to be treated as conditional Poisson processes are examined. Numerical simulations illustrate part of the theoretical calculations.
The Use of a Context-Based Information Retrieval Technique
2009-07-01
...provided in context... Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies... LSA, which is also known as latent semantic indexing (LSI), uses a statistical and... In contrast, natural language models apply algorithms that combine statistical information with semantic information. Semantic...
Nonelastic nuclear reactions and accompanying gamma radiation
NASA Technical Reports Server (NTRS)
Snow, R.; Rosner, H. R.; George, M. C.; Hayes, J. D.
1971-01-01
Several aspects of nonelastic nuclear reactions which proceed through the formation of a compound nucleus are dealt with. The full statistical model and the partial statistical model are described, and computer programs based on these models are presented along with operating instructions and input and output for sample problems. A theoretical development of the expression for the reaction cross section is presented for the hybrid case, which combines the continuum aspects of the full statistical model with the discrete-level aspects of the partial statistical model. Cross sections for level excitation and gamma production by neutron inelastic scattering from the nuclei Al-27, Fe-56, Si-28, and Pb-208 are calculated and compared with available experimental data.
Walking through the statistical black boxes of plant breeding.
Xavier, Alencar; Muir, William M; Craig, Bruce; Rainey, Katy Martin
2016-10-01
The main statistical procedures in plant breeding are based on Gaussian processes and can be computed through mixed linear models. Intelligent decision making relies on our ability to extract useful information from data to help us achieve our goals more efficiently. Many plant breeders and geneticists perform statistical analyses without understanding the underlying assumptions of the methods or their strengths and pitfalls. In other words, they treat these statistical methods (software and programs) like black boxes. Black boxes represent complex pieces of machinery with contents that are not fully understood by the user. The user sees the inputs and outputs without knowing how the outputs are generated. By providing a general background on statistical methodologies, this review aims (1) to introduce basic concepts of machine learning and its applications to plant breeding; (2) to link classical selection theory to current statistical approaches; (3) to show how to solve mixed models and extend their application to pedigree-based and genomic-based prediction; and (4) to clarify how the algorithms of genome-wide association studies work, including their assumptions and limitations.
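As a worked micro-example of the mixed-model machinery behind these black boxes, the sketch below solves Henderson's mixed model equations for a toy grouped design; the dimensions and variance ratio are illustrative assumptions.

```python
# Henderson's mixed model equations for y = Xb + Zu + e, u ~ N(0, I*s2_u).
import numpy as np

rng = np.random.default_rng(12)
n, q = 12, 4                       # records, random (e.g., family) effects
X = np.ones((n, 1))                # fixed intercept only
Z = np.kron(np.eye(q), np.ones((n // q, 1)))   # records grouped by family
u_true = rng.normal(0, 1.0, q)
y = X @ [5.0] + Z @ u_true + rng.normal(0, 0.5, n)

lam = 0.25 / 1.0                   # lambda = sigma_e^2 / sigma_u^2
top = np.hstack([X.T @ X, X.T @ Z])
bot = np.hstack([Z.T @ X, Z.T @ Z + lam * np.eye(q)])
rhs = np.concatenate([X.T @ y, Z.T @ y])
sol = np.linalg.solve(np.vstack([top, bot]), rhs)
print("BLUE of mean:", sol[0], "BLUPs:", sol[1:])
```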
The lz(p)* Person-Fit Statistic in an Unfolding Model Context.
Tendeiro, Jorge N
2017-01-01
Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.
Comparison of the predictive validity of diagnosis-based risk adjusters for clinical outcomes.
Petersen, Laura A; Pietz, Kenneth; Woodard, LeChauncy D; Byrne, Margaret
2005-01-01
Many possible methods of risk adjustment exist, but there is a dearth of comparative data on their performance. We compared the predictive validity of 2 widely used methods (Diagnostic Cost Groups [DCGs] and Adjusted Clinical Groups [ACGs]) for 2 clinical outcomes using a large national sample of patients. We studied all patients who used Veterans Health Administration (VA) medical services in fiscal year (FY) 2001 (n = 3,069,168) and assigned both a DCG and an ACG to each. We used logistic regression analyses to compare predictive ability for death or long-term care (LTC) hospitalization for age/gender models, DCG models, and ACG models. We also assessed the effect of adding age to the DCG and ACG models. Patients in the highest DCG categories, indicating higher severity of illness, were more likely to die or to require LTC hospitalization. Surprisingly, the age/gender model predicted death slightly more accurately than the ACG model (c-statistic of 0.710 versus 0.700, respectively). The addition of age to the ACG model improved the c-statistic to 0.768. The highest c-statistic for prediction of death was obtained with a DCG/age model (0.830). The lowest c-statistics were obtained for age/gender models for LTC hospitalization (c-statistic 0.593). The c-statistic for use of ACGs to predict LTC hospitalization was 0.783, and improved to 0.792 with the addition of age. The c-statistics for use of DCGs and DCG/age to predict LTC hospitalization were 0.885 and 0.890, respectively, indicating the best prediction. We found that risk adjusters based upon diagnoses predicted an increased likelihood of death or LTC hospitalization, exhibiting good predictive validity. In this comparative analysis using VA data, DCG models were generally superior to ACG models in predicting clinical outcomes, although ACG model performance was enhanced by the addition of age.
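A c-statistic like those reported above can be computed from predicted risks via the rank-sum identity, as in this sketch on synthetic scores: it is the probability that a randomly chosen case is ranked above a randomly chosen control.

```python
# c-statistic (AUC) from predicted risks via the Mann-Whitney U identity.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
risk_dead = rng.beta(4, 3, size=300)      # predicted risks, outcome = death
risk_alive = rng.beta(2, 5, size=2000)    # predicted risks, outcome = alive

u_stat, _ = stats.mannwhitneyu(risk_dead, risk_alive, alternative="greater")
c_statistic = u_stat / (len(risk_dead) * len(risk_alive))
print("c-statistic:", round(float(c_statistic), 3))
```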
NASA Astrophysics Data System (ADS)
Guadagnini, A.; Riva, M.; Dell'Oca, A.
2017-12-01
We propose to ground sensitivity of uncertain parameters of environmental models on a set of indices based on the main (statistical) moments, i.e., mean, variance, skewness and kurtosis, of the probability density function (pdf) of a target model output. This enables us to perform Global Sensitivity Analysis (GSA) of a model in terms of multiple statistical moments and yields a quantification of the impact of model parameters on features driving the shape of the pdf of model output. Our GSA approach includes the possibility of being coupled with the construction of a reduced complexity model that allows approximating the full model response at a reduced computational cost. We demonstrate our approach through a variety of test cases. These include a commonly used analytical benchmark, a simplified model representing pumping in a coastal aquifer, a laboratory-scale tracer experiment, and the migration of fracturing fluid through a naturally fractured reservoir (source) to reach an overlying formation (target). Our strategy allows discriminating the relative importance of model parameters to the four statistical moments considered. We also provide an appraisal of the error associated with the evaluation of our sensitivity metrics by replacing the original system model through the selected surrogate model. Our results suggest that one might need to construct a surrogate model with increasing level of accuracy depending on the statistical moment considered in the GSA. The methodological framework we propose can assist the development of analysis techniques targeted to model calibration, design of experiment, uncertainty quantification and risk assessment.
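A bare-bones Monte Carlo version of moment-based sensitivity can be sketched by comparing conditional and unconditional moments of the output; the Ishigami-type test function and the simple binning index below are stand-ins for the paper's metrics and environmental models.

```python
# Moment-based sensitivity: distance between conditional and total moments.
import numpy as np

rng = np.random.default_rng(14)
n, bins = 20000, 20
X = rng.uniform(-np.pi, np.pi, size=(n, 3))          # uncertain parameters
y = (np.sin(X[:, 0]) + 7 * np.sin(X[:, 1]) ** 2
     + 0.1 * X[:, 2] ** 4 * np.sin(X[:, 0]))          # model output

for stat, name in [(np.mean, "mean"), (np.var, "variance")]:
    total = stat(y)
    idx = []
    for j in range(3):
        edges = np.quantile(X[:, j], np.linspace(0, 1, bins + 1))
        which = np.clip(np.searchsorted(edges, X[:, j]) - 1, 0, bins - 1)
        cond = np.array([stat(y[which == b]) for b in range(bins)])
        idx.append(np.mean(np.abs(cond - total)))     # moment shift index
    print(name, "sensitivities:", np.round(idx, 2))
```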
Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization
NASA Astrophysics Data System (ADS)
Eroglu, Sertac
2014-10-01
The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, is termed the statistical mechanical Menzerath-Altmann model. The derived model allows the model parameters to be interpreted in terms of physical concepts. We also propose that many organizations exhibiting Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.
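For concreteness, the classical Menzerath-Altmann form y = a x^b exp(-c x) can be fitted by nonlinear least squares as below; the data points are invented for illustration.

```python
# Nonlinear least-squares fit of the Menzerath-Altmann curve.
import numpy as np
from scipy.optimize import curve_fit

def menzerath_altmann(x, a, b, c):
    return a * x ** b * np.exp(-c * x)

x = np.arange(1, 11, dtype=float)            # e.g., construct size
y = menzerath_altmann(x, 4.2, -0.25, 0.03)   # synthetic constituent sizes
y += np.random.default_rng(15).normal(0, 0.05, x.size)

params, _ = curve_fit(menzerath_altmann, x, y, p0=(1.0, -0.1, 0.01))
print("a, b, c =", np.round(params, 3))
```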
Performance comparison of LUR and OK in PM2.5 concentration mapping: a multidimensional perspective
Zou, Bin; Luo, Yanqing; Wan, Neng; Zheng, Zhong; Sternberg, Troy; Liao, Yilan
2015-01-01
Methods of Land Use Regression (LUR) modeling and Ordinary Kriging (OK) interpolation have been widely used to offset the shortcomings of PM2.5 data observed at sparse monitoring sites. However, traditional point-based performance evaluation strategy for these methods remains stagnant, which could cause unreasonable mapping results. To address this challenge, this study employs ‘information entropy’, an area-based statistic, along with traditional point-based statistics (e.g. error rate, RMSE) to evaluate the performance of LUR model and OK interpolation in mapping PM2.5 concentrations in Houston from a multidimensional perspective. The point-based validation reveals significant differences between LUR and OK at different test sites despite the similar end-result accuracy (e.g. error rate 6.13% vs. 7.01%). Meanwhile, the area-based validation demonstrates that the PM2.5 concentrations simulated by the LUR model exhibits more detailed variations than those interpolated by the OK method (i.e. information entropy, 7.79 vs. 3.63). Results suggest that LUR modeling could better refine the spatial distribution scenario of PM2.5 concentrations compared to OK interpolation. The significance of this study primarily lies in promoting the integration of point- and area-based statistics for model performance evaluation in air pollution mapping. PMID:25731103
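The area-based statistic is straightforward to compute: discretize the mapped surface and take the Shannon entropy of the resulting histogram, as in this sketch on synthetic surfaces, where the more detailed surface yields the higher entropy.

```python
# Shannon entropy of a mapped concentration surface (shared bin edges).
import numpy as np

rng = np.random.default_rng(16)
detailed = rng.gamma(4.0, 3.0, size=(200, 200))            # LUR-like surface
smooth = detailed.mean() + rng.normal(0, 0.5, (200, 200))  # OK-like surface

def map_entropy(surface, edges):
    counts, _ = np.histogram(surface, bins=edges)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())                  # bits

edges = np.linspace(0.0, max(detailed.max(), smooth.max()), 65)
print("detailed:", map_entropy(detailed, edges))
print("smooth:  ", map_entropy(smooth, edges))
```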
Campbell, J Q; Petrella, A J
2016-09-06
Population-based modeling of the lumbar spine has the potential to be a powerful clinical tool. However, developing a fully parameterized model of the lumbar spine with accurate geometry has remained a challenge. The current study used automated methods for landmark identification to create a statistical shape model of the lumbar spine. The shape model was evaluated using compactness, generalization ability, and specificity. The primary shape modes were analyzed visually, quantitatively, and biomechanically. The biomechanical analysis was performed by using the statistical shape model with an automated method for finite element model generation to create a fully parameterized finite element model of the lumbar spine. Functional finite element models of the mean shape and the extreme shapes (±3 standard deviations) of all 17 shape modes were created demonstrating the robust nature of the methods. This study represents an advancement in finite element modeling of the lumbar spine and will allow population-based modeling in the future. Copyright © 2016 Elsevier Ltd. All rights reserved.
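The core of such a statistical shape model is PCA on aligned landmark vectors, with new shapes synthesized along the principal modes, as sketched below; the random landmark matrix stands in for aligned lumbar surface points.

```python
# PCA shape model: mean shape plus modes, synthesized at +/-3 SD.
import numpy as np

rng = np.random.default_rng(17)
shapes = rng.normal(size=(40, 300))     # 40 subjects x 100 3-D landmarks

mean_shape = shapes.mean(axis=0)
U, s, Vt = np.linalg.svd(shapes - mean_shape, full_matrices=False)
std_modes = s / np.sqrt(shapes.shape[0] - 1)   # per-mode standard deviation

mode = 0
for k in (-3, 0, 3):
    synth = mean_shape + k * std_modes[mode] * Vt[mode]
    print(f"mode {mode}, {k:+d} SD -> first landmark:", synth[:3])
```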
A smoothed residual based goodness-of-fit statistic for nest-survival models
Rodney X. Sturdivant; Jay J. Rotella; Robin E. Russell
2008-01-01
Estimating nest success and identifying important factors related to nest-survival rates is an essential goal for many wildlife researchers interested in understanding avian population dynamics. Advances in statistical methods have led to a number of estimation methods and approaches to modeling this problem. Recently developed models allow researchers to include a...
Residuals and the Residual-Based Statistic for Testing Goodness of Fit of Structural Equation Models
ERIC Educational Resources Information Center
Foldnes, Njal; Foss, Tron; Olsson, Ulf Henning
2012-01-01
The residuals obtained from fitting a structural equation model are crucial ingredients in obtaining chi-square goodness-of-fit statistics for the model. The authors present a didactic discussion of the residuals, obtaining a geometrical interpretation by recognizing the residuals as the result of oblique projections. This sheds light on the…
Stout, David B.; Chatziioannou, Arion F.
2012-01-01
Micro-CT is widely used in preclinical studies of small animals. Due to the low soft-tissue contrast in typical studies, segmentation of soft tissue organs from noncontrast enhanced micro-CT images is a challenging problem. Here, we propose an atlas-based approach for estimating the major organs in mouse micro-CT images. A statistical atlas of major trunk organs was constructed based on 45 training subjects. The statistical shape model technique was used to include inter-subject anatomical variations. The shape correlations between different organs were described using a conditional Gaussian model. For registration, first the high-contrast organs in micro-CT images were registered by fitting the statistical shape model, while the low-contrast organs were subsequently estimated from the high-contrast organs using the conditional Gaussian model. The registration accuracy was validated based on 23 noncontrast-enhanced and 45 contrast-enhanced micro-CT images. Three different accuracy metrics (Dice coefficient, organ volume recovery coefficient, and surface distance) were used for evaluation. The Dice coefficients vary from 0.45 ± 0.18 for the spleen to 0.90 ± 0.02 for the lungs, the volume recovery coefficients vary from for the liver to 1.30 ± 0.75 for the spleen, the surface distances vary from 0.18 ± 0.01 mm for the lungs to 0.72 ± 0.42 mm for the spleen. The registration accuracy of the statistical atlas was compared with two publicly available single-subject mouse atlases, i.e., the MOBY phantom and the DIGIMOUSE atlas, and the results proved that the statistical atlas is more accurate than the single atlases. To evaluate the influence of the training subject size, different numbers of training subjects were used for atlas construction and registration. The results showed an improvement of the registration accuracy when more training subjects were used for the atlas construction. The statistical atlas-based registration was also compared with the thin-plate spline based deformable registration, commonly used in mouse atlas registration. The results revealed that the statistical atlas has the advantage of improving the estimation of low-contrast organs. PMID:21859613
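For reference, two of the accuracy metrics quoted above can be computed from boolean segmentation masks as in the following hedged sketch; the masks here are toy volumes, not mouse micro-CT data.

```python
# Dice coefficient and organ volume recovery for an estimated vs. reference mask.
import numpy as np

def dice_coefficient(est, ref):
    est, ref = est.astype(bool), ref.astype(bool)
    inter = np.logical_and(est, ref).sum()
    return 2.0 * inter / (est.sum() + ref.sum())

def volume_recovery(est, ref):
    return est.sum() / ref.sum()     # ratio of estimated to reference organ volume

ref = np.zeros((50, 50, 50), bool); ref[10:30, 10:30, 10:30] = True
est = np.zeros_like(ref);           est[12:32, 10:30, 10:30] = True
print(dice_coefficient(est, ref), volume_recovery(est, ref))
```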
NASA Astrophysics Data System (ADS)
Quesada-Montano, Beatriz; Westerberg, Ida K.; Fuentes-Andino, Diana; Hidalgo-Leon, Hugo; Halldin, Sven
2017-04-01
Long-term hydrological data are key to understanding catchment behaviour and for decision making within water management and planning. Given the lack of observed data in many regions worldwide, hydrological models are an alternative for reproducing historical streamflow series. Additional types of information - beyond locally observed discharge - can be used to constrain model parameter uncertainty for ungauged catchments. Climate variability exerts a strong influence on streamflow variability on long and short time scales, in particular in the Central-American region. We therefore explored the use of climate variability knowledge to constrain the simulated discharge uncertainty of a conceptual hydrological model applied to a Costa Rican catchment, assumed to be ungauged. To reduce model uncertainty we first rejected parameter relationships that disagreed with our understanding of the system. We then assessed how well climate-based constraints applied at long-term, inter-annual and intra-annual time scales could constrain model uncertainty. Finally, we compared the climate-based constraints to a constraint on low-flow statistics based on information obtained from global maps. We evaluated our method in terms of the ability of the model to reproduce the observed hydrograph and the active catchment processes, using two efficiency measures, a statistical consistency measure, a spread measure and 17 hydrological signatures. We found that climate variability knowledge was useful for reducing model uncertainty, in particular the unrealistic representation of deep groundwater processes. The constraints based on global maps of low-flow statistics provided more constraining information than those based on climate variability, but the latter rejected slow rainfall-runoff representations that the low-flow statistics did not reject. The use of such knowledge, together with information on low-flow statistics and constraints on parameter relationships, proved useful for constraining model uncertainty for a basin treated as ungauged. This shows that our method is promising for reconstructing long-term flow data for ungauged catchments on the Pacific side of Central America, and that similar methods can be developed for ungauged basins in other regions where climate variability exerts a strong control on streamflow variability.
McGovern, Donna L; Mosier, Philip D; Roth, Bryan L; Westkaemper, Richard B
2010-04-01
The highly potent and kappa-opioid (KOP) receptor-selective hallucinogen Salvinorin A and selected analogs have been analyzed using the 3D quantitative structure-affinity relationship technique Comparative Molecular Field Analysis (CoMFA) in an effort to derive a statistically significant and predictive model of salvinorin affinity at the KOP receptor and to provide additional statistical support for the validity of previously proposed structure-based interaction models. Two CoMFA models of Salvinorin A analogs substituted at the C-2 position are presented. Separate models were developed based on the radioligand used in the kappa-opioid binding assay, [(3)H]diprenorphine or [(125)I]6 beta-iodo-3,14-dihydroxy-17-cyclopropylmethyl-4,5 alpha-epoxymorphinan ([(125)I]IOXY). For each dataset, three methods of alignment were employed: a receptor-docked alignment derived from the structure-based docking algorithm GOLD, another from the ligand-based alignment algorithm FlexS, and a rigid realignment of the poses from the receptor-docked alignment. The receptor-docked alignment produced statistically superior results compared to either the FlexS alignment or the realignment in both datasets. The [(125)I]IOXY set (Model 1) and [(3)H]diprenorphine set (Model 2) gave q(2) values of 0.592 and 0.620, respectively, using the receptor-docked alignment, and both models produced similar CoMFA contour maps that reflected the stereoelectronic features of the receptor model from which they were derived. Each model gave significantly predictive CoMFA statistics (Model 1 PSET r(2)=0.833; Model 2 PSET r(2)=0.813). Based on the CoMFA contour maps, a binding mode was proposed for amine-containing Salvinorin A analogs that provides a rationale for the observation that the beta-epimers (R-configuration) of protonated amines at the C-2 position have a higher affinity than the corresponding alpha-epimers (S-configuration). (c) 2010. Published by Elsevier Inc.
Chung, Chi-Jung; Kuo, Yu-Chen; Hsieh, Yun-Yu; Li, Tsai-Chung; Lin, Cheng-Chieh; Liang, Wen-Miin; Liao, Li-Na; Li, Chia-Ing; Lin, Hsueh-Chun
2017-11-01
This study applied open source technology to establish a subject-enabled analytics model that can enhance measurement statistics of case studies with public health data in cloud computing. The infrastructure of the proposed model comprises three domains: 1) the health measurement data warehouse (HMDW) for the case study repository, 2) the self-developed modules of online health risk information statistics (HRIStat) for cloud computing, and 3) the prototype of a Web-based process automation system in statistics (PASIS) for the health risk assessment of case studies with subject-enabled evaluation. The system design employed freeware including Java applications, MySQL, and R packages to drive a health risk expert system (HRES). In the design, the HRIStat modules enforce the typical analytics methods for biomedical statistics, and the PASIS interfaces enable process automation of the HRES for cloud computing. The Web-based model supports two modes - step-by-step analysis and an automated computing process - for preliminary evaluation and real-time computation, respectively. The proposed model was evaluated by recomputing previous studies on the epidemiological measurement of diseases caused by either heavy metal exposures in the environment or clinical complications in hospital. The simulation validity was verified against commercial statistics software. The model was installed on a stand-alone computer and on a cloud-server workstation to verify computing performance for a data amount of more than 230K sets. Both setups reached an efficiency of about 10^5 sets per second. The Web-based PASIS interface can be used for cloud computing, and the HRIStat module can be flexibly expanded with advanced subjects for measurement statistics. The analytics procedure of the HRES prototype is capable of providing assessment criteria prior to estimating the potential risk to public health. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Xiaoying; Liu, Chongxuan; Hu, Bill X.
This study statistically analyzed a grain-size based additivity model that has been proposed to scale reaction rates and parameters from laboratory to field. The additivity model assumed that reaction properties in a sediment, including surface area, reactive site concentration, reaction rate, and extent, can be predicted from the field-scale grain size distribution by linearly adding the reaction properties of individual grain size fractions. This study focused on the statistical analysis of the additivity model with respect to reaction rate constants, using multi-rate uranyl (U(VI)) surface complexation reactions in a contaminated sediment as an example. Experimental data of rate-limited U(VI) desorption in a stirred flow-cell reactor were used to estimate the statistical properties of multi-rate parameters for individual grain size fractions. The statistical properties of the rate constants for the individual grain size fractions were then used to analyze the statistical properties of the additivity model to predict rate-limited U(VI) desorption in the composite sediment, and to evaluate the relative importance of individual grain size fractions to the overall U(VI) desorption. The results indicated that the additivity model provided a good prediction of the U(VI) desorption in the composite sediment. However, the rate constants were not directly scalable using the additivity model, and U(VI) desorption in individual grain size fractions had to be simulated in order to apply the additivity model. An approximate additivity model for directly scaling rate constants was subsequently proposed and evaluated. The results showed that the approximate model provided a good prediction of the experimental results within statistical uncertainty. This study also found that a gravel size fraction (2-8 mm), which is often ignored in modeling U(VI) sorption and desorption, is statistically significant to the U(VI) desorption in the sediment.
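The additivity idea described above can be sketched as a mass-weighted sum over grain-size fractions. The numbers below are illustrative assumptions, and the site-weighted averaging of rate constants at the end is only one plausible reading of an "approximate additivity" scheme, not the authors' exact formulation.

```python
# Hedged sketch of grain-size based additivity for a composite sediment property.
import numpy as np

mass_fraction = np.array([0.25, 0.40, 0.20, 0.15])   # grain-size mass fractions (sum to 1)
site_conc     = np.array([12.0, 6.5, 2.1, 0.4])      # reactive site conc. per fraction
rate_constant = np.array([0.8, 0.3, 0.1, 0.05])      # fraction-specific rate constants

# Additive (linear) scaling of an extensive property such as site concentration:
composite_sites = np.sum(mass_fraction * site_conc)

# Rate constants are not directly additive; an approximate scaling could weight each
# fraction's rate constant by its share of the reactive sites instead:
weights = mass_fraction * site_conc / composite_sites
approx_composite_rate = np.sum(weights * rate_constant)
```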
Statistical Emulation of Climate Model Projections Based on Precomputed GCM Runs*
Castruccio, Stefano; McInerney, David J.; Stein, Michael L.; ...
2014-02-24
The authors describe a new approach for emulating the output of a fully coupled climate model under arbitrary forcing scenarios that is based on a small set of precomputed runs from the model. Temperature and precipitation are expressed as simple functions of the past trajectory of atmospheric CO2 concentrations, and a statistical model is fit using a limited set of training runs. The approach is demonstrated to be a useful and computationally efficient alternative to pattern scaling and captures the nonlinear evolution of spatial patterns of climate anomalies inherent in transient climates. The approach does as well as pattern scaling in all circumstances and substantially better in many; it is not computationally demanding; and, once the statistical model is fit, it produces emulated climate output effectively instantaneously. In conclusion, it may therefore find wide application in climate impacts assessments and other policy analyses requiring rapid climate projections.
NASA Astrophysics Data System (ADS)
Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo
2017-05-01
In previous works, damage detection of metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, taking the temperature effect into account either by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were carried out on a carbon-steel pipe 1 m long and 1.5 inches in diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21 °C, and the T2 and Q statistical indices were then computed for a 24 h temperature profile. Mass added at different points of the pipe between sensor and actuator was used to simulate damage. By using the combined index, the temperature contribution can be separated and a clearer graphical differentiation of damaged from undamaged cases can be obtained.
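A hedged sketch of how the Q and T2 indices are commonly obtained from a PCA baseline model is given below; the data and the number of retained components are assumptions, and the paper's temperature-aware combined index is not reproduced here.

```python
# T2 and Q (squared prediction error) indices from a PCA baseline model.
import numpy as np

rng = np.random.default_rng(2)
baseline = rng.normal(size=(100, 12))                 # training records x features
mu, sigma = baseline.mean(0), baseline.std(0)
Z = (baseline - mu) / sigma
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
k = 3                                                  # retained principal components
P = Vt[:k].T                                           # loading matrix
lam = (s[:k] ** 2) / (len(Z) - 1)                      # variances of retained scores

def t2_q(x):
    """Compute (T2, Q) for a new measurement vector against the baseline."""
    z = (x - mu) / sigma
    t = z @ P                                          # scores in the retained subspace
    t2 = np.sum(t ** 2 / lam)
    residual = z - t @ P.T                             # part not explained by the model
    q = residual @ residual
    return t2, q

t2, q = t2_q(rng.normal(size=12))
```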
Assessment of corneal properties based on statistical modeling of OCT speckle
Jesus, Danilo A.; Iskander, D. Robert
2016-01-01
A new approach to assess the properties of the corneal micro-structure in vivo, based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT), is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from raw OCT images. Short-term changes in corneal properties were studied by inducing corneal swelling, whereas age-related changes were observed by analyzing data from sixty-five subjects aged between twenty-four and seventy-three years. The Generalized Gamma distribution was shown to be the best model, in terms of the Akaike’s Information Criterion, for fitting the OCT corneal speckle. Its parameters showed statistically significant differences (Kruskal-Wallis, p < 0.001) for short-term and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that the Generalized Gamma distribution can be utilized to model corneal speckle in OCT in vivo, providing complementary quantitative information where the micro-structure of corneal tissue is of the essence. PMID:28101409
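A minimal sketch of this kind of model selection, assuming synthetic speckle samples in place of OCT data: fit several candidate distributions with scipy and rank them by AIC (lower is better).

```python
# Fit candidate speckle-intensity distributions and compare them by AIC.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
speckle = stats.gengamma.rvs(a=1.8, c=1.2, scale=0.7, size=5000, random_state=rng)

candidates = {"gengamma": stats.gengamma, "gamma": stats.gamma,
              "weibull": stats.weibull_min, "rayleigh": stats.rayleigh}
for name, dist in candidates.items():
    params = dist.fit(speckle, floc=0)                # location fixed at zero
    k = len(params) - 1                               # free parameters (loc excluded)
    loglik = np.sum(dist.logpdf(speckle, *params))
    print(f"{name:9s} AIC = {2 * k - 2 * loglik:.1f}")
```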
Keith, Jeff; Westbury, Chris; Goldman, James
2015-09-01
Corpus-based semantic space models, which primarily rely on lexical co-occurrence statistics, have proven effective in modeling and predicting human behavior in a number of experimental paradigms that explore semantic memory representation. The most widely studied extant models, however, are strongly influenced by orthographic word frequency (e.g., Shaoul & Westbury, Behavior Research Methods, 38, 190-195, 2006). This has the implication that high-frequency closed-class words can potentially bias co-occurrence statistics. Because these closed-class words are purported to carry primarily syntactic, rather than semantic, information, the performance of corpus-based semantic space models may be improved by excluding closed-class words (using stop lists) from co-occurrence statistics, while retaining their syntactic information through other means (e.g., part-of-speech tagging and/or affixes from inflected word forms). Additionally, very little work has been done to explore the effect of employing morphological decomposition on the inflected forms of words in corpora prior to compiling co-occurrence statistics, despite (controversial) evidence that humans perform early morphological decomposition in semantic processing. In this study, we explored the impact of these factors on corpus-based semantic space models. From this study, morphological decomposition appears to significantly improve performance in word-word co-occurrence semantic space models, providing some support for the claim that sublexical information-specifically, word morphology-plays a role in lexical semantic processing. An overall decrease in performance was observed in models employing stop lists (e.g., excluding closed-class words). Furthermore, we found some evidence that weakens the claim that closed-class words supply primarily syntactic information in word-word co-occurrence semantic space models.
NASA Astrophysics Data System (ADS)
Xu, Lei; Chen, Nengcheng; Zhang, Xiang
2018-02-01
Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights climate signals using support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid part combines the statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the spatial extent and severity of drought nationwide well, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomalies (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought prediction were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.
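The hybrid weighting step can be illustrated with a small sketch in which the statistical and dynamic forecasts are combined using weights proportional to their inverse historical errors; the numbers are placeholders, not NMME or SVR output.

```python
# Combine statistical and dynamic forecasts by historical-skill weighting.
import numpy as np

def skill_weights(errors):
    """Weights proportional to inverse historical error, normalized to sum to one."""
    inv = 1.0 / np.asarray(errors, dtype=float)
    return inv / inv.sum()

hist_rmse = {"svr": 18.0, "nmme_em": 25.0, "nmme_bma": 23.0}   # past forecast errors (mm)
w = skill_weights(list(hist_rmse.values()))

svr_fcst, em_fcst, bma_fcst = 62.0, 80.0, 75.0                 # 1-month lead rainfall (mm)
hybrid_fcst = np.dot(w, [svr_fcst, em_fcst, bma_fcst])
print(dict(zip(hist_rmse, w)), hybrid_fcst)
```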
NASA Astrophysics Data System (ADS)
Zack, J. W.
2015-12-01
Predictions from Numerical Weather Prediction (NWP) models are the foundation for wind power forecasts for day-ahead and longer forecast horizons. The NWP models directly produce three-dimensional wind forecasts on their respective computational grids. These can be interpolated to the location and time of interest. However, these direct predictions typically contain significant systematic errors ("biases"). This is due to a variety of factors including the limited space-time resolution of the NWP models and shortcomings in the models' representation of physical processes. It has become common practice to attempt to improve the raw NWP forecasts by statistically adjusting them through a procedure that is widely known as Model Output Statistics (MOS). The challenge is to identify complex patterns of systematic errors and then use this knowledge to adjust the NWP predictions. The MOS-based improvements are the basis for much of the value added by commercial wind power forecast providers. There are an enormous number of statistical approaches that can be used to generate the MOS adjustments to the raw NWP forecasts. In order to obtain insight into the potential value of some of the newer and more sophisticated statistical techniques often referred to as "machine learning methods", a MOS-method comparison experiment has been performed for wind power generation facilities in 6 wind resource areas of California. The underlying NWP models that provided the raw forecasts were the two primary operational models of the US National Weather Service: the GFS and NAM models. The focus was on 1- and 2-day ahead forecasts of the hourly wind-based generation. The statistical methods evaluated included: (1) screening multiple linear regression, which served as a baseline method, (2) artificial neural networks, (3) a decision-tree approach called random forests, (4) gradient boosted regression based upon a decision-tree algorithm, (5) support vector regression and (6) analog ensemble, which is a case-matching scheme. The presentation will provide (1) an overview of each method and the experimental design, (2) performance comparisons based on standard metrics such as bias, MAE and RMSE, (3) a summary of the performance characteristics of each approach and (4) a preview of further experiments to be conducted.
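A compact sketch of such a MOS-method comparison, using scikit-learn stand-ins for some of the listed methods and synthetic data in place of NWP forecasts and observed generation (neural networks and the analog ensemble are omitted for brevity):

```python
# Compare several MOS-style regressors that map raw NWP predictors to observed generation.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
nwp = rng.uniform(0, 20, (2000, 3))                     # raw NWP predictors (e.g. wind speeds)
obs = 0.05 * nwp[:, 0] ** 2 + rng.normal(0, 0.5, 2000)  # stand-in for observed hourly generation
X_tr, X_te, y_tr, y_te = train_test_split(nwp, obs, random_state=0)

models = {"linreg": LinearRegression(),
          "rf": RandomForestRegressor(n_estimators=200, random_state=0),
          "gbr": GradientBoostingRegressor(random_state=0),
          "svr": SVR(C=10.0)}
for name, m in models.items():
    pred = m.fit(X_tr, y_tr).predict(X_te)
    bias = np.mean(pred - y_te)
    mae = np.mean(np.abs(pred - y_te))
    rmse = np.sqrt(np.mean((pred - y_te) ** 2))
    print(f"{name:7s} bias={bias:+.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```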
NASA Astrophysics Data System (ADS)
Steger, Stefan; Brenning, Alexander; Bell, Rainer; Glade, Thomas
2016-12-01
There is unanimous agreement that a precise spatial representation of past landslide occurrences is a prerequisite to produce high quality statistical landslide susceptibility models. Even though perfectly accurate landslide inventories rarely exist, investigations of how landslide inventory-based errors propagate into subsequent statistical landslide susceptibility models are scarce. The main objective of this research was to systematically examine whether and how inventory-based positional inaccuracies of different magnitudes influence modelled relationships, validation results, variable importance and the visual appearance of landslide susceptibility maps. The study was conducted for a landslide-prone site located in the districts of Amstetten and Waidhofen an der Ybbs, eastern Austria, where an earth-slide point inventory was available. The methodological approach comprised an artificial introduction of inventory-based positional errors into the present landslide data set and an in-depth evaluation of subsequent modelling results. Positional errors were introduced by artificially changing the original landslide position by a mean distance of 5, 10, 20, 50 and 120 m. The resulting differently precise response variables were separately used to train logistic regression models. Odds ratios of predictor variables provided insights into modelled relationships. Cross-validation and spatial cross-validation enabled an assessment of predictive performances and permutation-based variable importance. All analyses were additionally carried out with synthetically generated data sets to further verify the findings under rather controlled conditions. The results revealed that an increasing positional inventory-based error was generally related to increasing distortions of modelling and validation results. However, the findings also highlighted that interdependencies between inventory-based spatial inaccuracies and statistical landslide susceptibility models are complex. The systematic comparisons of 12 models provided valuable evidence that the respective error-propagation was not only determined by the degree of positional inaccuracy inherent in the landslide data, but also by the spatial representation of landslides and the environment, landslide magnitude, the characteristics of the study area, the selected classification method and an interplay of predictors within multiple variable models. Based on the results, we deduced that a direct propagation of minor to moderate inventory-based positional errors into modelling results can be partly counteracted by adapting the modelling design (e.g. generalization of input data, opting for strongly generalizing classifiers). Since positional errors within landslide inventories are common and subsequent modelling and validation results are likely to be distorted, the potential existence of inventory-based positional inaccuracies should always be considered when assessing landslide susceptibility by means of empirical models.
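The error-propagation experiment can be caricatured as follows: perturb the landslide coordinates by a chosen mean distance, re-extract predictors, refit a logistic regression and compare cross-validated performance. Everything below (terrain function, distances, sample sizes) is a synthetic assumption, not the Austrian case study data.

```python
# Propagate artificial positional errors in a landslide inventory into a logistic model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)

def sample_predictors(xy):
    """Stand-in for extracting terrain predictors (slope, lithology, ...) at point locations."""
    return np.c_[np.sin(xy[:, 0] / 50.0), np.cos(xy[:, 1] / 80.0), xy / 1000.0]

true_xy = rng.uniform(0, 5000, (300, 2))               # "true" landslide locations (m)
absence_xy = rng.uniform(0, 5000, (300, 2))            # non-landslide locations

for shift in [0, 5, 10, 20, 50, 120]:
    shifted = true_xy + rng.normal(0, shift / np.sqrt(2), true_xy.shape)  # approx. mean shift
    X = np.vstack([sample_predictors(shifted), sample_predictors(absence_xy)])
    y = np.r_[np.ones(300), np.zeros(300)]
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"mean positional error {shift:3d} m -> CV AUC {auc:.3f}")
```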
NASA Astrophysics Data System (ADS)
Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.
2018-07-01
A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.
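A minimal sketch of the classification step, assuming a made-up sensitivity matrix of frequency shifts per damage-location class and scikit-learn's LDA in place of the paper's implementation:

```python
# Classify the location of a detected mass change from natural-frequency shifts with LDA.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(6)
n_classes, n_freqs = 10, 6
sensitivity = rng.normal(size=(n_classes, n_freqs))    # stand-in for FE-model sensitivities

# Training set: simulated frequency shifts for each location class plus measurement noise
X = np.vstack([sensitivity[c] + rng.normal(0, 0.05, (200, n_freqs)) for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), 200)

lda = LinearDiscriminantAnalysis().fit(X, y)
observed_shift = sensitivity[3] + rng.normal(0, 0.05, n_freqs)
predicted_class = lda.predict(observed_shift.reshape(1, -1))
```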
NASA Astrophysics Data System (ADS)
Jia, Huizhen; Sun, Quansen; Ji, Zexuan; Wang, Tonghan; Chen, Qiang
2014-11-01
The goal of no-reference/blind image quality assessment (NR-IQA) is to devise a perceptual model that can accurately predict the quality of a distorted image as human opinions, in which feature extraction is an important issue. However, the features used in the state-of-the-art "general purpose" NR-IQA algorithms are usually natural scene statistics (NSS) based or are perceptually relevant; therefore, the performance of these models is limited. To further improve the performance of NR-IQA, we propose a general purpose NR-IQA algorithm which combines NSS-based features with perceptually relevant features. The new method extracts features in both the spatial and gradient domains. In the spatial domain, we extract the point-wise statistics for single pixel values which are characterized by a generalized Gaussian distribution model to form the underlying features. In the gradient domain, statistical features based on neighboring gradient magnitude similarity are extracted. Then a mapping is learned to predict quality scores using a support vector regression. The experimental results on the benchmark image databases demonstrate that the proposed algorithm correlates highly with human judgments of quality and leads to significant performance improvements over state-of-the-art methods.
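The two feature families can be sketched as follows; the image, window size and similarity definition are simplified assumptions rather than the paper's exact recipe.

```python
# Spatial-domain (generalized Gaussian) and gradient-domain features for NR-IQA.
import numpy as np
from scipy import stats, ndimage

rng = np.random.default_rng(7)
img = rng.normal(0.5, 0.1, (128, 128))                 # stand-in for a distorted image

# Spatial domain: shape and scale of a generalized Gaussian fit to mean-removed pixels
centred = img - ndimage.uniform_filter(img, 7)
beta, loc, scale = stats.gennorm.fit(centred.ravel(), floc=0)

# Gradient domain: similarity between each gradient magnitude and a shifted neighbour
gx, gy = np.gradient(img)
gm = np.hypot(gx, gy)
gm_shift = np.roll(gm, 1, axis=1)
similarity = (2 * gm * gm_shift + 1e-6) / (gm ** 2 + gm_shift ** 2 + 1e-6)

features = [beta, scale, similarity.mean(), similarity.std()]   # fed to an SVR in the paper
```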
ERIC Educational Resources Information Center
Mun, Eun Young; von Eye, Alexander; Bates, Marsha E.; Vaschillo, Evgeny G.
2008-01-01
Model-based cluster analysis is a new clustering procedure to investigate population heterogeneity utilizing finite mixture multivariate normal densities. It is an inferentially based, statistically principled procedure that allows comparison of nonnested models using the Bayesian information criterion to compare multiple models and identify the…
Probabilistic Evaluation of Competing Climate Models
NASA Astrophysics Data System (ADS)
Braverman, A. J.; Chatterjee, S.; Heyman, M.; Cressie, N.
2017-12-01
A standard paradigm for assessing the quality of climate model simulations is to compare what these models produce for past and present time periods, to observations of the past and present. Many of these comparisons are based on simple summary statistics called metrics. Here, we propose an alternative: evaluation of competing climate models through probabilities derived from tests of the hypothesis that climate-model-simulated and observed time sequences share common climate-scale signals. The probabilities are based on the behavior of summary statistics of climate model output and observational data, over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise sequences. The statistics we choose come from working in the space of decorrelated and dimension-reduced wavelet coefficients. We compare monthly sequences of CMIP5 model output of average global near-surface temperature anomalies to similar sequences obtained from the well-known HadCRUT4 data set, as an illustration.
Identification of reliable gridded reference data for statistical downscaling methods in Alberta
NASA Astrophysics Data System (ADS)
Eum, H. I.; Gupta, A.
2017-12-01
Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have been applied to prepare climate model data for various applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data mainly depend on a reference data set, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables, which are the main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths and data accuracy varying with physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in the public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (North American Regional Reanalysis) and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation, minimum and maximum temperature over the province of Alberta. Based on the performance of the climate products at AHCCD stations, we ranked the reliability of these publicly available climate products corresponding to the elevations of stations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system was developed to allow users to easily select the most reliable reference climate data at each target point based on the elevation of the grid cell. By constructing the best combination of reference data for the study domain, the accuracy and reliability of statistically downscaled climate projections could be significantly improved.
NASA Astrophysics Data System (ADS)
Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.
2018-02-01
Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc h-1 closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.
Uniting statistical and individual-based approaches for animal movement modelling.
Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel
2014-01-01
The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to the practical impossibility of field work to access internal states and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method which combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster to parameterize than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM was validated using both k-fold cross-validation and emergent patterns validation, and was tested for two different scenarios with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047
Modelling 1-minute directional observations of the global irradiance.
NASA Astrophysics Data System (ADS)
Thejll, Peter; Pagh Nielsen, Kristian; Andersen, Elsa; Furbo, Simon
2016-04-01
Direct and diffuse irradiances from the sky have been collected at 1-minute intervals for about a year at the experimental station at the Technical University of Denmark for the IEA project "Solar Resource Assessment and Forecasting". These data were gathered by pyrheliometers tracking the Sun, as well as by apertured pyranometers gathering 1/8th and 1/16th of the light from the sky in 45 degree azimuthal ranges pointed around the compass. The data are gathered in order to develop detailed models of the potentially available solar energy and its variations at high temporal resolution, in order to gain a more detailed understanding of the solar resource. This is important for a better understanding of the sub-grid scale cloud variation that cannot be resolved with climate and weather models. It is also important for optimizing the operation of active solar energy systems such as photovoltaic plants and thermal solar collector arrays, and for passive solar energy and lighting of buildings. We present regression-based modelling of the observed data and focus, here, on the statistical properties of the model fits. Using models based on the one hand on what is found in the literature and on physical expectations, and on the other hand on purely statistical models, we find solutions that can explain up to 90% of the variance in global radiation. The models leaning on physical insights include terms for the direct solar radiation, a term for the circum-solar radiation, a diffuse term and a term for the horizon brightening/darkening. The purely statistical model is found using data- and formula-validation approaches, picking model expressions from a general catalogue of possible formulae. The method allows nesting of expressions, and the results found are dependent on and heavily constrained by the cross-validation carried out on statistically independent testing and training data-sets. Slightly better fits - in terms of variance explained - are found using the purely statistical fitting/searching approach. We describe the methods applied and the results found, and discuss the different potentials of the physics-based and statistics-only model searches.
Bayesian inference based on dual generalized order statistics from the exponentiated Weibull model
NASA Astrophysics Data System (ADS)
Al Sobhi, Mashail M.
2015-02-01
Bayesian estimation for the two parameters and the reliability function of the exponentiated Weibull model are obtained based on dual generalized order statistics (DGOS). Also, Bayesian prediction bounds for future DGOS from exponentiated Weibull model are obtained. The symmetric and asymmetric loss functions are considered for Bayesian computations. The Markov chain Monte Carlo (MCMC) methods are used for computing the Bayes estimates and prediction bounds. The results have been specialized to the lower record values. Comparisons are made between Bayesian and maximum likelihood estimators via Monte Carlo simulation.
Austin, Peter C
2010-04-22
Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.
Incorporating GIS and remote sensing for census population disaggregation
NASA Astrophysics Data System (ADS)
Wu, Shuo-Sheng 'Derek'
Census data are the primary source of demographic data for a variety of research and applications. For confidentiality issues and administrative purposes, census data are usually released to the public by aggregated areal units. In the United States, the smallest census unit is the census block. Due to data aggregation, users of census data may have problems in visualizing population distribution within census blocks and estimating population counts for areas not coinciding with census block boundaries. The main purpose of this study is to develop methodology for estimating sub-block areal populations and assessing the estimation errors. The City of Austin, Texas was used as a case study area. Based on tax parcel boundaries and parcel attributes derived from ancillary GIS and remote sensing data, detailed urban land use classes were first classified using a per-field approach. After that, statistical models by land use class were built to infer population density from other predictor variables, including four census demographic statistics (the Hispanic percentage, the married percentage, the unemployment rate, and per capita income) and three physical variables derived from remote sensing images and building footprint vector data (a landscape heterogeneity statistic, a building pattern statistic, and a building volume statistic). In addition to statistical models, deterministic models were proposed to directly infer populations from building volumes and three housing statistics, including the average space per housing unit, the housing unit occupancy rate, and the average household size. After the population models were derived or proposed, how well the models predict populations for another set of sample blocks was assessed. The results show that deterministic models were more accurate than statistical models. Further, by simulating the base unit for modeling from aggregating blocks, I assessed how well the deterministic models estimate sub-unit-level populations. I also assessed the aggregation effects and the rescaling effects on sub-unit estimates. Lastly, from another set of mixed-land-use sample blocks, a mixed-land-use model was derived and compared with a residential-land-use model. The results of the per-field land use classification are satisfactory, with a Kappa accuracy statistic of 0.747. Model assessments by land use show that population estimates for multi-family land use areas have higher errors than those for single-family land use areas, and population estimates for mixed land use areas have higher errors than those for residential land use areas. The assessments of sub-unit estimates using a simulation approach indicate that smaller areas show higher estimation errors, estimation errors do not relate to the base unit size, and rescaling improves all levels of sub-unit estimates.
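The deterministic model described above reduces to simple arithmetic; the following sketch uses invented values for one hypothetical block.

```python
# Deterministic population estimate from building volume and housing statistics.
building_volume_m3 = 180000.0        # summed residential building volume in the block
space_per_unit_m3 = 550.0            # average built volume per housing unit
occupancy_rate = 0.93                # share of housing units that are occupied
household_size = 2.4                 # average persons per occupied unit

housing_units = building_volume_m3 / space_per_unit_m3
population_estimate = housing_units * occupancy_rate * household_size
print(round(population_estimate))
```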
A new in silico classification model for ready biodegradability, based on molecular fragments.
Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina
2014-08-01
Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI-test Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with other in silico models like BIOWIN developed by the United States Environmental Protection Agency (EPA); moreover this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.
Environmental Concern and Sociodemographic Variables: A Study of Statistical Models
ERIC Educational Resources Information Center
Xiao, Chenyang; McCright, Aaron M.
2007-01-01
Studies of the social bases of environmental concern over the past 30 years have produced somewhat inconsistent results regarding the effects of sociodemographic variables, such as gender, income, and place of residence. The authors argue that model specification errors resulting from violation of two statistical assumptions (interval-level…
USDA-ARS's Scientific Manuscript database
Cover: The electrospinning technique was employed to obtain conducting nanofibers based on polyaniline and poly(lactic acid). A statistical model was employed to describe how the process factors (solution concentration, applied voltage, and flow rate) govern the fiber dimensions. Nanofibers down to ...
The epistemology of mathematical and statistical modeling: a quiet methodological revolution.
Rodgers, Joseph Lee
2010-01-01
A quiet methodological revolution, a modeling revolution, has occurred over the past several decades, almost without discussion. In contrast, the 20th century ended with contentious argument over the utility of null hypothesis significance testing (NHST). The NHST controversy may have been at least partially irrelevant, because in certain ways the modeling revolution obviated the NHST argument. I begin with a history of NHST and modeling and their relation to one another. Next, I define and illustrate principles involved in developing and evaluating mathematical models. Following that, I discuss the difference between using statistical procedures within a rule-based framework and building mathematical models from a scientific epistemology. Only the former is treated carefully in most psychology graduate training. The pedagogical implications of this imbalance and the revised pedagogy required to account for the modeling revolution are described. To conclude, I discuss how attention to modeling implies shifting statistical practice in certain progressive ways. The epistemological basis of statistics has moved away from being a set of procedures, applied mechanistically, and moved toward building and evaluating statistical and scientific models. Copyright 2009 APA, all rights reserved.
Statistical analysis of target acquisition sensor modeling experiments
NASA Astrophysics Data System (ADS)
Deaver, Dawne M.; Moyer, Steve
2015-05-01
The U.S. Army RDECOM CERDEC NVESD Modeling and Simulation Division is charged with the development and advancement of military target acquisition models to estimate expected soldier performance when using all types of imaging sensors. Two elements of sensor modeling are (1) laboratory-based psychophysical experiments used to measure task performance and calibrate the various models and (2) field-based experiments used to verify the model estimates for specific sensors. In both types of experiments, it is common practice to control or measure environmental, sensor, and target physical parameters in order to minimize uncertainty of the physics based modeling. Predicting the minimum number of test subjects required to calibrate or validate the model should be, but is not always, done during test planning. The objective of this analysis is to develop guidelines for test planners which recommend the number and types of test samples required to yield a statistically significant result.
Maximum entropy models of ecosystem functioning
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bertram, Jason, E-mail: jason.bertram@anu.edu.au
2014-12-05
Using organism-level traits to deduce community-level relationships is a fundamental problem in theoretical ecology. This problem parallels the physical one of using particle properties to deduce macroscopic thermodynamic laws, which was successfully achieved with the development of statistical physics. Drawing on this parallel, theoretical ecologists from Lotka onwards have attempted to construct statistical mechanistic theories of ecosystem functioning. Jaynes’ broader interpretation of statistical mechanics, which hinges on the entropy maximisation algorithm (MaxEnt), is of central importance here because the classical foundations of statistical physics do not have clear ecological analogues (e.g. phase space, dynamical invariants). However, models based on the information theoretic interpretation of MaxEnt are difficult to interpret ecologically. Here I give a broad discussion of statistical mechanical models of ecosystem functioning and the application of MaxEnt in these models. Emphasising the sample frequency interpretation of MaxEnt, I show that MaxEnt can be used to construct models of ecosystem functioning which are statistical mechanical in the traditional sense, using a savanna plant ecology model as an example.
Manifold parametrization of the left ventricle for a statistical modelling of its complete anatomy
NASA Astrophysics Data System (ADS)
Gil, D.; Garcia-Barnes, J.; Hernández-Sabate, A.; Marti, E.
2010-03-01
Distortion of Left Ventricle (LV) external anatomy is related to some dysfunctions, such as hypertrophy. The architecture of myocardial fibers determines LV electromechanical activation patterns as well as mechanics. Thus, their joint modelling would allow the design of specific interventions (such as pacemaker implantation and LV remodelling) and therapies (such as resynchronization). On one hand, accurate modelling of external anatomy requires either a dense sampling or a continuous infinite dimensional approach, which requires non-Euclidean statistics. On the other hand, computation of fiber models requires statistics on Riemannian spaces. Most approaches compute separate statistical models for external anatomy and fiber architecture. In this work we propose a general mathematical framework based on differential geometry concepts for computing a statistical model including both external and fiber anatomy. Our framework provides a continuous approach to external anatomy supporting standard statistics. We also provide a straightforward formula for the computation of the Riemannian fiber statistics. We have applied our methodology to the computation of a complete anatomical atlas of canine hearts from diffusion tensor studies. The orientation of fibers over the average external geometry agrees with the segmental description of orientations reported in the literature.
DOT National Transportation Integrated Search
2009-10-01
Travel demand modeling, in recent years, has seen a paradigm shift with an emphasis on analyzing travel at the : individual level rather than using direct statistical projections of aggregate travel demand as in the trip-based : approach. Specificall...
Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I
2013-05-01
When phenotypic, but no genotypic, data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable for modeling imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes, and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, the size of the phenotypic correlations among siblings, imputation accuracy and the minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion of imputed sibling genotypes in association analysis does not always result in a larger test statistic. The actual test statistic may drop in value due to small effect sizes: if the power benefit is small, that is, if the change in the distribution of the test statistic under the alternative is relatively small, the probability of obtaining a smaller test statistic is greater. As genetic effects are typically hypothesized to be small, in practice the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by consideration of the background correlation.
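The distinction between the two approaches can be shown in a few lines: the dosage approach collapses the conditional genotype distribution to its mean, while the mixture approach keeps all three genotype probabilities in the likelihood. The probabilities and regression parameters below are illustrative assumptions.

```python
# Dosage vs. mixture handling of an imputed sibling genotype.
import numpy as np

p_genotype = np.array([0.10, 0.55, 0.35])    # P(0, 1, 2 copies of the allele | family data)
genotypes = np.array([0, 1, 2])

dosage = np.dot(p_genotype, genotypes)        # single imputed value used as the predictor

def mixture_loglik(y, beta0, beta1, sigma):
    """Log-likelihood of a phenotype y under the full genotype mixture."""
    comp = [p * np.exp(-(y - beta0 - beta1 * g) ** 2 / (2 * sigma ** 2)) /
            (sigma * np.sqrt(2 * np.pi)) for p, g in zip(p_genotype, genotypes)]
    return np.log(np.sum(comp))

print(dosage, mixture_loglik(1.2, 0.0, 0.3, 1.0))
```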
Consequences of Base Time for Redundant Signals Experiments
Townsend, James T.; Honey, Christopher
2007-01-01
We report analytical and computational investigations into the effects of base time on the diagnosticity of two popular theoretical tools in the redundant signals literature: (1) the race model inequality and (2) the capacity coefficient. We show analytically and without distributional assumptions that the presence of base time decreases the sensitivity of both of these measures to model violations. We further use simulations to investigate the statistical power of model selection tools based on the race model inequality, both with and without base time. Base time decreases statistical power and biases the race model test toward conservatism. The magnitude of this biasing effect increases as we increase the proportion of total reaction time variance contributed by base time. We marshal empirical evidence to suggest that the proportion of reaction time variance contributed by base time is relatively small, and that the effects of base time on the diagnosticity of our model-selection tools are therefore likely to be minor. However, uncertainty remains concerning the magnitude and even the definition of base time. Experimentalists should continue to be alert to situations in which base time may contribute a large proportion of the total reaction time variance. PMID:18670591
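For concreteness, a race model inequality check on simulated reaction times (with an arbitrary constant base time added) might look like the sketch below; the distributions are assumptions, not the paper's data. The inequality requires the redundant-target CDF not to exceed the sum of the single-target CDFs at any time point.

```python
# Empirical check of the race model inequality on synthetic reaction times.
import numpy as np

rng = np.random.default_rng(8)
base = 150                                                 # constant base time (ms), assumed
rt_a = rng.gamma(6, 40, 500) + base                        # single-signal A reaction times
rt_b = rng.gamma(6, 45, 500) + base                        # single-signal B reaction times
rt_ab = np.minimum(rng.gamma(6, 40, 500), rng.gamma(6, 45, 500)) + base   # redundant condition

t_grid = np.linspace(200, 700, 101)
cdf = lambda x, t: np.mean(x[:, None] <= t, axis=0)        # empirical CDF on a time grid
violation = cdf(rt_ab, t_grid) - (cdf(rt_a, t_grid) + cdf(rt_b, t_grid))
print("max violation of the race model inequality:", violation.max())
```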
Statistical dielectronic recombination rates for multielectron ions in plasma
NASA Astrophysics Data System (ADS)
Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.
2017-10-01
We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma based on the statistical theory of an atom in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically in a wide range of variation of the plasma temperature, which is important for modern nuclear fusion studies. The results of statistical theory are compared with the data obtained using level-by-level codes ADPAK, FAC, HULLAC, and experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, which is an analog of the Frank-Condon model as applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used for obtaining express estimates of the dielectronic recombination rate coefficients in complex calculations of the parameters of the thermonuclear plasmas. The application of statistical methods also provides information for the dielectronic recombination rates with much smaller computer time expenditures as compared to available level-by-level codes.
NASA Astrophysics Data System (ADS)
Yin, Shengwen; Yu, Dejie; Yin, Hui; Lü, Hui; Xia, Baizhan
2017-09-01
Considering the epistemic uncertainties within the hybrid Finite Element/Statistical Energy Analysis (FE/SEA) model when it is used for the response analysis of built-up systems in the mid-frequency range, the hybrid Evidence Theory-based Finite Element/Statistical Energy Analysis (ETFE/SEA) model is established by introducing evidence theory. Based on the hybrid ETFE/SEA model and the sub-interval perturbation technique, the hybrid Sub-interval Perturbation and Evidence Theory-based Finite Element/Statistical Energy Analysis (SIP-ETFE/SEA) approach is proposed. In the hybrid ETFE/SEA model, the uncertainty in the SEA subsystem is modeled by a non-parametric ensemble, while the uncertainty in the FE subsystem is described by focal elements and basic probability assignments (BPAs) and handled with evidence theory. Within the hybrid SIP-ETFE/SEA approach, the mid-frequency responses of interest, such as the ensemble average of the energy response and the cross-spectrum response, are calculated analytically using the conventional hybrid FE/SEA method. Inspired by probability theory, the intervals of the mean value, variance, and cumulative distribution are used to describe the distribution characteristics of the mid-frequency responses of built-up systems with epistemic uncertainties. To alleviate the computational burden of the extreme value analysis, the sub-interval perturbation technique based on a first-order Taylor series expansion is used in the ETFE/SEA model to acquire the lower and upper bounds of the mid-frequency responses over each focal element. Three numerical examples are given to illustrate the feasibility and effectiveness of the proposed method.
Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali
2013-04-01
The research aims to develop an artificial intelligence (AI)-based model to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (solution pH, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in the experimental data were checked using Brock-Dechert-Scheinkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using the four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in the experimental data. Performance of all the models constructed here was satisfactory. Radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than the generalized regression neural network, support vector machine, and gene expression programming models. Sensitivity analysis revealed that contact time had the highest effect on adsorption, followed by solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in the data. The better generalization and predictive performance of the RBFN and MLPN models suggests that these can be used to predict the adsorption of CP in aqueous solution by CSC.
RooStatsCms: A tool for analysis modelling, combination and statistical studies
NASA Astrophysics Data System (ADS)
Piparo, D.; Schott, G.; Quast, G.
2010-04-01
RooStatsCms is an object-oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in the literature, implemented as classes whose design is oriented to the execution of multiple CPU-intensive jobs on batch systems or on the Grid.
Animal movement: Statistical models for telemetry data
Hooten, Mevin B.; Johnson, Devin S.; McClintock, Brett T.; Morales, Juan M.
2017-01-01
The study of animal movement has always been a key element in ecological science, because it is inherently linked to critical processes that scale from individuals to populations and communities to ecosystems. Rapid improvements in biotelemetry data collection and processing technology have given rise to a variety of statistical methods for characterizing animal movement. The book serves as a comprehensive reference for the types of statistical models used to study individual-based animal movement.
Evaluating statistical cloud schemes: What can we gain from ground-based remote sensing?
NASA Astrophysics Data System (ADS)
Grützun, V.; Quaas, J.; Morcrette, C. J.; Ament, F.
2013-09-01
Statistical cloud schemes with prognostic probability distribution functions have become more important in atmospheric modeling, especially since they are in principle scale adaptive and capture cloud physics in more detail. While in theory the schemes have great potential, their accuracy is still questionable, and high-resolution three-dimensional observational data of water vapor and cloud water, which could be used for testing them, are missing. We explore the potential of ground-based remote sensing such as lidar, microwave, and radar to evaluate prognostic distribution moments using the "perfect model approach." This means that we employ a high-resolution weather model as virtual reality and retrieve full three-dimensional atmospheric quantities and virtual ground-based observations. We then use statistics from the virtual observations to validate the modeled 3-D statistics. Since the data are entirely consistent, any discrepancy that occurs is due to the method. Focusing on total water mixing ratio, we find that the mean can be evaluated reasonably well, but whether the variance and skewness are reliable depends strongly on the meteorological conditions. Using a simple schematic description of different synoptic conditions, we show how statistics obtained from point or line measurements can be poor at representing the full three-dimensional distribution of water in the atmosphere. We argue that a careful analysis of the measurement data and detailed knowledge of the meteorological situation are necessary to judge whether the data can be used to evaluate the higher moments of the humidity distribution used by a statistical cloud scheme.
2013-01-01
Background As a result of changes in climatic conditions and greater resistance to insecticides, many regions across the globe, including Colombia, have been facing a resurgence of vector-borne diseases, and dengue fever in particular. Timely information on both (1) the spatial distribution of the disease, and (2) prevailing vulnerabilities of the population are needed to adequately plan targeted preventive intervention. We propose a methodology for the spatial assessment of current socioeconomic vulnerabilities to dengue fever in Cali, a tropical urban environment of Colombia. Methods Based on a set of socioeconomic and demographic indicators derived from census data and ancillary geospatial datasets, we develop a spatial approach for both expert-based and purely statistical-based modeling of current vulnerability levels across 340 neighborhoods of the city using a Geographic Information System (GIS). The results of both approaches are comparatively evaluated by means of spatial statistics. A web-based approach is proposed to facilitate the visualization and the dissemination of the output vulnerability index to the community. Results The statistical and the expert-based modeling approach exhibit a high concordance, globally, and spatially. The expert-based approach indicates a slightly higher vulnerability mean (0.53) and vulnerability median (0.56) across all neighborhoods, compared to the purely statistical approach (mean = 0.48; median = 0.49). Both approaches reveal that high values of vulnerability tend to cluster in the eastern, north-eastern, and western part of the city. These are poor neighborhoods with high percentages of young (i.e., < 15 years) and illiterate residents, as well as a high proportion of individuals being either unemployed or doing housework. Conclusions Both modeling approaches reveal similar outputs, indicating that in the absence of local expertise, statistical approaches could be used, with caution. By decomposing identified vulnerability “hotspots” into their underlying factors, our approach provides valuable information on both (1) the location of neighborhoods, and (2) vulnerability factors that should be given priority in the context of targeted intervention strategies. The results support decision makers to allocate resources in a manner that may reduce existing susceptibilities and strengthen resilience, and thus help to reduce the burden of vector-borne diseases. PMID:23945265
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include models commonly used in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis test models, such as contingency tables and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model, and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models.
New statistical potential for quality assessment of protein models and a survey of energy functions
2010-01-01
Background Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment. Results The performances of a number of publicly available scoring functions are compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility and torsion angles and accounting for secondary structure preferences and side chain orientation. Partially based on these observations, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. Atom- and residue-level statistical potentials and Linux executables to calculate the energy of a given protein proposed in this work can be downloaded from http://www.fiserlab.org/potentials. Conclusions Among the most influential terms, we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanics potentials were also tested and found to be oversensitive to small local imperfections in a structure, requiring unfeasibly long energy relaxation before energy scores started to correlate with model quality. PMID:20226048
Set statistics in conductive bridge random access memory device with Cu/HfO{sub 2}/Pt structure
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Meiyun; Long, Shibing, E-mail: longshibing@ime.ac.cn; Wang, Guoming
2014-11-10
The switching parameter variation of resistive switching memory is one of the most important challenges in its application. In this letter, we have studied the set statistics of conductive bridge random access memory with a Cu/HfO{sub 2}/Pt structure. The experimental distributions of the set parameters in several off-resistance ranges are shown to nicely fit a Weibull model. The Weibull slopes of the set voltage and current increase and decrease logarithmically with off resistance, respectively. This experimental behavior is perfectly captured by a Monte Carlo simulator based on the cell-based set voltage statistics model and the Quantum Point Contact electron transport model. Our work provides indications for the improvement of the switching uniformity.
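A common way to extract the Weibull slope reported in such studies is a median-rank probability plot; the sketch below shows that procedure on synthetic set voltages. The data and parameter values are invented, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical set voltages (V) measured within one off-resistance range.
v_set = np.sort(rng.weibull(4.0, 200) * 1.5)

# Weibull probability plot: median-rank estimates of F(V), then a linear fit
# of ln(-ln(1 - F)) against ln(V); the fitted slope is the Weibull slope.
n = len(v_set)
f = (np.arange(1, n + 1) - 0.3) / (n + 0.4)   # Benard's median-rank estimate
slope, intercept = np.polyfit(np.log(v_set), np.log(-np.log(1.0 - f)), 1)
print(f"Weibull slope of the set voltage: {slope:.2f}")
```

Repeating the fit across off-resistance bins would expose the logarithmic trends in the slopes that the letter describes.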
On prognostic models, artificial intelligence and censored observations.
Anand, S S; Hamilton, P W; Hughes, J G; Bell, D A
2001-03-01
The development of prognostic models for assisting medical practitioners with decision making is not a trivial task. Models need to possess a number of desirable characteristics, and few, if any, current modelling approaches based on statistics or artificial intelligence can produce models that display all of them. The inability of modelling techniques to provide truly useful models has meant that interest in these models remains largely academic, which in turn has resulted in only a very small percentage of the models developed being deployed in practice. On the other hand, new modelling paradigms are continuously being proposed within the machine learning and statistical communities, and claims about their superiority over traditional modelling methods are often based on inadequate evaluation. We believe that for new modelling approaches to deliver true net benefits over traditional techniques, an evaluation-centric approach to their development is essential. In this paper we present such an evaluation-centric approach to developing extensions to the basic k-nearest neighbour (k-NN) paradigm. We use standard statistical techniques to enhance the distance metric used, and a framework based on evidence theory to obtain a prediction for the target example from the outcomes of the retrieved exemplars. We refer to this new k-NN algorithm as Censored k-NN (Ck-NN), reflecting the enhancements made to k-NN that are aimed at handling censored observations within k-NN.
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-04-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter halos. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the "accurate" regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard ΛCDM + halo model against the clustering of SDSS DR7 galaxies. Specifically, we use the projected correlation function, group multiplicity function and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir halos) matches the clustering of low luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the "standard" halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
Statistical validation of normal tissue complication probability models.
Xu, Cheng-Jian; van der Schaaf, Arjen; Van't Veld, Aart A; Langendijk, Johannes A; Schilstra, Cornelis
2012-09-01
To investigate the applicability and value of double cross-validation and permutation tests as established statistical approaches in the validation of normal tissue complication probability (NTCP) models. A penalized regression method, LASSO (least absolute shrinkage and selection operator), was used to build NTCP models for xerostomia after radiation therapy treatment of head-and-neck cancer. Model assessment was based on the likelihood function and the area under the receiver operating characteristic curve. Repeated double cross-validation showed the uncertainty and instability of the NTCP models and indicated that the statistical significance of model performance can be obtained by permutation testing. Repeated double cross-validation and permutation tests are recommended to validate NTCP models before clinical use. Copyright © 2012 Elsevier Inc. All rights reserved.
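A rough sketch of this validation logic might look as follows with scikit-learn; a single cross-validation loop is shown for brevity (the study uses repeated double cross-validation), and the data, penalty setting, and sample sizes are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# Hypothetical data: 10 dose/clinical features, binary xerostomia outcome.
X = rng.normal(size=(120, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=120) > 0).astype(int)

# L1-penalized (LASSO-type) logistic regression, scored by cross-validated AUC.
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
auc_obs = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()

# Permutation test: refit on shuffled outcomes to build a null distribution,
# so the significance of the observed performance can be assessed.
auc_null = [cross_val_score(model, X, rng.permutation(y), cv=5,
                            scoring="roc_auc").mean() for _ in range(100)]
p_value = np.mean(np.array(auc_null) >= auc_obs)
print(f"observed AUC: {auc_obs:.3f}, permutation p-value: {p_value:.3f}")
```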
New approach in the quantum statistical parton distribution
NASA Astrophysics Data System (ADS)
Sohaily, Sozha; Vaziri (Khamedi), Mohammad
2017-12-01
An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help in understanding the structure of partons. The longitudinal portion of the distribution functions is obtained by applying the maximum entropy principle. An interesting and simple approach to determining the statistical variables exactly, without fitting or fixing parameters, is surveyed. Analytic expressions for the x-dependent PDFs are obtained over the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data provides robust confirmation of the simple statistical model presented.
Statistically optimal perception and learning: from behavior to neural representations
Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté
2010-01-01
Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683
NASA Technical Reports Server (NTRS)
Wiscombe, W.
1999-01-01
The purpose of this paper is to discuss the concept of fractal dimension; multifractal statistics as an extension of it; the use of simple multifractal statistics (power spectrum, structure function) to characterize cloud liquid water data; and the use of multifractal cloud liquid water models based on real data as input to Monte Carlo models of shortwave radiative transfer in 3D clouds. The consequences of the latter are considered in two areas: the design of aircraft field programs to measure cloud absorptance, and the explanation of the famous "Landsat scale break" in measured radiance.
Local sensitivity analysis for inverse problems solved by singular value decomposition
Hill, M.C.; Nolan, B.T.
2010-01-01
Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process-model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic and the combined use of the process-model parameter statistics, composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complementary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA's Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1; the CSS/PCC analysis showed that this results from its high correlation with WFC1 (-0.94), and not from its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ in that (1) using CSS/PCC can be more awkward because sensitivity and interdependence are considered separately, and (2) the identifiability statistic requires deciding how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov chain Monte Carlo, given common nonlinear processes and often even more nonlinear models.
Theoretic aspects of the identification of the parameters in the optimal control model
NASA Technical Reports Server (NTRS)
Vanwijk, R. A.; Kok, J. J.
1977-01-01
The identification of the parameters of the optimal control model from input-output data of the human operator is considered. Accepting the basic structure of the model as a cascade of a full-order observer and a feedback law, and suppressing the inherent optimality of the human controller, the parameters to be identified are the feedback matrix, the observer gain matrix, and the intensity matrices of the observation noise and the motor noise. The identification of the parameters is a statistical problem, because the system and output are corrupted by noise, and therefore the solution must be based on the statistics (probability density function) of the input and output data of the human operator. However, based on the statistics of the input-output data of the human operator, no distinction can be made between the observation and the motor noise, which shows that the model suffers from overparameterization.
Nonlinear multi-analysis of agent-based financial market dynamics by epidemic system
NASA Astrophysics Data System (ADS)
Lu, Yunfan; Wang, Jun; Niu, Hongli
2015-10-01
Based on an epidemic dynamical system, we construct a new agent-based financial time series model. To check and verify its rationality, we compare the statistical properties of the time series model with those of two real stock market indices, the Shanghai Stock Exchange Composite Index and the Shenzhen Stock Exchange Component Index. For analyzing the statistical properties, we combine multi-parameter analysis with tail distribution analysis, modified rescaled range analysis, and multifractal detrended fluctuation analysis. For a better perspective, three-dimensional diagrams are used to present the analysis results. The empirical research in this paper indicates that the long-range dependence property and the multifractal phenomenon exist in both the real returns and the proposed model. Therefore, the new agent-based financial model can reproduce some important features of real stock markets.
Applying the multivariate time-rescaling theorem to neural population models
Gerhard, Felipe; Haslinger, Robert; Pipa, Gordon
2011-01-01
Statistical models of neural activity are integral to modern neuroscience. Recently, interest has grown in modeling the spiking activity of populations of simultaneously recorded neurons to study the effects of correlations and functional connectivity on neural information processing. However, any statistical model must be validated by an appropriate goodness-of-fit test. Kolmogorov-Smirnov tests based upon the time-rescaling theorem have proven to be useful for evaluating point-process-based statistical models of single-neuron spike trains. Here we discuss the extension of the time-rescaling theorem to the multivariate (neural population) case. We show that even in the presence of strong correlations between spike trains, models which neglect couplings between neurons can be erroneously passed by the univariate time-rescaling test. We present the multivariate version of the time-rescaling theorem, and provide a practical step-by-step procedure for applying it towards testing the sufficiency of neural population models. Using several simple analytically tractable models and also more complex simulated and real data sets, we demonstrate that important features of the population activity can only be detected using the multivariate extension of the test. PMID:21395436
Improving the Validity of Activity of Daily Living Dependency Risk Assessment
Clark, Daniel O.; Stump, Timothy E.; Tu, Wanzhu; Miller, Douglas K.
2015-01-01
Objectives Efforts to prevent activity of daily living (ADL) dependency may be improved through models that assess older adults’ dependency risk. We evaluated whether cognition and gait speed measures improve the predictive validity of interview-based models. Method Participants were 8,095 self-respondents in the 2006 Health and Retirement Survey who were aged 65 years or over and independent in five ADLs. Incident ADL dependency was determined from the 2008 interview. Models were developed using random 2/3rd cohorts and validated in the remaining 1/3rd. Results Compared to a c-statistic of 0.79 in the best interview model, the model including cognitive measures had c-statistics of 0.82 and 0.80 while the best fitting gait speed model had c-statistics of 0.83 and 0.79 in the development and validation cohorts, respectively. Conclusion Two relatively brief models, one that requires an in-person assessment and one that does not, had excellent validity for predicting incident ADL dependency but did not significantly improve the predictive validity of the best fitting interview-based models. PMID:24652867
BCM: toolkit for Bayesian analysis of Computational Models using samplers.
Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A
2016-10-21
Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics; however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Development of a funding, cost, and spending model for satellite projects
NASA Technical Reports Server (NTRS)
Johnson, Jesse P.
1989-01-01
The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort was conducted to extend the modeling capabilities from total budget analysis to analysis of total budget and budget outlays over time. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model that describes the historical spending patterns. The raw data consisted of dollars spent in each specific year and their 1989-dollar equivalents. These data were converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large-scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data were analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. To use this model it is necessary to transform the problem from a dollar/time space to a percentage-of-total-budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
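A minimal sketch of the core idea, fitting a Weibull cumulative spending curve by nonlinear regression in the normalized budget/time space and then inverting it into a schedule, is given below; the spending fractions are invented for illustration, not drawn from the GSFC data.

```python
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(t, k, lam):
    """Fraction of total budget spent by normalized project time t in [0, 1]."""
    return 1.0 - np.exp(-(t / lam) ** k)

# Hypothetical spending history in the percentage-of-budget / time space.
t_frac = np.linspace(0.1, 1.0, 10)
spend = np.array([0.02, 0.08, 0.18, 0.32, 0.48, 0.63, 0.77, 0.88, 0.95, 0.99])

(k, lam), _ = curve_fit(weibull_cdf, t_frac, spend, p0=[2.0, 0.5])
print(f"shape k = {k:.2f}, scale lambda = {lam:.2f}")

# Inverting the fitted curve gives a schedule: when is half the budget spent?
t_half = lam * (-np.log(1.0 - 0.5)) ** (1.0 / k)
print(f"50% of the budget is spent at {t_half:.0%} of the schedule")
```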
NASA Technical Reports Server (NTRS)
Rastaetter, L.; Kuznetsova, M.; Hesse, M.; Pulkkinen, A.; Glocer, A.; Yu, Y.; Meng, X.; Raeder, J.; Wiltberger, M.; Welling, D.;
2011-01-01
In this paper the metrics-based results of the Dst part of the 2008-2009 GEM Metrics Challenge are reported. The Metrics Challenge asked modelers to submit results for 4 geomagnetic storm events and 5 different types of observations that can be modeled by statistical or climatological or physics-based (e.g. MHD) models of the magnetosphere-ionosphere system. We present the results of over 25 model settings that were run at the Community Coordinated Modeling Center (CCMC) and at the institutions of various modelers for these events. To measure the performance of each of the models against the observations we use comparisons of one-hour averaged model data with the Dst index issued by the World Data Center for Geomagnetism, Kyoto, Japan, and direct comparison of one-minute model data with the one-minute Dst index calculated by the United States Geologic Survey (USGS).
NASA Astrophysics Data System (ADS)
Sergeenko, N. P.
2017-11-01
An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order were calculated. The analysis shows that the distributions differ from the Gaussian law during disturbances. At sufficiently small probability levels, there are arbitrarily large deviations from the normal-process model. Therefore, an attempt is made to describe the statistical samples {δfoF2} with a Poisson model. For the studied samples, an exponential characteristic function is selected under the assumption that the time series are a superposition of deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic, excessive-asymmetric probability density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov criterion, the probabilities of coincidence of the a posteriori distributions with the theoretical ones are P ≈ 0.7-0.9. The analysis supports the applicability of a model based on the Poisson random process for the statistical description of the variations {δfoF2} and for probabilistic estimates of their range during heliogeophysical disturbances.
A Model for Investigating Predictive Validity at Highly Selective Institutions.
ERIC Educational Resources Information Center
Gross, Alan L.; And Others
A statistical model for investigating predictive validity at highly selective institutions is described. When the selection ratio is small, one must typically deal with a data set containing relatively large amounts of missing data on both criterion and predictor variables. Standard statistical approaches are based on the strong assumption that…
The Development and Demonstration of Multiple Regression Models for Operant Conditioning Questions.
ERIC Educational Resources Information Center
Fanning, Fred; Newman, Isadore
Based on the assumption that inferential statistics can make the operant conditioner more sensitive to possible significant relationships, regression models were developed to test the statistical significance between the slopes and Y intercepts of the experimental and control group subjects. These results were then compared to the traditional operant…
ERIC Educational Resources Information Center
Moses, Tim; Holland, Paul W.
2010-01-01
In this study, eight statistical strategies were evaluated for selecting the parameterizations of loglinear models for smoothing the bivariate test score distributions used in nonequivalent groups with anchor test (NEAT) equating. Four of the strategies were based on significance tests of chi-square statistics (Likelihood Ratio, Pearson,…
USDA-ARS?s Scientific Manuscript database
Resolutions of climate model outputs are too coarse for the outputs to be used as direct inputs to impact models assessing climate change impacts on agricultural production, water resources, and ecosystem services at local or site-specific scales. Statistical downscaling approaches are usually used to bridge th...
Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies.
Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre
2018-03-15
Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because the durations are usually nonnegative and skew-distributed. Therefore, we recommend using a statistical methodology specific to duration data. We propose a methodology based on Cox mixed models, implemented in the R language. This semiparametric model is flexible enough to fit duration data. To compare log-linear and Cox mixed models in terms of goodness of fit on real data sets, we also provide a procedure based on simulations and quantile-quantile plots. We present two examples from a data set of speech and gesture interactions, which illustrate the limitations of linear and log-linear mixed models compared to Cox models. The linear models are not validated on our data, whereas the Cox models are. Moreover, in the second example, the Cox model exhibits a significant effect that the linear model does not. We provide methods to select the best-fitting models for repeated duration data and to compare statistical methodologies. In this study, we show that Cox models are best suited to the analysis of our data set.
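As a rough illustration of fitting a Cox model to repeated duration data, the sketch below uses the Python lifelines package with a clustered robust variance standing in for a true random (mixed) effect, which is simpler than the paper's R-based Cox mixed models; all variable names and data are simulated.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(8)
n = 200
subject = rng.integers(0, 20, n)              # repeated durations per speaker
condition = rng.integers(0, 2, n)             # e.g., gesture vs. no gesture
frailty = rng.normal(0.0, 0.3, 20)[subject]   # subject-level heterogeneity
duration = rng.exponential(np.exp(-0.5 * condition - frailty))

df = pd.DataFrame({"duration": duration, "event": 1,
                   "condition": condition, "subject": subject})
# Clustered (sandwich) standard errors account for within-subject dependence.
cph = CoxPHFitter()
cph.fit(df, duration_col="duration", event_col="event",
        cluster_col="subject", formula="condition")
cph.print_summary()
```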
NASA Astrophysics Data System (ADS)
Cianciara, Aleksander
2016-09-01
The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical model of microseismic emission characteristics, namely the energy of the phenomena and the inter-event time. The emission under consideration is understood to be induced by natural rock mass fracturing. Because the recorded emission contains noise, it is subjected to appropriate filtering. The study was conducted using statistical verification of the null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model cumulative distribution function is given in analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. A correct model of the statistical distribution of the data is essential for interpretation by probabilistic methods, because such methods use not the measurements directly but their statistical distributions, e.g., in hazard analysis or in methods based on maximum value statistics.
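The verification step can be reproduced with scipy as below; note that, strictly, KS critical values should be adjusted when the Weibull parameters are estimated from the same data. The inter-event times here are simulated, not the paper's measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Hypothetical inter-event times (s) from the filtered microseismic emission.
dt = rng.weibull(0.9, 500) * 120.0

# Fit a two-parameter Weibull (location fixed at zero), then apply the
# Kolmogorov-Smirnov goodness-of-fit test against the analytical CDF.
shape, loc, scale = stats.weibull_min.fit(dt, floc=0.0)
d, p = stats.kstest(dt, "weibull_min", args=(shape, loc, scale))
print(f"shape = {shape:.2f}, scale = {scale:.1f} s, KS D = {d:.3f}, p = {p:.3f}")
```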
Nishino, Ko; Lombardi, Stephen
2011-01-01
We introduce a novel parametric bidirectional reflectance distribution function (BRDF) model that can accurately encode a wide variety of real-world isotropic BRDFs with a small number of parameters. The key observation we make is that a BRDF may be viewed as a statistical distribution on a unit hemisphere. We derive a novel directional statistics distribution, which we refer to as the hemispherical exponential power distribution, and model real-world isotropic BRDFs as mixtures of it. We derive a canonical probabilistic method for estimating the parameters, including the number of components, of this novel directional statistics BRDF model. We show that the model captures the full spectrum of real-world isotropic BRDFs with high accuracy, but a small footprint. We also demonstrate the advantages of the novel BRDF model by showing its use for reflection component separation and for exploring the space of isotropic BRDFs.
Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme
2013-05-01
Many construction activities can put workers at risk of breathing silica-containing dusts, and there is an important body of literature documenting exposure levels using a task-based strategy. In this study, statistical modeling was used to analyze a data set containing 1466 task-based, personal respirable crystalline silica (RCS) measurements gathered from 46 sources to estimate exposure levels during construction tasks and the effects of determinants of exposure. Monte Carlo simulation was used to recreate individual exposures from summary parameters, and the statistical modeling involved multimodel inference with Tobit models containing combinations of the following exposure variables: sampling year, sampling duration, construction sector, project type, workspace, ventilation, and controls. Exposure levels by task were predicted based on the median reported duration by activity, the year 1998, the absence of source control methods, and an equal distribution of the other determinants of exposure. The model containing all the variables explained 60% of the variability and was identified as the best approximating model. Of the 27 tasks contained in the data set, abrasive blasting, masonry chipping, scabbling concrete, tuck pointing, and tunnel boring had estimated geometric means above 0.1 mg m(-3) under the exposure scenario developed. Water-fed tools and local exhaust ventilation were associated with reductions of 71 and 69%, respectively, in exposure levels compared with no controls. The predictive model developed can be used to estimate RCS concentrations for many construction activities in a wide range of circumstances.
Yu, Wenxi; Liu, Yang; Ma, Zongwei; Bi, Jun
2017-08-01
Using satellite-based aerosol optical depth (AOD) measurements and statistical models to estimate ground-level PM2.5 is a promising way to fill in areas that are not covered by ground PM2.5 monitors. The statistical models used in previous studies are primarily Linear Mixed Effects (LME) and Geographically Weighted Regression (GWR) models. In this study, we developed a new regression model between PM2.5 and AOD using Gaussian processes in a Bayesian hierarchical setting. Gaussian processes model the stochastic nature of the spatial random effects, where the mean surface and the covariance function are specified. The spatial stochastic process is incorporated under the Bayesian hierarchical framework to explain the variation of PM2.5 concentrations together with other factors, such as AOD and spatial and non-spatial random effects. We evaluate the results of our model and compare them with those of other, conventional statistical models (GWR and LME) by within-sample model fitting and out-of-sample validation (cross validation, CV). The results show that our model achieves a CV result (R² = 0.81) reflecting higher accuracy than that of GWR and LME (0.74 and 0.48, respectively). Our results indicate that Gaussian process models have the potential to improve the accuracy of satellite-based PM2.5 estimates.
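A heavily simplified, non-Bayesian stand-in for the spatial random-effects idea can be written with scikit-learn's Gaussian process regressor; the paper's full model is hierarchical and fitted by Bayesian inference, which this sketch does not attempt. All inputs below are synthetic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(4)
# Synthetic data: site coordinates (lon, lat), AOD, and ground-level PM2.5.
coords = rng.uniform(0.0, 1.0, size=(80, 2))
aod = rng.gamma(2.0, 0.3, 80)
pm25 = 20.0 * aod + 10.0 * np.sin(4.0 * coords[:, 0]) + rng.normal(0.0, 2.0, 80)

# AOD enters as a predictor; the anisotropic RBF kernel over (lon, lat, AOD)
# plays the role of the smooth spatial random effect, plus a noise term.
X = np.column_stack([coords, aod])
kernel = RBF(length_scale=[0.3, 0.3, 1.0]) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, pm25)
print("in-sample R^2:", round(gp.score(X, pm25), 3))
```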
Terai, Asuka; Nakagawa, Masanori
2007-08-01
The purpose of this paper is to construct a model that represents the human process of understanding metaphors, focusing specifically on similes of the form "an A like B". Generally speaking, human beings are able to generate and understand many sorts of metaphors. This study constructs the model based on a probabilistic knowledge structure for concepts, computed from a statistical analysis of a large-scale corpus. Consequently, the model is able to cover the many kinds of metaphors that human beings can generate. Moreover, the model implements the dynamic process of metaphor understanding by using a neural network with dynamic interactions. Finally, the validity of the model is confirmed by comparing model simulations with the results of a psychological experiment.
NASA Astrophysics Data System (ADS)
Liu, L.; Du, L.; Liao, Y.
2017-12-01
Based on the ensemble hindcast dataset of the NCC/CMA CSM1.1m model, Bayesian merging models and a two-step statistical model are developed and employed to predict monthly grid/station precipitation in the Huaihe River basin, China, during summer at lead times of 1 to 3 months. The hindcast datasets span the period 1991 to 2014. The skill of the two models is evaluated using the area under the ROC curve (AUC) in a leave-one-out cross-validation framework and is compared to the skill of CSM1.1m. CSM1.1m has the highest skill for summer precipitation when initialized in April and the lowest when initialized in May, and has the highest skill for precipitation in June but the lowest for precipitation in July. Compared with the raw outputs of the climate model, some schemes of the two approaches have higher skill for predictions from March and May, but almost all schemes have lower skill for predictions from April. Compared to the two-step approach, one sampling scheme of the Bayesian merging approach has higher skill for the prediction from March, but lower skill from May. The results suggest that there is potential to apply the two statistical models for monthly summer precipitation forecasts from March and May over the Huaihe River basin, but that the CSM1.1m forecast is preferable from April. Finally, the summer runoff during 1991 to 2014 is simulated with a hydrological model using the climate hindcast of CSM1.1m and the two statistical models.
A simple rain attenuation model for earth-space radio links operating at 10-35 GHz
NASA Technical Reports Server (NTRS)
Stutzman, W. L.; Yon, K. M.
1986-01-01
The simple attenuation model has been improved from an earlier version and now includes the effect of wave polarization. The model is for the prediction of rain attenuation statistics on earth-space communication links operating in the 10-35 GHz band. Simple calculations produce attenuation values as a function of average rain rate. These, together with rain rate statistics (either measured or predicted), can be used to predict annual rain attenuation statistics. In this paper, model predictions are compared to measured data from a data base of 62 experiments performed in the U.S., Europe, and Japan. Comparisons are also made to predictions from other models.
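The flavor of such a calculation is the standard specific-attenuation power law; the sketch below uses illustrative coefficients only (very roughly representative near 12 GHz; real values depend on frequency and polarization, as in the paper's tables or ITU-R P.838).

```python
import numpy as np

def specific_attenuation(rain_rate, a, b):
    """Specific attenuation (dB/km) from the power law gamma = a * R**b."""
    return a * rain_rate ** b

# Illustrative coefficients only; not taken from the paper.
a, b = 0.024, 1.18            # assumed values, order-of-magnitude near 12 GHz
effective_path_km = 4.0       # assumed slant path after a reduction factor

for rain_rate in [5.0, 25.0, 50.0]:                  # average rain rate, mm/h
    att = specific_attenuation(rain_rate, a, b) * effective_path_km
    print(f"R = {rain_rate:5.1f} mm/h -> A = {att:5.2f} dB")
```

Feeding measured or predicted rain-rate exceedance statistics through this mapping yields the annual attenuation statistics the model is designed to predict.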
Examination of Solar Cycle Statistical Model and New Prediction of Solar Cycle 23
NASA Technical Reports Server (NTRS)
Kim, Myung-Hee Y.; Wilson, John W.
2000-01-01
Sunspot numbers in the current solar cycle 23 were estimated by using a statistical model with the accumulating cycle sunspot data, based on the odd-even behavior of historical sunspot cycles 1 through 22. Since cycle 23 has progressed and the solar minimum occurrence has been accurately defined, the statistical model is validated by comparing the previous prediction with the newly measured sunspot numbers, and an improved short-range sunspot projection is made accordingly. The current cycle is expected to have a moderate level of activity. Errors of this model are shown to be self-correcting as cycle observations become available.
NASA Astrophysics Data System (ADS)
Golmohammadi, A.; Jafarpour, B.; M Khaninezhad, M. R.
2017-12-01
Calibration of heterogeneous subsurface flow models leads to ill-posed nonlinear inverse problems, where too many unknown parameters are estimated from limited response measurements. When the underlying parameters form complex (non-Gaussian) structured spatial connectivity patterns, classical variogram-based geostatistical techniques cannot describe the underlying connectivity patterns. Modern pattern-based geostatistical methods that incorporate higher-order spatial statistics are more suitable for describing such complex spatial patterns. Moreover, when the underlying unknown parameters are discrete (geologic facies distribution), conventional model calibration techniques that are designed for continuous parameters cannot be applied directly. In this paper, we introduce a novel pattern-based model calibration method to reconstruct discrete and spatially complex facies distributions from dynamic flow response data. To reproduce complex connectivity patterns during model calibration, we impose a feasibility constraint to ensure that the solution follows the expected higher-order spatial statistics. For model calibration, we adopt a regularized least-squares formulation, involving data mismatch, pattern connectivity, and feasibility constraint terms. Using an alternating directions optimization algorithm, the regularized objective function is divided into a continuous model calibration problem, followed by mapping the solution onto the feasible set. The feasibility constraint to honor the expected spatial statistics is implemented using a supervised machine learning algorithm. The two steps of the model calibration formulation are repeated until the convergence criterion is met. Several numerical examples are used to evaluate the performance of the developed method.
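A toy version of the alternating-directions loop, with a simple threshold standing in for the learned projection onto the feasible set of facies patterns and a random linear map as the forward model, is sketched below purely for illustration.

```python
import numpy as np

def project_feasible(m):
    """Stand-in for the learned mapping onto the feasible set of facies
    patterns; here a plain threshold to the two facies codes {0, 1}."""
    return (m > 0.5).astype(float)

def calibrate(G, d_obs, n_iter=50, lam=1.0):
    """Alternating directions: a continuous least-squares update of the model,
    followed by projection of the solution onto the (discrete) feasible set."""
    m = np.full(G.shape[1], 0.5)
    z = project_feasible(m)
    A = G.T @ G + lam * np.eye(G.shape[1])
    for _ in range(n_iter):
        # Continuous step: minimize ||G m - d||^2 + lam * ||m - z||^2.
        m = np.linalg.solve(A, G.T @ d_obs + lam * z)
        z = project_feasible(m)
    return z

rng = np.random.default_rng(10)
G = rng.normal(size=(30, 50))                    # toy linear forward model
m_true = (rng.uniform(size=50) > 0.6).astype(float)
d_obs = G @ m_true + rng.normal(0.0, 0.1, 30)
print("facies recovered correctly:", (calibrate(G, d_obs) == m_true).mean())
```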
Quantifying Variation in Gait Features from Wearable Inertial Sensors Using Mixed Effects Models
Cresswell, Kellen Garrison; Shin, Yongyun; Chen, Shanshan
2017-01-01
The emerging technology of wearable inertial sensors has shown its advantages in collecting continuous longitudinal gait data outside laboratories. This freedom also presents challenges in collecting high-fidelity gait data. In the free-living environment, without constant supervision from researchers, sensor-based gait features are susceptible to variation from confounding factors such as gait speed and mounting uncertainty, which are challenging to control or estimate. This paper is one of the first attempts in the field to tackle such challenges using statistical modeling. By accepting the uncertainties and variation associated with wearable sensor-based gait data, we shift our efforts from detecting and correcting those variations to modeling them statistically. From gait data collected on one healthy, non-elderly subject during 48 full-factorial trials, we identified four major sources of variation, and quantified their impact on one gait outcome—range per cycle—using a random effects model and a fixed effects model. The methodology developed in this paper lays the groundwork for a statistical framework to account for sources of variation in wearable gait data, thus facilitating informative statistical inference for free-living gait analysis. PMID:28245602
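A minimal version of such a random-effects analysis can be expressed with statsmodels; the factors, sample sizes, and effect values below are invented, not the paper's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_trials, n_cycles = 48, 20
speeds = rng.choice([0.8, 1.0, 1.2], n_trials)        # gait speed per trial
trial = np.repeat(np.arange(n_trials), n_cycles)
speed = np.repeat(speeds, n_cycles)
trial_effect = rng.normal(0.0, 0.5, n_trials)[trial]  # e.g., mounting offset
y = 10.0 + 3.0 * speed + trial_effect + rng.normal(0.0, 0.3, trial.size)

df = pd.DataFrame({"trial": trial, "speed": speed, "range_per_cycle": y})
# Fixed effect of gait speed, random intercept per trial (mounting variation).
model = smf.mixedlm("range_per_cycle ~ speed", df, groups=df["trial"]).fit()
print(model.summary())
```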
Autoregressive statistical pattern recognition algorithms for damage detection in civil structures
NASA Astrophysics Data System (ADS)
Yao, Ruigen; Pakzad, Shamim N.
2012-08-01
Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of the statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi-degree-of-freedom system are generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of the proposed algorithms.
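To illustrate the AR-based feature extraction these methods share, the sketch below fits AR models to a baseline signal and a test signal and compares their coefficient vectors; the signals and the simple distance index are illustrative only, whereas the paper builds control charts and resampled thresholds on such features.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(6)
# Hypothetical acceleration records: a baseline signal and a test signal
# whose dynamics have been altered (a crude proxy for damage).
baseline = rng.normal(size=2000)
test = np.convolve(rng.normal(size=2000), [1.0, 0.4], mode="same")

def ar_features(signal, order=8):
    """Fit an AR(order) model; its coefficients form the damage-sensitive feature."""
    fit = AutoReg(signal, lags=order).fit()
    return fit.params[1:]                     # drop the intercept term

f_base, f_test = ar_features(baseline), ar_features(test)
# A simple damage index: distance between the AR coefficient vectors; in
# practice a statistical control chart or hypothesis test sets the threshold.
print("damage index:", np.linalg.norm(f_test - f_base))
```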
A Statistical Framework for Protein Quantitation in Bottom-Up MS-Based Proteomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karpievitch, Yuliya; Stanley, Jeffrey R.; Taverner, Thomas
2009-08-15
Motivation: Quantitative mass spectrometry-based proteomics requires protein-level estimates and associated confidence measures. Challenges include the presence of low-quality or incorrectly identified peptides and informative missingness. Furthermore, models are required for rolling peptide-level information up to the protein level. Results: We present a statistical model that carefully accounts for informative missingness in peak intensities and allows unbiased, model-based, protein-level estimation and inference. The model is applicable to both label-based and label-free quantitation experiments. We also provide automated, model-based algorithms for filtering of proteins and peptides as well as imputation of missing values. Two LC/MS datasets are used to illustrate the methods. In simulation studies, our methods are shown to achieve substantially more discoveries than standard alternatives. Availability: The software has been made available in the open-source proteomics platform DAnTE (http://omics.pnl.gov/software/). Contact: adabney@stat.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Environmental Interfaces in Teaching Economic Statistics
ERIC Educational Resources Information Center
Campos, Celso; Wodewotzki, Maria Lucia; Jacobini, Otavio; Ferrira, Denise
2016-01-01
The objective of this article is, based on the Critical Statistics Education assumptions, to value some environmental interfaces in teaching Statistics by modeling projects. Due to this, we present a practical case, one in which we address an environmental issue, placed in the context of the teaching of index numbers, within the Statistics…
Statistical Compression for Climate Model Output
NASA Astrophysics Data System (ADS)
Hammerling, D.; Guinness, J.; Soh, Y. J.
2017-12-01
Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus it is important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset (one year of daily mean temperature data), particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured while allowing for fast decompression and conditional emulation on modest computers.
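A toy version of the compress/decompress cycle, with block means as the stored summary statistics and an i.i.d. residual model (far simpler than the nonstationary spatial model described here), is sketched below; the data field is synthetic.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(15.0, 3.0, size=(365, 100))   # toy daily temperature field

# "Compression": keep coarse block means plus one residual standard deviation.
block = 10
means = data.reshape(365, -1, block).mean(axis=2)
resid_sd = (data - np.repeat(means, block, axis=1)).std()

# "Decompression": the conditional expectation is the smooth reconstruction;
# conditional simulation adds noise at the stored residual scale so that the
# decompressed field is neither too smooth nor too rough.
cond_expectation = np.repeat(means, block, axis=1)
cond_simulation = cond_expectation + rng.normal(0.0, resid_sd, size=data.shape)

print("compression ratio ~", round(data.size / (means.size + 1), 1))
print("sd original:", data.std().round(2),
      "| sd simulated:", cond_simulation.std().round(2))
```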
Computational Analysis for Rocket-Based Combined-Cycle Systems During Rocket-Only Operation
NASA Technical Reports Server (NTRS)
Steffen, C. J., Jr.; Smith, T. D.; Yungster, S.; Keller, D. J.
2000-01-01
A series of Reynolds-averaged Navier-Stokes calculations was employed to study the performance of rocket-based combined-cycle systems operating in an all-rocket mode. This parametric series of calculations was executed within a statistical framework commonly known as design of experiments. The parametric design space included four geometric and two flowfield variables set at three levels each, for a total of 729 possible combinations. A D-optimal design strategy was selected; it required that only 36 separate computational fluid dynamics (CFD) solutions be performed to develop a full response surface model, which quantified the linear, bilinear, and curvilinear effects of the six experimental variables. The axisymmetric, Reynolds-averaged Navier-Stokes simulations were executed with the NPARC v3.0 code. The response used in the statistical analysis was created from Isp efficiency data integrated from the 36 CFD simulations. The influence of turbulence modeling was analyzed by using both one- and two-equation models. Careful attention was also given to quantifying the influence of mesh dependence, iterative convergence, and artificial viscosity upon the resulting statistical model. Thirteen statistically significant effects were observed to influence rocket-based combined-cycle nozzle performance. It was apparent that the free-expansion process, directly downstream of the rocket nozzle, can influence the Isp efficiency. Numerical schlieren images and particle traces were used to further understand the physical phenomena behind several of the statistically significant results.
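The response-surface step amounts to regressing the Isp-efficiency response on intercept, linear, bilinear, and quadratic terms of the six coded factors; a sketch with synthetic data follows, using a random design as a stand-in for the actual D-optimal design.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(9)
# Hypothetical 36-run design over six coded factors at three levels each.
X = rng.choice([-1.0, 0.0, 1.0], size=(36, 6))
isp_eff = (0.95 + 0.01 * X[:, 0] - 0.02 * X[:, 1] * X[:, 2]
           + rng.normal(0.0, 0.002, 36))

def quadratic_design_matrix(X):
    """Intercept, linear, and all bilinear/quadratic terms of the factors."""
    cols = [np.ones(len(X))] + [X[:, i] for i in range(X.shape[1])]
    cols += [X[:, i] * X[:, j]
             for i, j in combinations_with_replacement(range(X.shape[1]), 2)]
    return np.column_stack(cols)

A = quadratic_design_matrix(X)              # 36 runs x 28 model terms
coef, *_ = np.linalg.lstsq(A, isp_eff, rcond=None)
print("intercept and linear effects:", coef[:7].round(4))
```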
Adapting an Agent-Based Model of Socio-Technical Systems to Analyze Security Failures
2016-10-17
The total number of non-blackouts differed from the total number in the baseline data to a statistically significant extent, with a p-value < 0.0003. ... I. Nikolic, and Z. Lukszo, Eds., Agent-based modelling of socio-technical systems. Springer Science & Business Media, 2013, vol. 9. [12] A. P. Shaw
A general science-based framework for dynamical spatio-temporal models
Wikle, C.K.; Hooten, M.B.
2010-01-01
Spatio-temporal statistical models are increasingly being used across a wide variety of scientific disciplines to describe and predict spatially-explicit processes that evolve over time. Correspondingly, in recent years there has been a significant amount of research on new statistical methodology for such models. Although descriptive models that approach the problem from the second-order (covariance) perspective are important, and innovative work is being done in this regard, many real-world processes are dynamic, and it can be more efficient in some cases to characterize the associated spatio-temporal dependence by the use of dynamical models. The chief challenge with the specification of such dynamical models has been related to the curse of dimensionality. Even in fairly simple linear, first-order Markovian, Gaussian error settings, statistical models are often overparameterized. Hierarchical models have proven invaluable in their ability to deal to some extent with this issue by allowing dependency among groups of parameters. In addition, this framework has allowed for the specification of science-based parameterizations (and associated prior distributions) in which classes of deterministic dynamical models (e.g., partial differential equations (PDEs), integro-difference equations (IDEs), matrix models, and agent-based models) are used to guide specific parameterizations. Most of the focus for the application of such models in statistics has been in the linear case. The problems mentioned above with linear dynamic models are compounded in the case of nonlinear models. In this sense, the need for coherent and sensible model parameterizations is not only helpful, it is essential. Here, we present an overview of a framework for incorporating scientific information to motivate dynamical spatio-temporal models. First, we illustrate the methodology with the linear case. We then develop a general nonlinear spatio-temporal framework that we call general quadratic nonlinearity and demonstrate that it accommodates many different classes of scientific-based parameterizations as special cases. The model is presented in a hierarchical Bayesian framework and is illustrated with examples from ecology and oceanography. © 2010 Sociedad de Estadística e Investigación Operativa.
Royle, J. Andrew; Dorazio, Robert M.
2008-01-01
A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including:
* occurrence or occupancy models for estimating species distribution
* abundance models based on many sampling protocols, including distance sampling
* capture-recapture models with individual effects
* spatial capture-recapture models based on camera trapping and related methods
* population and metapopulation dynamic models
* models of biodiversity, community structure and dynamics
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
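For readers who want the flavor of ROS without the R package, here is a deliberately simplified Python sketch for the single-detection-limit case (the actual tools handle multiple limits and live in an R add-on library); the function name and the example data are hypothetical.

```python
import numpy as np
from scipy import stats

def simple_ros(detected, n_censored):
    """Simplified ROS with one detection limit: regress log(detects) on normal
    scores, then impute the censored observations from the fitted line."""
    n = len(detected) + n_censored
    ranks = np.arange(1, n + 1)
    pp = (ranks - 0.375) / (n + 0.25)          # Blom plotting positions
    q = stats.norm.ppf(pp)                     # normal scores
    det_sorted = np.sort(detected)
    # Censored values occupy the lowest ranks; detects fill the rest.
    slope, intercept, *_ = stats.linregress(q[n_censored:], np.log(det_sorted))
    imputed = np.exp(intercept + slope * q[:n_censored])
    full = np.concatenate([imputed, det_sorted])
    return full.mean(), full.std(ddof=1)

# Hypothetical dataset: 12 detects plus 8 values reported as "<0.5".
detects = np.array([0.6, 0.7, 0.8, 0.9, 1.1, 1.3, 1.6, 2.0, 2.4, 3.1, 4.0, 5.5])
mean, sd = simple_ros(detects, n_censored=8)
print(f"ROS mean = {mean:.3f}, ROS sd = {sd:.3f}")
```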
In silico model-based inference: a contemporary approach for hypothesis testing in network biology
Klinke, David J.
2014-01-01
Inductive inference plays a central role in the study of biological systems where one aims to increase their understanding of the system by reasoning backwards from uncertain observations to identify causal relationships among components of the system. These causal relationships are postulated from prior knowledge as a hypothesis or simply a model. Experiments are designed to test the model. Inferential statistics are used to establish a level of confidence in how well our postulated model explains the acquired data. This iterative process, commonly referred to as the scientific method, either improves our confidence in a model or suggests that we revisit our prior knowledge to develop a new model. Advances in technology impact how we use prior knowledge and data to formulate models of biological networks and how we observe cellular behavior. However, the approach for model-based inference has remained largely unchanged since Fisher, Neyman and Pearson developed the ideas in the early 1900’s that gave rise to what is now known as classical statistical hypothesis (model) testing. Here, I will summarize conventional methods for model-based inference and suggest a contemporary approach to aid in our quest to discover how cells dynamically interpret and transmit information for therapeutic aims that integrates ideas drawn from high performance computing, Bayesian statistics, and chemical kinetics. PMID:25139179
Quantifying the indirect impacts of climate on agriculture: an inter-method comparison
NASA Astrophysics Data System (ADS)
Calvin, Kate; Fisher-Vanden, Karen
2017-11-01
Climate change and increases in CO2 concentration affect the productivity of land, with implications for land use, land cover, and agricultural production. Much of the literature on the effect of climate on agriculture has focused on linking projections of changes in climate to process-based or statistical crop models. However, the changes in productivity have broader economic implications that cannot be quantified in crop models alone. How important are these socio-economic feedbacks to a comprehensive assessment of the impacts of climate change on agriculture? In this paper, we attempt to measure the importance of these interaction effects through an inter-method comparison between process models, statistical models, and integrated assessment models (IAMs). We find the impacts on crop yields vary widely between these three modeling approaches. Yield impacts generated by the IAMs are 20%-40% higher than the yield impacts generated by process-based or statistical crop models, with indirect climate effects adjusting yields by between -12% and +15% (e.g. input substitution and crop switching). The remaining effects are due to technological change.
NASA Astrophysics Data System (ADS)
Skitka, J.; Marston, B.; Fox-Kemper, B.
2016-02-01
Sub-grid turbulence models for planetary boundary layers are typically constructed additively, starting with local flow properties and including non-local (KPP) or higher order (Mellor-Yamada) parameters until a desired level of predictive capacity is achieved or a manageable threshold of complexity is surpassed. Such approaches are necessarily limited in general circumstances, like global circulation models, by their being optimized for particular flow phenomena. By building a model reductively, starting with the infinite hierarchy of turbulence statistics, truncating at a given order, and stripping degrees of freedom from the flow, we offer the prospect of a turbulence model and investigative tool that is equally applicable to all flow types and able to take full advantage of the wealth of nonlocal information in any flow. Direct statistical simulation (DSS) that is based upon expansion in equal-time cumulants can be used to compute flow statistics of arbitrary order. We investigate the feasibility of a second-order closure (CE2) by performing simulations of the ocean boundary layer in a quasi-linear approximation for which CE2 is exact. As oceanographic examples, wind-driven Langmuir turbulence and thermal convection are studied by comparison of the quasi-linear and fully nonlinear statistics. We also characterize the computational advantages and physical uncertainties of CE2 defined on a reduced basis determined via proper orthogonal decomposition (POD) of the flow fields.
Identifiability of PBPK Models with Applications to ...
Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.
Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi
2014-07-01
The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfactory "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.
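The point-distribution-model core of such a scheme is compact enough to sketch. The snippet below is a rough Python illustration rather than the authors' pipeline: it performs PCA on pre-aligned corresponding landmarks and computes the "compactness" measure; the minimum-description-length correspondence optimization is omitted, and the landmark data are synthetic.

```python
import numpy as np

def build_pdm(shapes):
    """shapes: (n_samples, n_points*3) corresponding landmarks, already aligned.
    Returns mean shape, modes, eigenvalues (classical point distribution model)."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    # PCA via SVD of the centered data matrix
    U, svals, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = svals ** 2 / (len(shapes) - 1)
    return mean, Vt, eigvals

def compactness(eigvals, m):
    """Fraction of total shape variance captured by the first m modes."""
    return eigvals[:m].sum() / eigvals.sum()

# Hypothetical set of 50 aligned organ surfaces with 100 landmarks each:
rng = np.random.default_rng(2)
shapes = rng.standard_normal((50, 300)) * np.linspace(3, 0.1, 300)
mean, modes, eigvals = build_pdm(shapes)
print("compactness of 5 modes:", round(compactness(eigvals, 5), 3))
# New shapes are generated as mean + modes[:m].T @ b for coefficient vectors b.
```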
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
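A short sketch may help fix ideas. Assuming synthetic order-averaged basin data, the snippet below produces traditional Horton-plot regressions (the ratio being the exponential of the slope), an ordinary Hack's-law fit, and a two-regressor fit in the spirit of the modified Hack's law; it does not reproduce the paper's univariate framework or its confidence intervals.

```python
import numpy as np

# Hypothetical basin data: Strahler order w, mean area A (km^2), mean length L (km)
w = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
A = np.array([0.5, 2.3, 10.0, 46.0, 210.0])
L = np.array([0.8, 1.9, 4.4, 10.5, 24.0])

# Horton plots: regress ln(quantity) on order; the ratio is exp(slope).
slope_A = np.polyfit(w, np.log(A), 1)[0]
slope_L = np.polyfit(w, np.log(L), 1)[0]
print("area ratio   R_A ~", np.exp(slope_A))
print("length ratio R_L ~", np.exp(slope_L))

# Hack's law: L = c * A^h, fit in log space.
h, log_c = np.polyfit(np.log(A), np.log(L), 1)
print(f"Hack exponent h ~ {h:.2f}, coefficient c ~ {np.exp(log_c):.2f}")

# The modified Hack's law adds Strahler order as a second regressor.
coeffs = np.linalg.lstsq(np.column_stack([np.log(A), w, np.ones_like(w)]),
                         np.log(L), rcond=None)[0]
print("modified fit (coeff on ln A, on order, intercept):", coeffs)
```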
DOE Office of Scientific and Technical Information (OSTI.GOV)
Erickson, Paul A.; Liao, Chang-hsien
2007-11-15
A passive flow disturbance has been proven to enhance the conversion of fuel in a methanol-steam reformer. This study presents a statistical validation of the experiment based on a standard 2^k factorial experiment design and the resulting empirical model of the enhanced hydrogen producing process. A factorial experiment design was used to statistically analyze the effects and interactions of various input factors in the experiment. Three input factors, including the number of flow disturbers, catalyst size, and reactant flow rate, were investigated for their effects on the fuel conversion in the steam-reformation process. Based on the experimental results, an empirical model was developed and further evaluated with an uncertainty analysis and interior point data. (author)
Structure-Specific Statistical Mapping of White Matter Tracts
Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.
2008-01-01
We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524
NASA Astrophysics Data System (ADS)
Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron K.; Scoccimarro, Roman; Piscionere, Jennifer A.; Wibking, Benjamin D.
2018-07-01
Interpreting the small-scale clustering of galaxies with halo models can elucidate the connection between galaxies and dark matter haloes. Unfortunately, the modelling is typically not sufficiently accurate for ruling out models statistically. It is thus difficult to use the information encoded in small scales to test cosmological models or probe subtle features of the galaxy-halo connection. In this paper, we attempt to push halo modelling into the 'accurate' regime with a fully numerical mock-based methodology and careful treatment of statistical and systematic errors. With our forward-modelling approach, we can incorporate clustering statistics beyond the traditional two-point statistics. We use this modelling methodology to test the standard Λ cold dark matter (ΛCDM) + halo model against the clustering of Sloan Digital Sky Survey (SDSS) seventh data release (DR7) galaxies. Specifically, we use the projected correlation function, group multiplicity function, and galaxy number density as constraints. We find that while the model fits each statistic separately, it struggles to fit them simultaneously. Adding group statistics leads to a more stringent test of the model and significantly tighter constraints on model parameters. We explore the impact of varying the adopted halo definition and cosmological model and find that changing the cosmology makes a significant difference. The most successful model we tried (Planck cosmology with Mvir haloes) matches the clustering of low-luminosity galaxies, but exhibits a 2.3σ tension with the clustering of luminous galaxies, thus providing evidence that the 'standard' halo model needs to be extended. This work opens the door to adding interesting freedom to the halo model and including additional clustering statistics as constraints.
Sensitivity to the Sampling Process Emerges From the Principle of Efficiency.
Jara-Ettinger, Julian; Sun, Felix; Schulz, Laura; Tenenbaum, Joshua B
2018-05-01
Humans can seamlessly infer other people's preferences, based on what they do. Broadly, two types of accounts have been proposed to explain different aspects of this ability. The first account focuses on spatial information: Agents' efficient navigation in space reveals what they like. The second account focuses on statistical information: Uncommon choices reveal stronger preferences. Together, these two lines of research suggest that we have two distinct capacities for inferring preferences. Here we propose that this is not the case, and that spatial-based and statistical-based preference inferences can be explained by the assumption that agents are efficient alone. We show that people's sensitivity to spatial and statistical information when they infer preferences is best predicted by a computational model of the principle of efficiency, and that this model outperforms dual-system models, even when the latter are fit to participant judgments. Our results suggest that, as adults, a unified understanding of agency under the principle of efficiency underlies our ability to infer preferences. Copyright © 2018 Cognitive Science Society, Inc.
Mourning dove hunting regulation strategy based on annual harvest statistics and banding data
Otis, D.L.
2006-01-01
Although managers should strive to base game bird harvest management strategies on mechanistic population models, monitoring programs required to build and continuously update these models may not be in place. Alternatively, if estimates of total harvest and harvest rates are available, then population estimates derived from these harvest data can serve as the basis for making hunting regulation decisions based on population growth rates derived from these estimates. I present a statistically rigorous approach for regulation decision-making using a hypothesis-testing framework and an assumed framework of 3 hunting regulation alternatives. I illustrate and evaluate the technique with historical data on the mid-continent mallard (Anas platyrhynchos) population. I evaluate the statistical properties of the hypothesis-testing framework using the best available data on mourning doves (Zenaida macroura). I use these results to discuss practical implementation of the technique as an interim harvest strategy for mourning doves until reliable mechanistic population models and associated monitoring programs are developed.
A Statistical Model for Misreported Binary Outcomes in Clustered RCTs of Education Interventions
ERIC Educational Resources Information Center
Schochet, Peter Z.
2013-01-01
In education randomized control trials (RCTs), the misreporting of student outcome data could lead to biased estimates of average treatment effects (ATEs) and their standard errors. This article discusses a statistical model that adjusts for misreported binary outcomes for two-level, school-based RCTs, where it is assumed that misreporting could…
Statistical Mechanics of Prion Diseases
NASA Astrophysics Data System (ADS)
Slepoy, A.; Singh, R. R.; Pázmándi, F.; Kulkarni, R. V.; Cox, D. L.
2001-07-01
We present a two-dimensional, lattice based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflects fluctuation-dominated growth seeded by a few nanometer scale aggregates, while much narrower incubation time distributions for inoculated lab animals arise from statistical self-averaging. We model "species barriers" to prion infection and assess a related treatment protocol.
Binomial test statistics using Psi functions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowman, Kimiko O.
2007-01-01
For the negative binomial model (probability generating function (p + 1 - pt)^(-k)), a logarithmic derivative is the Psi function difference ψ(k + x) - ψ(k); this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a data base, so a comparison between theory and application is available. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
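The role of the Psi (digamma) function difference can be made concrete. The sketch below fits the negative binomial model with the stated generating function by maximum likelihood on hypothetical count data and verifies that the k-score, which contains the ψ(k + x) - ψ(k) term, vanishes at the optimum; it is not the test statistic of the report itself.

```python
import numpy as np
from scipy.special import gammaln, psi   # psi is the digamma function
from scipy.optimize import minimize

# Hypothetical count data (e.g., parasites per host)
x = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9])

def negloglik(params):
    """Negative log-likelihood of the NB model with pgf (p + 1 - p t)^(-k):
    P(X=x) = C(k+x-1, x) (p/(1+p))^x (1+p)^(-k)."""
    k, p = np.exp(params)                # log scale keeps k, p > 0
    ll = (gammaln(k + x) - gammaln(k) - gammaln(x + 1)
          + x * np.log(p) - (k + x) * np.log(1.0 + p))
    return -ll.sum()

fit = minimize(negloglik, x0=np.log([1.0, 1.0]))
k_hat, p_hat = np.exp(fit.x)

# The score in k is the Psi function difference from the abstract:
# d/dk log P(X=x) = psi(k + x) - psi(k) - log(1 + p)
score_k = np.sum(psi(k_hat + x) - psi(k_hat) - np.log(1.0 + p_hat))
print(f"k = {k_hat:.3f}, p = {p_hat:.3f}, score at MLE ~ {score_k:.2e}")
```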
Design of a Model-Based Online Management Information System for Interlibrary Loan Networks.
ERIC Educational Resources Information Center
Rouse, Sandra H.; Rouse, William B.
1979-01-01
Discusses the design of a model-based management information system in terms of mathematical/statistical, information processing, and human factors issues and presents a prototype system for interlibrary loan networks. (Author/CWM)
These model-based estimates use two surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS). The two surveys are combined using novel statistical methodology.
Louis R. Iverson; Frank R. Thompson; Stephen Matthews; Matthew Peters; Anantha Prasad; William D. Dijak; Jacob Fraser; Wen J. Wang; Brice Hanberry; Hong He; Maria Janowiak; Patricia Butler; Leslie Brandt; Chris Swanston
2016-01-01
Context. Species distribution models (SDM) establish statistical relationships between the current distribution of species and key attributes whereas process-based models simulate ecosystem and tree species dynamics based on representations of physical and biological processes. TreeAtlas, which uses DISTRIB SDM, and Linkages and LANDIS PRO, process...
An Update on Statistical Boosting in Biomedicine.
Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf
2017-01-01
Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
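As a concrete miniature of the approach, the following sketch implements componentwise L2 boosting with simple linear base-learners, the textbook special case in which the implicit variable selection is easiest to see; the step count and shrinkage value are illustrative choices, not recommendations from the article.

```python
import numpy as np

def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    """Statistical boosting with simple linear base-learners: at each step fit
    every single covariate to the current residuals and update only the best
    one, shrunk by nu. Covariates never selected keep a zero coefficient."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    for _ in range(steps):
        # least-squares coefficient of each covariate against the residuals
        coefs = X.T @ resid / (X ** 2).sum(axis=0)
        sse = ((resid[:, None] - X * coefs) ** 2).sum(axis=0)
        j = np.argmin(sse)                     # best-fitting base-learner
        beta[j] += nu * coefs[j]
        resid -= nu * coefs[j] * X[:, j]
    return intercept, beta

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 10))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.5 * rng.standard_normal(100)
intercept, beta = componentwise_l2_boost(X, y)
print(np.round(beta, 2))   # mostly zeros except covariates 0 and 3
```

In practice the number of boosting steps acts as the regularization parameter and is usually tuned by cross-validation or an information criterion.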
Kopp-Schneider, Annette; Prieto, Pilar; Kinsner-Ovaskainen, Agnieszka; Stanzel, Sven
2013-06-01
In the framework of toxicology, a testing strategy can be viewed as a series of steps which are taken to come to a final prediction about a characteristic of a compound under study. The testing strategy is performed as a single-step procedure, usually called a test battery, using simultaneously all information collected on different endpoints, or as tiered approach in which a decision tree is followed. Design of a testing strategy involves statistical considerations, such as the development of a statistical prediction model. During the EU FP6 ACuteTox project, several prediction models were proposed on the basis of statistical classification algorithms which we illustrate here. The final choice of testing strategies was not based on statistical considerations alone. However, without thorough statistical evaluations a testing strategy cannot be identified. We present here a number of observations made from the statistical viewpoint which relate to the development of testing strategies. The points we make were derived from problems we had to deal with during the evaluation of this large research project. A central issue during the development of a prediction model is the danger of overfitting. Procedures are presented to deal with this challenge. Copyright © 2012 Elsevier Ltd. All rights reserved.
New powerful statistics for alignment-free sequence comparison under a pattern transfer model.
Liu, Xuemei; Wan, Lin; Li, Jing; Reinert, Gesine; Waterman, Michael S; Sun, Fengzhu
2011-09-07
Alignment-free sequence comparison is widely used for comparing gene regulatory regions and for identifying horizontally transferred genes. Recent studies on the power of a widely used alignment-free comparison statistic D2 and its variants D2* and D2s showed that their power approximates a limit smaller than 1 as the sequence length tends to infinity under a pattern transfer model. We develop new alignment-free statistics based on D2, D2* and D2s by comparing local sequence pairs and then summing over all the local sequence pairs of certain length. We show that the new statistics are much more powerful than the corresponding statistics and the power tends to 1 as the sequence length tends to infinity under the pattern transfer model. Copyright © 2011 Elsevier Ltd. All rights reserved.
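A toy implementation clarifies what D2 and its local-pair extension count. The sketch below computes plain D2 as an inner product of k-word counts and a windowed variant that sums D2 over all pairs of local windows; the window and word lengths are arbitrary illustrative values, and the standardizations behind D2* and D2s are omitted.

```python
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count all overlapping k-words in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def d2(seq_a, seq_b, k):
    """D2: inner product of the k-word count vectors of two sequences."""
    ca, cb = kmer_counts(seq_a, k), kmer_counts(seq_b, k)
    return sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())

def local_d2_sum(seq_a, seq_b, k, window, step):
    """Local-pair variant in the spirit of the paper: compute D2 on all pairs
    of local windows and sum, so a transferred pattern shared by any pair of
    local regions contributes to the statistic."""
    wins_a = [seq_a[i:i + window] for i in range(0, len(seq_a) - window + 1, step)]
    wins_b = [seq_b[i:i + window] for i in range(0, len(seq_b) - window + 1, step)]
    return sum(d2(wa, wb, k) for wa, wb in product(wins_a, wins_b))

a = "ACGTACGTTTACGGACGTAAACGTACGGTTAC"
b = "TTGACGTACGTTTACGGAAGTCCGATACGTGG"
print(d2(a, b, k=4), local_d2_sum(a, b, k=4, window=16, step=8))
```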
“Plateau”-related summary statistics are uninformative for comparing working memory models
van den Berg, Ronald; Ma, Wei Ji
2014-01-01
Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon. Zhang and Luck (2008) and Anderson, Vogel, and Awh (2011) noticed that as more items need to be remembered, “memory noise” seems to first increase and then reach a “stable plateau.” They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided, at most, 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. At realistic numbers of trials, plateau-related summary statistics are completely unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (2011), we found that the evidence in the summary statistics was, at most, 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. These findings call into question claims about working memory that are based on summary statistics. PMID:24719235
Effects of Instructional Design with Mental Model Analysis on Learning.
ERIC Educational Resources Information Center
Hong, Eunsook
This paper presents a model for systematic instructional design that includes mental model analysis together with the procedures used in developing computer-based instructional materials in the area of statistical hypothesis testing. The instructional design model is based on the premise that the objective for learning is to achieve expert-like…
A statistical approach to develop a detailed soot growth model using PAH characteristics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Raj, Abhijeet; Celnik, Matthew; Shirley, Raphael
A detailed PAH growth model is developed, which is solved using a kinetic Monte Carlo algorithm. The model describes the structure and growth of planar PAH molecules, and is referred to as the kinetic Monte Carlo-aromatic site (KMC-ARS) model. A detailed PAH growth mechanism based on reactions at radical sites available in the literature, and additional reactions obtained from quantum chemistry calculations, are used to model the PAH growth processes. New rates for the reactions involved in the cyclodehydrogenation process for the formation of 6-member rings on PAHs are calculated in this work based on density functional theory simulations. The KMC-ARS model is validated by comparing experimentally observed ensembles on PAHs with the computed ensembles for a C2H2 and a C6H6 flame at different heights above the burner. The motivation for this model is the development of a detailed soot particle population balance model which describes the evolution of an ensemble of soot particles based on their PAH structure. However, at present incorporating such a detailed model into a population balance is computationally unfeasible. Therefore, a simpler model referred to as the site-counting model has been developed, which replaces the structural information of the PAH molecules by their functional groups augmented with statistical closure expressions. This closure is obtained from the KMC-ARS model, which is used to develop correlations and statistics in different flame environments which describe such PAH structural information. These correlations and statistics are implemented in the site-counting model, and results from the site-counting model and the KMC-ARS model are in good agreement. Additionally the effect of steric hindrance in large PAH structures is investigated and correlations for sites unavailable for reaction are presented. (author)
Statistical virtual eye model based on wavefront aberration
Wang, Jie-Mei; Liu, Chun-Ling; Luo, Yi-Ning; Liu, Yi-Guang; Hu, Bing-Jie
2012-01-01
Wavefront aberration affects the quality of retinal image directly. This paper reviews the representation and reconstruction of wavefront aberration, as well as the construction of virtual eye model based on Zernike polynomial coefficients. In addition, the promising prospect of virtual eye model is emphasized. PMID:23173112
NASA Astrophysics Data System (ADS)
Sandfeld, Stefan; Budrikis, Zoe; Zapperi, Stefano; Fernandez Castellanos, David
2015-02-01
Crystalline plasticity is strongly interlinked with dislocation mechanics and nowadays is relatively well understood. Concepts and physical models of plastic deformation in amorphous materials on the other hand—where the concept of linear lattice defects is not applicable—still are lagging behind. We introduce an eigenstrain-based finite element lattice model for simulations of shear band formation and strain avalanches. Our model allows us to study the influence of surfaces and finite size effects on the statistics of avalanches. We find that even with relatively complex loading conditions and open boundary conditions, critical exponents describing avalanche statistics are unchanged, which validates the use of simpler scalar lattice-based models to study these phenomena.
Robust Combining of Disparate Classifiers Through Order Statistics
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2001-01-01
Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum, and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
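The combiners themselves are one line each, as the following sketch shows; the outlier-classifier example is fabricated to display the robustness argument, and the "trim" and "spread" definitions here are generic order-statistic versions rather than the exact forms analyzed in the article.

```python
import numpy as np

def order_statistic_combiner(outputs, combiner="median", i=None, trim=1):
    """Combine classifier outputs (n_classifiers, n_classes) by order statistics.
    'ith' picks the i-th smallest output per class; 'trim' averages after
    dropping the `trim` largest and smallest outputs; 'spread' uses max - min."""
    s = np.sort(outputs, axis=0)           # order statistics per class
    if combiner == "median":
        return np.median(outputs, axis=0)
    if combiner == "max":
        return s[-1]
    if combiner == "ith":
        return s[i]
    if combiner == "trim":
        return s[trim:len(s) - trim].mean(axis=0)
    if combiner == "spread":
        return s[-1] - s[0]
    raise ValueError(combiner)

# Three classifiers score four classes; the third classifier is badly off.
outputs = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.6, 0.2, 0.1, 0.1],
                    [0.0, 0.0, 0.9, 0.1]])   # outlier classifier
print("median:", order_statistic_combiner(outputs, "median"))
print("trim  :", order_statistic_combiner(outputs, "trim"))
```

Note how the median and trimmed combinations still rank class 0 highest despite the outlier, which is the robustness property the article formalizes.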
Physics-based statistical learning approach to mesoscopic model selection.
Taverniers, Søren; Haut, Terry S; Barros, Kipton; Alexander, Francis J; Lookman, Turab
2015-11-01
In materials science and many other research areas, models are frequently inferred without considering their generalization to unseen data. We apply statistical learning using cross-validation to obtain an optimally predictive coarse-grained description of a two-dimensional kinetic nearest-neighbor Ising model with Glauber dynamics (GD) based on the stochastic Ginzburg-Landau equation (sGLE). The latter is learned from GD "training" data using a log-likelihood analysis, and its predictive ability for various complexities of the model is tested on GD "test" data independent of the data used to train the model. Using two different error metrics, we perform a detailed analysis of the error between magnetization time trajectories simulated using the learned sGLE coarse-grained description and those obtained using the GD model. We show that both for equilibrium and out-of-equilibrium GD training trajectories, the standard phenomenological description using a quartic free energy does not always yield the most predictive coarse-grained model. Moreover, increasing the amount of training data can shift the optimal model complexity to higher values. Our results are promising in that they pave the way for the use of statistical learning as a general tool for materials modeling and discovery.
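Stripped of the Ising/sGLE specifics, the model-selection machinery is ordinary cross-validation. The sketch below selects a polynomial complexity by K-fold cross-validated prediction error on toy data with a cubic free-energy-like ground truth; it illustrates the selection principle only, not the authors' pipeline.

```python
import numpy as np

def cv_select_complexity(x, y, degrees=range(1, 8), folds=5, seed=0):
    """Pick the polynomial degree (model complexity) with the lowest
    cross-validated prediction error on held-out data."""
    rng = np.random.default_rng(seed)
    chunks = np.array_split(rng.permutation(len(x)), folds)
    cv_err = []
    for d in degrees:
        errs = []
        for f in range(folds):
            test = chunks[f]
            train = np.concatenate([chunks[g] for g in range(folds) if g != f])
            coef = np.polyfit(x[train], y[train], d)
            errs.append(np.mean((np.polyval(coef, x[test]) - y[test]) ** 2))
        cv_err.append(np.mean(errs))
    return list(degrees)[int(np.argmin(cv_err))], cv_err

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 120)
y = x - 0.8 * x ** 3 + 0.05 * rng.standard_normal(120)   # cubic "truth" plus noise
best, errs = cv_select_complexity(x, y)
print("selected degree:", best)
```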
Statistical Inferences from Formaldehyde DNA-Protein Cross-Link Data
Physiologically-based pharmacokinetic (PBPK) modeling has reached considerable sophistication in its application in the pharmacological and environmental health areas. Yet, mature methodologies for making statistical inferences have not been routinely incorporated in these applic...
Probabilistic Graphical Model Representation in Phylogenetics
Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.
2014-01-01
Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559
Statistical appearance models based on probabilistic correspondences.
Krüger, Julia; Ehrhardt, Jan; Handels, Heinz
2017-04-01
Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.
ERIC Educational Resources Information Center
Owens, Susan T.
2017-01-01
Technology is becoming an integral tool in the classroom and can make a positive impact on how the students learn. This quantitative comparative research study examined gender-based differences among secondary Advanced Placement (AP) Statistics students, comparing Educational Testing Service (ETS) College Board AP Statistics examination scores…
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
On a Calculus-Based Statistics Course for Life Science Students
ERIC Educational Resources Information Center
Watkins, Joseph C.
2010-01-01
The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a broadening knowledge in biology. We illustrate our methods and practices through examples from the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, J.; Hoversten, G.M.
2011-09-15
Joint inversion of seismic AVA and CSEM data requires rock-physics relationships to link seismic attributes to electrical properties. Ideally, we can connect them through reservoir parameters (e.g., porosity and water saturation) by developing physical-based models, such as Gassmann's equations and Archie's law, using nearby borehole logs. This could be difficult in the exploration stage because information available is typically insufficient for choosing suitable rock-physics models and for subsequently obtaining reliable estimates of the associated parameters. The use of improper rock-physics models and the inaccuracy of the estimates of model parameters may cause misleading inversion results. Conversely, it is easy to derive statistical relationships among seismic and electrical attributes and reservoir parameters from distant borehole logs. In this study, we develop a Bayesian model to jointly invert seismic AVA and CSEM data for reservoir parameter estimation using statistical rock-physics models; the spatial dependence of geophysical and reservoir parameters is carried out by lithotypes through Markov random fields. We apply the developed model to a synthetic case, which simulates a CO2 monitoring application. We derive statistical rock-physics relations from borehole logs at one location and estimate seismic P- and S-wave velocity ratio, acoustic impedance, density, electrical resistivity, lithotypes, porosity, and water saturation at three different locations by conditioning to seismic AVA and CSEM data. Comparison of the inversion results with their corresponding true values shows that the correlation-based statistical rock-physics models provide significant information for improving the joint inversion results.
A Survey of Statistical Models for Reverse Engineering Gene Regulatory Networks
Huang, Yufei; Tienda-Luna, Isabel M.; Wang, Yufeng
2009-01-01
Statistical models for reverse engineering gene regulatory networks are surveyed in this article. To provide readers with a system-level view of the modeling issues in this research, a graphical modeling framework is proposed. This framework serves as the scaffolding on which the review of different models can be systematically assembled. Based on the framework, we review many existing models for many aspects of gene regulation; the pros and cons of each model are discussed. In addition, network inference algorithms are also surveyed under the graphical modeling framework by the categories of point solutions and probabilistic solutions and the connections and differences among the algorithms are provided. This survey has the potential to elucidate the development and future of reverse engineering GRNs and bring statistical signal processing closer to the core of this research. PMID:20046885
The construction and assessment of a statistical model for the prediction of protein assay data.
Pittman, J; Sacks, J; Young, S Stanley
2002-01-01
The focus of this work is the development of a statistical model for a bioinformatics database whose distinctive structure makes model assessment an interesting and challenging problem. The key components of the statistical methodology, including a fast approximation to the singular value decomposition and the use of adaptive spline modeling and tree-based methods, are described, and preliminary results are presented. These results are shown to compare favorably to selected results achieved using comparative methods. An attempt to determine the predictive ability of the model through the use of cross-validation experiments is discussed. In conclusion, a synopsis of the results of these experiments and their implications for the analysis of bioinformatic databases in general is presented.
Two statistics for evaluating parameter identifiability and error reduction
Doherty, John; Hunt, Randall J.
2009-01-01
Two statistics are presented that can be used to rank input parameters utilized by a model in terms of their relative identifiability based on a given or possible future calibration dataset. Identifiability is defined here as the capability of model calibration to constrain parameters used by a model. Both statistics require that the sensitivity of each model parameter be calculated for each model output for which there are actual or presumed field measurements. Singular value decomposition (SVD) of the weighted sensitivity matrix is then undertaken to quantify the relation between the parameters and observations that, in turn, allows selection of calibration solution and null spaces spanned by unit orthogonal vectors. The first statistic presented, "parameter identifiability", is quantitatively defined as the direction cosine between a parameter and its projection onto the calibration solution space. This varies between zero and one, with zero indicating complete non-identifiability and one indicating complete identifiability. The second statistic, "relative error reduction", indicates the extent to which the calibration process reduces error in estimation of a parameter from its pre-calibration level where its value must be assigned purely on the basis of prior expert knowledge. This is more sophisticated than identifiability, in that it takes greater account of the noise associated with the calibration dataset. Like identifiability, it has a maximum value of one (which can only be achieved if there is no measurement noise). Conceptually it can fall to zero; and even below zero if a calibration problem is poorly posed. An example, based on a coupled groundwater/surface-water model, is included that demonstrates the utility of the statistics. © 2009 Elsevier B.V.
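The first statistic has a direct linear-algebra reading, sketched below under simplifying assumptions: the identifiability of a parameter is the norm of the projection of its unit vector onto the space spanned by the leading right singular vectors of the weighted sensitivity matrix. The Jacobian here is random toy data, and the relative-error-reduction statistic is not computed.

```python
import numpy as np

def parameter_identifiability(J, weights, n_sv):
    """J: (n_obs, n_par) sensitivity matrix; weights: per-observation weights.
    Identifiability of parameter i = direction cosine between its unit vector
    and its projection onto the calibration solution space spanned by the
    first n_sv right singular vectors of the weighted sensitivity matrix."""
    Q = np.sqrt(np.asarray(weights))[:, None] * J      # weighted sensitivities
    _, svals, Vt = np.linalg.svd(Q, full_matrices=False)
    V_sol = Vt[:n_sv]                                  # solution-space basis
    return np.sqrt((V_sol ** 2).sum(axis=0))           # in [0, 1] per parameter

# Hypothetical model: 20 observations, 4 parameters; parameter 3 barely
# influences any observation, so it should come out poorly identifiable.
rng = np.random.default_rng(5)
J = rng.standard_normal((20, 4))
J[:, 3] *= 1e-4
ident = parameter_identifiability(J, weights=np.ones(20), n_sv=3)
print(np.round(ident, 3))
```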
ERIC Educational Resources Information Center
Nguyen, ThuyUyen H.; Charity, Ian; Robson, Andrew
2016-01-01
This study investigates students' perceptions of computer-based learning environments, their attitude towards business statistics, and their academic achievement in higher education. Guided by learning environments concepts and attitudinal theory, a theoretical model was proposed with two instruments, one for measuring the learning environment and…
Sohl, Terry L.; Sayler, Kristi L.; Drummond, Mark A.; Loveland, Thomas R.
2007-01-01
A wide variety of ecological applications require spatially explicit, historic, current, and projected land use and land cover data. The U.S. Land Cover Trends project is analyzing contemporary (1973–2000) land-cover change in the conterminous United States. The newly developed FORE-SCE model used Land Cover Trends data and theoretical, statistical, and deterministic modeling techniques to project future land cover change through 2020 for multiple plausible scenarios. Projected proportions of future land use were initially developed, and then sited on the lands with the highest potential for supporting that land use and land cover using a statistically based stochastic allocation procedure. Three scenarios of 2020 land cover were mapped for the western Great Plains in the US. The model provided realistic, high-resolution, scenario-based land-cover products suitable for multiple applications, including studies of climate and weather variability, carbon dynamics, and regional hydrology.
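The stochastic allocation step can be illustrated in a few lines: cells are drawn without replacement with probability proportional to a suitability surface, so high-potential land fills first on average while preserving realistic scatter. The surface and cell demand below are hypothetical, and the sketch ignores the model's scenario logic.

```python
import numpy as np

def allocate_land_use(suitability, n_cells, rng):
    """Stochastically assign `n_cells` cells to a land-use class, favoring high
    suitability: sample grid cells without replacement with probability
    proportional to the (flattened) suitability surface."""
    p = suitability.ravel().astype(float)
    p /= p.sum()
    chosen = rng.choice(p.size, size=n_cells, replace=False, p=p)
    mask = np.zeros(p.size, dtype=bool)
    mask[chosen] = True
    return mask.reshape(suitability.shape)

rng = np.random.default_rng(6)
suitability = rng.random((50, 50)) ** 2       # hypothetical suitability surface
cropland = allocate_land_use(suitability, n_cells=400, rng=rng)
print("mean suitability of allocated cells:",
      suitability[cropland].mean(), "vs overall:", suitability.mean())
```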
2017-01-01
Producing predictions of the probabilistic risks of operating materials for given lengths of time at stated operating conditions requires the assimilation of existing deterministic creep life prediction models (that only predict the average failure time) with statistical models that capture the random component of creep. To date, these approaches have rarely been combined to achieve this objective. The first half of this paper therefore provides a summary review of some statistical models to help bridge the gap between these two approaches. The second half of the paper illustrates one possible assimilation using 1Cr1Mo-0.25V steel. The Wilshire equation for creep life prediction is integrated into a discrete hazard-based statistical model, the former being chosen because of its novelty and proven capability in accurately predicting average failure times and the latter because of its flexibility in modelling the failure time distribution. Using this model it was found that, for example, if this material had been in operation for around 15 years at 823 K and 130 MPa, the chance of failure in the next year is around 35%. However, if this material had been in operation for around 25 years, the chance of failure in the next year rises dramatically to around 80%. PMID:29039773
Agent-Based Models in Empirical Social Research
ERIC Educational Resources Information Center
Bruch, Elizabeth; Atwell, Jon
2015-01-01
Agent-based modeling has become increasingly popular in recent years, but there is still no codified set of recommendations or practices for how to use these models within a program of empirical research. This article provides ideas and practical guidelines drawn from sociology, biology, computer science, epidemiology, and statistics. We first…
Nakagami-based total variation method for speckle reduction in thyroid ultrasound images.
Koundal, Deepika; Gupta, Savita; Singh, Sukhwinder
2016-02-01
A good statistical model is necessary for the reduction in speckle noise. The Nakagami model is more general than the Rayleigh distribution for statistical modeling of speckle in ultrasound images. In this article, the Nakagami-based noise removal method is presented to enhance thyroid ultrasound images and to improve clinical diagnosis. The statistics of log-compressed image are derived from the Nakagami distribution following a maximum a posteriori estimation framework. The minimization problem is solved by optimizing an augmented Lagrange and Chambolle's projection method. The proposed method is evaluated on both artificial speckle-simulated and real ultrasound images. The experimental findings reveal the superiority of the proposed method both quantitatively and qualitatively in comparison with other speckle reduction methods reported in the literature. The proposed method yields an average signal-to-noise ratio gain of more than 2.16 dB over the non-convex regularizer-based speckle noise removal method, 3.83 dB over the Aubert-Aujol model, 1.71 dB over the Shi-Osher model and 3.21 dB over the Rudin-Lions-Osher model on speckle-simulated synthetic images. Furthermore, visual evaluation of the despeckled images shows that the proposed method suppresses speckle noise well while preserving the textures and fine details. © IMechE 2015.
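A small aside on the statistical model: Nakagami parameters are often initialized from moments of the squared amplitude, as in the hedged sketch below (m = (E[x²])²/Var(x²), Ω = E[x²]); this is a generic estimator, not the MAP/total-variation method of the article.

```python
import numpy as np
from scipy import stats

def nakagami_moment_fit(amplitude):
    """Moment estimates of Nakagami parameters from speckle amplitudes:
    omega = E[x^2] (spread), m = (E[x^2])^2 / Var(x^2) (shape)."""
    x2 = amplitude ** 2
    omega = x2.mean()
    m = omega ** 2 / x2.var()
    return m, omega

# Synthetic speckle amplitudes drawn from a Nakagami(m=0.8, omega=2.0) law;
# scipy parameterizes the scale so that omega = scale^2.
rng = np.random.default_rng(7)
samples = stats.nakagami.rvs(0.8, scale=np.sqrt(2.0), size=20000, random_state=rng)
print(nakagami_moment_fit(samples))   # ~ (0.8, 2.0)
```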
Can Counter-Gang Models be Applied to Counter ISIS’s Internet Recruitment Campaign
2016-06-10
limitation that exists is the lack of reliable statistics from social media companies with regard to the quantity of ISIS-affiliated sites, which exist on... statistics, they have approximately 320 million monthly active users with thirty-five-plus languages supported and 77 percent of accounts located... Justice and Delinquency Prevention program. For deterrence-based models, the primary point of research is focused deterrence models with emphasis placed
Modeling Complex Phenomena Using Multiscale Time Sequences
2009-08-24
measures based on Hurst and Holder exponents, auto-regressive methods and Fourier and wavelet decomposition methods. The applications for this technology... different scales and how these scales relate to each other. This can be done by combining a set of statistical fractal measures based on Hurst and Holder exponents, auto-regressive...
Trends in study design and the statistical methods employed in a leading general medicine journal.
Gosho, M; Sato, Y; Nagashima, K; Takahashi, S
2018-02-01
Study design and statistical methods have become core components of medical research, and the methodology has become more multifaceted and complicated over time. The study of the comprehensive details and current trends of study design and statistical methods is required to support the future implementation of well-planned clinical studies providing information about evidence-based medicine. Our purpose was to illustrate study design and statistical methods employed in recent medical literature. This was an extension study of Sato et al. (N Engl J Med 2017; 376: 1086-1087), which reviewed 238 articles published in 2015 in the New England Journal of Medicine (NEJM) and briefly summarized the statistical methods employed in NEJM. Using the same database, we performed a new investigation of the detailed trends in study design and individual statistical methods that were not reported in the Sato study. Under the CONSORT statement, prespecification and justification of sample size are obligatory in planning intervention studies. Although standard survival methods (e.g. the Kaplan-Meier estimator and Cox regression model) were most frequently applied, the Gray test and Fine-Gray proportional hazard model for considering competing risks were sometimes used for a more valid statistical inference. With respect to handling missing data, model-based methods, which are valid for missing-at-random data, were more frequently used than single imputation methods. Single imputation methods are not recommended for a primary analysis, but they have been applied in many clinical trials. Group sequential design with interim analyses was one of the standard designs, and novel designs, such as adaptive dose selection and sample size re-estimation, were sometimes employed in NEJM. Model-based approaches for handling missing data should replace single imputation methods for primary analysis in light of the information found in some publications. Use of adaptive designs with interim analyses is increasing after the presentation of the FDA guidance for adaptive design. © 2017 John Wiley & Sons Ltd.
Tian, Yuzhen; Guo, Jin; Wang, Rui; Wang, Tingfeng
2011-09-12
To investigate the statistical properties of Gaussian beam propagation through a random phase screen of arbitrary thickness, for laboratory applications in adaptive optics and laser communication, we establish mathematical models of the statistical quantities involved in the propagation process, based on the Rytov method and the thin phase screen model. Analytic results are developed for a phase screen of arbitrary thickness based on the Kolmogorov power spectrum. The comparison between the arbitrary-thickness phase screen and the thin phase screen shows that our results are better suited to describing the generalized case, especially the scintillation index.
Comparison of simulation modeling and satellite techniques for monitoring ecological processes
NASA Technical Reports Server (NTRS)
Box, Elgene O.
1988-01-01
In 1985 improvements were made in the world climatic data base for modeling and predictive mapping; in individual process models and the overall carbon-balance models; and in the interface software for mapping the simulation results. Statistical analysis of the data base was begun. In 1986 mapping was shifted to NASA-Goddard. The initial approach involving pattern comparisons was modified to a more statistical approach. A major accomplishment was the expansion and improvement of a global data base of measurements of biomass and primary production, to complement the simulation data. The main accomplishments during 1987 included: production of a master tape with all environmental and satellite data and model results for the 1600 sites; development of a complete mapping system used for the initial color maps comparing annual and monthly patterns of Normalized Difference Vegetation Index (NDVI), actual evapotranspiration, net primary productivity, gross primary productivity, and net ecosystem production; collection of more biosphere measurements for eventual improvement of the biological models; and development of some initial monthly models for primary productivity, based on satellite data.
Teaching Statistics--Despite Its Applications
ERIC Educational Resources Information Center
Ridgway, Jim; Nicholson, James; McCusker, Sean
2007-01-01
Evidence-based policy requires sophisticated modelling and reasoning about complex social data. The current UK statistics curricula do not equip tomorrow's citizens to understand such reasoning. We advocate radical curriculum reform, designed to require students to reason from complex data.
NASA Astrophysics Data System (ADS)
Drobny, Jon; Curreli, Davide; Ruzic, David; Lasa, Ane; Green, David; Canik, John; Younkin, Tim; Blondel, Sophie; Wirth, Brian
2017-10-01
Surface roughness greatly impacts material erosion, and thus plays an important role in Plasma-Surface Interactions. Developing strategies for efficiently introducing rough surfaces into ion-solid interaction codes will be an important step towards whole-device modeling of plasma devices and future fusion reactors such as ITER. Fractal TRIDYN (F-TRIDYN) is an upgraded version of the Monte Carlo, BCA program TRIDYN developed for this purpose that includes an explicit fractal model of surface roughness and extended input and output options for file-based code coupling. Code coupling with both plasma and material codes has been achieved and allows for multi-scale, whole-device modeling of plasma experiments. These code coupling results will be presented. F-TRIDYN has been further upgraded with an alternative, statistical model of surface roughness. The statistical model is significantly faster than and compares favorably to the fractal model. Additionally, the statistical model compares well to alternative computational surface roughness models and experiments. Theoretical links between the fractal and statistical models are made, and further connections to experimental measurements of surface roughness are explored. This work was supported by the PSI-SciDAC Project funded by the U.S. Department of Energy through contract DOE-DE-SC0008658.
Agent based reasoning for the non-linear stochastic models of long-range memory
NASA Astrophysics Data System (ADS)
Kononovicius, A.; Gontis, V.
2012-02-01
We extend Kirman's model by introducing a variable event time scale. The proposed flexible time scale is equivalent to the variable trading activity observed in financial markets. The stochastic version of the extended Kirman agent-based model is compared to non-linear stochastic models of long-range memory in financial markets. The agent-based model, which provides a matching macroscopic description, serves as a microscopic justification of the earlier proposed stochastic model exhibiting power-law statistics.
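The underlying Kirman herding dynamics can be sketched in a few lines in one common discrete-time, birth-death formulation: at each step an agent may switch state either idiosyncratically or through recruitment by the other group. The parameter values and the fixed time step below are illustrative; in particular, this sketch does not implement the variable event time scale that the paper introduces.

```python
import numpy as np

def kirman_path(N=100, eps=0.01, h=0.3, steps=20_000, seed=1):
    """Simulate the number k of agents in state 1 under Kirman-style herding.

    Per step: one agent switches state either idiosyncratically (rate eps)
    or by recruitment, proportional to the fraction holding the other state (h).
    """
    rng = np.random.default_rng(seed)
    k = N // 2
    path = np.empty(steps, dtype=int)
    for t in range(steps):
        p_up = (N - k) / N * (eps + h * k / (N - 1))      # a 0 -> 1 switch
        p_down = k / N * (eps + h * (N - k) / (N - 1))    # a 1 -> 0 switch
        u = rng.random()
        if u < p_up:
            k += 1
        elif u < p_up + p_down:
            k -= 1
        path[t] = k
    return path

x = kirman_path()
print("mean fraction in state 1:", x.mean() / 100)
```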
Karimi, Davood; Samei, Golnoosh; Kesch, Claudia; Nir, Guy; Salcudean, Septimiu E
2018-05-15
Most of the existing convolutional neural network (CNN)-based medical image segmentation methods are based on methods that have originally been developed for segmentation of natural images. Therefore, they largely ignore the differences between the two domains, such as the smaller degree of variability in the shape and appearance of the target volume and the smaller amounts of training data in medical applications. We propose a CNN-based method for prostate segmentation in MRI that employs statistical shape models to address these issues. Our CNN predicts the location of the prostate center and the parameters of the shape model, which determine the position of prostate surface keypoints. To train such a large model for segmentation of 3D images using small data, (1) we adopt a stage-wise training strategy by first training the network to predict the prostate center and subsequently adding modules for predicting the parameters of the shape model and prostate rotation; (2) we propose a data augmentation method whereby the training images and their prostate surface keypoints are deformed according to the displacements computed based on the shape model; and (3) we employ various regularization techniques. Our proposed method achieves a Dice score of 0.88, which is obtained by using both elastic-net and spectral dropout for regularization. Compared with a standard CNN-based method, our method shows significantly better segmentation performance on the prostate base and apex. Our experiments also show that data augmentation using the shape model significantly improves the segmentation results. Prior knowledge about the shape of the target organ can improve the performance of CNN-based segmentation methods, especially where image features are not sufficient for a precise segmentation. Statistical shape models can also be employed to synthesize additional training data that can ease the training of large CNNs.
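The shape-model-based augmentation step described in item (2) can be sketched generically: sample plausible shape coefficients from the PCA model and displace the surface keypoints accordingly. The array layout, coefficient scaling and toy values below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def augment_keypoints(mean_shape, components, stddevs, n_aug=5, scale=1.0, seed=0):
    """Generate deformed keypoint sets from a PCA shape model.

    mean_shape : (K, 3) mean positions of the K surface keypoints
    components : (M, K, 3) principal modes of keypoint displacement
    stddevs    : (M,) standard deviations of the M shape coefficients
    """
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_aug):
        b = rng.normal(0.0, scale * stddevs)                 # sampled coefficients
        disp = np.tensordot(b, components, axes=1)           # (K, 3) displacement
        out.append(mean_shape + disp)
    return out

# Toy example with K = 4 keypoints and M = 2 modes (all values illustrative).
mean_shape = np.zeros((4, 3))
components = np.random.default_rng(1).normal(size=(2, 4, 3)) * 0.1
stddevs = np.array([1.5, 0.7])
for shp in augment_keypoints(mean_shape, components, stddevs, n_aug=2):
    print(shp.round(2))
```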
"Plateau"-related summary statistics are uninformative for comparing working memory models.
van den Berg, Ronald; Ma, Wei Ji
2014-10-01
Performance on visual working memory tasks decreases as more items need to be remembered. Over the past decade, a debate has unfolded between proponents of slot models and slotless models of this phenomenon (Ma, Husain, & Bays, Nature Neuroscience, 17, 347-356, 2014). Zhang and Luck (Nature, 453(7192), 233-235, 2008) and Anderson, Vogel, and Awh (Attention, Perception, & Psychophysics, 74(5), 891-910, 2011) noticed that as more items need to be remembered, "memory noise" seems to first increase and then reach a "stable plateau." They argued that three summary statistics characterizing this plateau are consistent with slot models, but not with slotless models. Here, we assess the validity of their methods. We generated synthetic data both from a leading slot model and from a recent slotless model and quantified model evidence using log Bayes factors. We found that the summary statistics provided at most 0.15% of the expected model evidence in the raw data. In a model recovery analysis, a total of more than a million trials were required to achieve 99% correct recovery when models were compared on the basis of summary statistics, whereas fewer than 1,000 trials were sufficient when raw data were used. Therefore, at realistic numbers of trials, plateau-related summary statistics are highly unreliable for model comparison. Applying the same analyses to subject data from Anderson et al. (Attention, Perception, & Psychophysics, 74(5), 891-910, 2011), we found that the evidence in the summary statistics was at most 0.12% of the evidence in the raw data and far too weak to warrant any conclusions. The evidence in the raw data, in fact, strongly favored the slotless model. These findings call into question claims about working memory that are based on summary statistics.
Cognitive predictors of balance in Parkinson's disease.
Fernandes, Ângela; Mendes, Andreia; Rocha, Nuno; Tavares, João Manuel R S
2016-06-01
Postural instability is one of the most incapacitating symptoms of Parkinson's disease (PD) and appears to be related to cognitive deficits. This study aims to determine the cognitive factors that can predict deficits in static and dynamic balance in individuals with PD. A sociodemographic questionnaire characterized 52 individuals with PD for this work. The Trail Making Test, Rule Shift Cards Test, and Digit Span Test assessed the executive functions. The static balance was assessed using a plantar pressure platform, and dynamic balance was based on the Timed Up and Go Test. The results were statistically analysed using SPSS Statistics software through linear regression analysis. The results show that a statistically significant model based on cognitive outcomes was able to explain the variance of motor variables. Also, the explanatory value of the model tended to increase with the addition of individual and clinical variables, although the resulting model was not statistically significant. The model explained 25-29% of the variability of the Timed Up and Go Test, while for the anteroposterior displacement it was 23-34%, and for the mediolateral displacement it was 24-39%. From the findings, we conclude that cognitive performance, especially the executive functions, is a predictor of balance deficit in individuals with PD.
Predictive data modeling of human type II diabetes related statistics
NASA Astrophysics Data System (ADS)
Jaenisch, Kristina L.; Jaenisch, Holger M.; Handley, James W.; Albritton, Nathaniel G.
2009-04-01
During the course of routine Type II diabetes treatment of one of the authors, it was decided to derive predictive analytical Data Models of the daily sampled vital statistics, namely weight, blood pressure, and blood sugar, to determine whether the covariance among the observed variables could yield a descriptive equation-based model, or better still, a predictive analytical model that could forecast the expected future trend of the variables and possibly reduce the number of finger sticks required to monitor blood sugar levels. The personal history and analysis with resulting models are presented.
ERIC Educational Resources Information Center
Frees, Edward W.; Kim, Jee-Seon
2006-01-01
Multilevel models are proven tools in social research for modeling complex, hierarchical systems. In multilevel modeling, statistical inference is based largely on quantification of random variables. This paper distinguishes among three types of random variables in multilevel modeling--model disturbances, random coefficients, and future response…
Ng'andu, N H
1997-03-30
In the analysis of survival data using the Cox proportional hazard (PH) model, it is important to verify that the explanatory variables analysed satisfy the proportional hazard assumption of the model. This paper presents results of a simulation study that compares five test statistics to check the proportional hazard assumption of Cox's model. The test statistics were evaluated under proportional hazards and the following types of departures from the proportional hazard assumption: increasing relative hazards; decreasing relative hazards; crossing hazards; diverging hazards, and non-monotonic hazards. The test statistics compared include those based on partitioning of failure time and those that do not require partitioning of failure time. The simulation results demonstrate that the time-dependent covariate test, the weighted residuals score test and the linear correlation test have equally good power for detection of non-proportionality in the varieties of non-proportional hazards studied. Using illustrative data from the literature, these test statistics performed similarly.
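A check in the same family as the weighted residuals score test discussed above is available in standard survival software. The sketch below uses the lifelines Python package with placeholder column names for duration, event indicator and covariates; it illustrates the workflow rather than reproducing the simulation study.

```python
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.statistics import proportional_hazard_test

# df is assumed to hold one row per subject with columns:
# "time" (follow-up), "event" (1 = failure), and covariates such as "age", "trt".
df = pd.read_csv("survival_data.csv")  # placeholder file name

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")

# Score test based on scaled Schoenfeld residuals against a rank transform of time;
# a small p-value suggests a departure from proportional hazards for that covariate.
res = proportional_hazard_test(cph, df, time_transform="rank")
res.print_summary()
```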
Monroe, Scott; Cai, Li
2015-01-01
This research is concerned with two topics in assessing model fit for categorical data analysis. The first topic involves the application of a limited-information overall test, introduced in the item response theory literature, to structural equation modeling (SEM) of categorical outcome variables. Most popular SEM test statistics assess how well the model reproduces estimated polychoric correlations. In contrast, limited-information test statistics assess how well the underlying categorical data are reproduced. Here, the recently introduced C2 statistic of Cai and Monroe (2014) is applied. The second topic concerns how the root mean square error of approximation (RMSEA) fit index can be affected by the number of categories in the outcome variable. This relationship creates challenges for interpreting RMSEA. While the two topics initially appear unrelated, they may conveniently be studied in tandem since RMSEA is based on an overall test statistic, such as C2. The results are illustrated with an empirical application to data from a large-scale educational survey.
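Since RMSEA is computed from an overall test statistic such as C2, the dependence discussed above can be made concrete with the standard sample formula RMSEA = sqrt(max(X2 - df, 0) / (df (N - 1))). The statistic value, degrees of freedom and sample size in the sketch below are illustrative, not from the study.

```python
import math

def rmsea(stat, df, n):
    """Sample RMSEA from an overall fit statistic such as C2 or Pearson's X2."""
    return math.sqrt(max(stat - df, 0.0) / (df * (n - 1)))

# Illustrative values only.
print(round(rmsea(stat=85.3, df=35, n=1000), 4))
```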
Low-complexity stochastic modeling of wall-bounded shear flows
NASA Astrophysics Data System (ADS)
Zare, Armin
Turbulent flows are ubiquitous in nature and they appear in many engineering applications. Transition to turbulence, in general, increases skin-friction drag in air/water vehicles compromising their fuel-efficiency and reduces the efficiency and longevity of wind turbines. While traditional flow control techniques combine physical intuition with costly experiments, their effectiveness can be significantly enhanced by control design based on low-complexity models and optimization. In this dissertation, we develop a theoretical and computational framework for the low-complexity stochastic modeling of wall-bounded shear flows. Part I of the dissertation is devoted to the development of a modeling framework which incorporates data-driven techniques to refine physics-based models. We consider the problem of completing partially known sample statistics in a way that is consistent with underlying stochastically driven linear dynamics. Neither the statistics nor the dynamics are precisely known. Thus, our objective is to reconcile the two in a parsimonious manner. To this end, we formulate optimization problems to identify the dynamics and directionality of input excitation in order to explain and complete available covariance data. For problem sizes that general-purpose solvers cannot handle, we develop customized optimization algorithms based on alternating direction methods. The solution to the optimization problem provides information about critical directions that have maximal effect in bringing model and statistics in agreement. In Part II, we employ our modeling framework to account for statistical signatures of turbulent channel flow using low-complexity stochastic dynamical models. We demonstrate that white-in-time stochastic forcing is not sufficient to explain turbulent flow statistics and develop models for colored-in-time forcing of the linearized Navier-Stokes equations. We also examine the efficacy of stochastically forced linearized NS equations and their parabolized equivalents in the receptivity analysis of velocity fluctuations to external sources of excitation as well as capturing the effect of the slowly-varying base flow on streamwise streaks and Tollmien-Schlichting waves. In Part III, we develop a model-based approach to design surface actuation of turbulent channel flow in the form of streamwise traveling waves. This approach is capable of identifying the drag reducing trends of traveling waves in a simulation-free manner. We also use the stochastically forced linearized NS equations to examine the Reynolds number independent effects of spanwise wall oscillations on drag reduction in turbulent channel flows. This allows us to extend the predictive capability of our simulation-free approach to high Reynolds numbers.
Theory-based Bayesian models of inductive learning and reasoning.
Tenenbaum, Joshua B; Griffiths, Thomas L; Kemp, Charles
2006-07-01
Inductive inference allows humans to make powerful generalizations from sparse data when learning about word meanings, unobserved properties, causal relationships, and many other aspects of the world. Traditional accounts of induction emphasize either the power of statistical learning, or the importance of strong constraints from structured domain knowledge, intuitive theories or schemas. We argue that both components are necessary to explain the nature, use and acquisition of human knowledge, and we introduce a theory-based Bayesian framework for modeling inductive learning and reasoning as statistical inferences over structured knowledge representations.
Khana, Diba; Rossen, Lauren M; Hedegaard, Holly; Warner, Margaret
2018-01-01
Hierarchical Bayes models have been used in disease mapping to examine small-scale geographic variation. State-level geographic variation for less common causes of mortality has been reported; however, county-level variation is rarely examined. Due to concerns about statistical reliability and confidentiality, county-level mortality rates based on fewer than 20 deaths are suppressed under Division of Vital Statistics, National Center for Health Statistics (NCHS) statistical reliability criteria, precluding an examination of spatio-temporal variation in less common causes of mortality, such as suicide rates (SRs), at the county level using direct estimates. Existing Bayesian spatio-temporal modeling strategies can be applied via Integrated Nested Laplace Approximation (INLA) in R to a large number of rare mortality outcomes to enable examination of spatio-temporal variation on smaller geographic scales such as counties. This method allows examination of spatio-temporal variation across the entire U.S., even where the data are sparse. We used mortality data from 2005-2015 to explore spatio-temporal variation in SRs, as one particular application of the Bayesian spatio-temporal modeling strategy in R-INLA to predict year- and county-specific SRs. Specifically, hierarchical Bayesian spatio-temporal models were implemented with spatially structured and unstructured random effects, correlated time effects, time-varying confounders and space-time interaction terms in the software R-INLA, borrowing strength across both counties and years to produce smoothed county-level SRs. Model-based estimates of SRs were mapped to explore geographic variation.
ERIC Educational Resources Information Center
Gálvez, Jaime; Conejo, Ricardo; Guzmán, Eduardo
2013-01-01
One of the most popular student modeling approaches is Constraint-Based Modeling (CBM). It is an efficient approach that can be easily applied inside an Intelligent Tutoring System (ITS). Even with these characteristics, building new ITSs requires carefully designing the domain model to be taught because different sources of errors could affect…
Statistical analysis of modeling error in structural dynamic systems
NASA Technical Reports Server (NTRS)
Hasselman, T. K.; Chrostowski, J. D.
1990-01-01
The paper presents a generic statistical model of the (total) modeling error for conventional space structures in their launch configuration. Modeling error is defined as the difference between analytical prediction and experimental measurement. It is represented by the differences between predicted and measured real eigenvalues and eigenvectors. Comparisons are made between pre-test and post-test models. Total modeling error is then subdivided into measurement error, experimental error and 'pure' modeling error, and comparisons made between measurement error and total modeling error. The generic statistical model presented in this paper is based on the first four global (primary structure) modes of four different structures belonging to the generic category of Conventional Space Structures (specifically excluding large truss-type space structures). As such, it may be used to evaluate the uncertainty of predicted mode shapes and frequencies, sinusoidal response, or the transient response of other structures belonging to the same generic category.
Zhao, Xi; Dellandréa, Emmanuel; Chen, Liming; Kakadiaris, Ioannis A
2011-10-01
Three-dimensional face landmarking aims at automatically localizing facial landmarks and has a wide range of applications (e.g., face recognition, face tracking, and facial expression analysis). Existing methods assume neutral facial expressions and unoccluded faces. In this paper, we propose a general learning-based framework for reliable landmark localization on 3-D facial data under challenging conditions (i.e., facial expressions and occlusions). Our approach relies on a statistical model, called 3-D statistical facial feature model, which learns both the global variations in configurational relationships between landmarks and the local variations of texture and geometry around each landmark. Based on this model, we further propose an occlusion classifier and a fitting algorithm. Results from experiments on three publicly available 3-D face databases (FRGC, BU-3-DFE, and Bosphorus) demonstrate the effectiveness of our approach, in terms of landmarking accuracy and robustness, in the presence of expressions and occlusions.
Cardiac arrest risk standardization using administrative data compared to registry data.
Grossestreuer, Anne V; Gaieski, David F; Donnino, Michael W; Nelson, Joshua I M; Mutter, Eric L; Carr, Brendan G; Abella, Benjamin S; Wiebe, Douglas J
2017-01-01
Methods for comparing hospitals regarding cardiac arrest (CA) outcomes, vital for improving resuscitation performance, rely on data collected by cardiac arrest registries. However, most CA patients are treated at hospitals that do not participate in such registries. This study aimed to determine whether CA risk standardization modeling based on administrative data could perform as well as that based on registry data. Two risk standardization logistic regression models were developed using 2453 patients treated from 2000-2015 at three hospitals in an academic health system. Registry and administrative data were accessed for all patients. The outcome was death at hospital discharge. The registry model was considered the "gold standard" with which to compare the administrative model, using metrics including comparing areas under the curve, calibration curves, and Bland-Altman plots. The administrative risk standardization model had a c-statistic of 0.891 (95% CI: 0.876-0.905) compared to a registry c-statistic of 0.907 (95% CI: 0.895-0.919). When limited to only non-modifiable factors, the administrative model had a c-statistic of 0.818 (95% CI: 0.799-0.838) compared to a registry c-statistic of 0.810 (95% CI: 0.788-0.831). All models were well-calibrated. There was no significant difference between c-statistics of the models, providing evidence that valid risk standardization can be performed using administrative data. Risk standardization using administrative data performs comparably to standardization using registry data. This methodology represents a new tool that can enable opportunities to compare hospital performance in specific hospital systems or across the entire US in terms of survival after CA.
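The comparison of c-statistics between the administrative and registry models can be illustrated with a short sketch: compute each model's area under the ROC curve on the same patients and bootstrap the difference. The variable names and toy data below are placeholders, not the study data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_difference_ci(y, p_registry, p_admin, n_boot=2000, seed=0):
    """Bootstrap CI for c-statistic(admin) - c-statistic(registry) on the same patients."""
    rng = np.random.default_rng(seed)
    diffs = []
    idx = np.arange(len(y))
    for _ in range(n_boot):
        b = rng.choice(idx, size=len(y), replace=True)
        if len(np.unique(y[b])) < 2:      # need both outcomes in the resample
            continue
        diffs.append(roc_auc_score(y[b], p_admin[b]) -
                     roc_auc_score(y[b], p_registry[b]))
    return np.percentile(diffs, [2.5, 97.5])

# y: observed in-hospital death (0/1); p_*: predicted probabilities from each model.
# Toy data for illustration only.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
p_registry = np.clip(y * 0.6 + rng.normal(0.2, 0.2, 500), 0, 1)
p_admin = np.clip(y * 0.55 + rng.normal(0.22, 0.2, 500), 0, 1)
print(auc_difference_ci(y, p_registry, p_admin))
```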
Estimating the Regional Economic Significance of Airports
1992-09-01
following three options for estimating induced impacts: the economic base model, an econometric model, and a regional input-output model. One approach to...limitations, however, the economic base model has been widely used for regional economic analysis. A second approach is to develop an econometric model of...analysis is the principal statistical tool used to estimate the economic relationships. Regional econometric models are capable of estimating a single
Functional status predicts acute care readmission in the traumatic spinal cord injury population.
Huang, Donna; Slocum, Chloe; Silver, Julie K; Morgan, James W; Goldstein, Richard; Zafonte, Ross; Schneider, Jeffrey C
2018-03-29
Context/objective: Acute care readmission has been identified as an important marker of healthcare quality. Most previous models assessing risk prediction of readmission incorporate variables for medical comorbidity. We hypothesized that functional status is a more robust predictor of readmission in the spinal cord injury population than medical comorbidities. Design: Retrospective cross-sectional analysis. Setting: Inpatient rehabilitation facilities, Uniform Data System for Medical Rehabilitation data from 2002 to 2012. Participants: Traumatic spinal cord injury patients. Outcome measures: A logistic regression model for predicting acute care readmission based on demographic variables and functional status (Functional Model) was compared with models incorporating demographics, functional status, and medical comorbidities (Functional-Plus) or models including demographics and medical comorbidities (Demographic-Comorbidity). The primary outcomes were 3- and 30-day readmission, and the primary measure of model performance was the c-statistic. Results: There were a total of 68,395 patients with 1,469 (2.15%) readmitted at 3 days and 7,081 (10.35%) readmitted at 30 days. The c-statistics for the Functional Model were 0.703 and 0.654 for 3 and 30 days. The Functional Model outperformed Demographic-Comorbidity models at 3 days (c-statistic difference: 0.066-0.096) and outperformed two of the three Demographic-Comorbidity models at 30 days (c-statistic difference: 0.029-0.056). The Functional-Plus models exhibited negligible improvements (0.002-0.010) in model performance compared to the Functional models. Conclusion: Readmissions are used as a marker of hospital performance. Function-based readmission models in the spinal cord injury population outperform models incorporating medical comorbidities. Readmission risk models for this population would benefit from the inclusion of functional status.
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
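As a rough complement to the order-statistic approach studied in the paper, the same three-parameter (shifted) lognormal model can be fitted by maximum likelihood with scipy; the sketch below uses synthetic procedure times and is not the authors' estimator.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic "procedure times": shifted lognormal with a true location of 15 minutes.
true_loc = 15.0
times = true_loc + rng.lognormal(mean=3.0, sigma=0.5, size=400)

shape, loc, scale = stats.lognorm.fit(times)       # 3-parameter MLE fit
print(f"estimated location = {loc:.1f} (true = {true_loc})")

# Goodness-of-fit check against the fitted model.
ks = stats.kstest(times, "lognorm", args=(shape, loc, scale))
print(f"KS statistic = {ks.statistic:.3f}, p = {ks.pvalue:.3f}")
```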
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
Work domain constraints for modelling surgical performance.
Morineau, Thierry; Riffaud, Laurent; Morandi, Xavier; Villain, Jonathan; Jannin, Pierre
2015-10-01
Three main approaches can be identified for modelling surgical performance: a competency-based approach, a task-based approach, both largely explored in the literature, and a less well-known work domain-based approach. The work domain-based approach first describes the work domain properties that constrain the agent's actions and shape the performance. This paper presents a work domain-based approach for modelling performance during cervical spine surgery, based on the idea that anatomical structures delineate the surgical performance. This model was evaluated through an analysis of junior and senior surgeons' actions. Twenty-four cervical spine surgeries performed by two junior and two senior surgeons were recorded in real time by an expert surgeon. According to a work domain-based model describing an optimal progression through anatomical structures, the degree of adjustment of each surgical procedure to a statistical polynomial function was assessed. Each surgical procedure showed a significant fit to the model, with regression coefficient values around 0.9. However, the surgeries performed by senior surgeons fitted this model significantly better than those performed by junior surgeons. Analysis of the relative frequencies of actions on anatomical structures showed that some specific anatomical structures discriminate senior from junior performances. The work domain-based modelling approach can provide an overall statistical indicator of surgical performance, but in particular, it can highlight specific points of interest among anatomical structures that the surgeons dwelled on according to their level of expertise.
Performance of Bootstrap MCEWMA: Study case of Sukuk Musyarakah data
NASA Astrophysics Data System (ADS)
Safiih, L. Muhamad; Hila, Z. Nurul
2014-07-01
Sukuk Musyarakah is one of several instruments of Islamic bond investment in Malaysia; this sukuk is based on restructuring a conventional bond into a Syariah-compliant bond. Syariah compliance is based on the prohibition of any influence of usury, benefit or fixed return. Despite this prohibition, the daily returns of sukuk are not fixed, and statistically the sukuk return data form a time series that is dependent and autocorrelated. Such data pose a crucial problem in both statistics and finance. Sukuk returns can be viewed statistically through their volatility: high volatility describes dramatic price changes and marks the bond as risky. However, this problem has received less attention among researchers than conventional bonds. In this study, the MCEWMA chart in Statistical Process Control (SPC) is mainly used to monitor autocorrelated data, and its application to daily returns of securities investment data has gained widespread attention among statisticians. However, this chart is often affected by inaccurate estimation, whether of the base model or its limits, producing large errors and a high probability of signalling an out-of-control process as a false alarm. To overcome this problem, a bootstrap approach is used in this study, hybridising it with the MCEWMA base model to construct a new chart, the Bootstrap MCEWMA (BMCEWMA) chart. The hybrid model, BMCEWMA, is applied to daily returns of sukuk Musyarakah for Rantau Abang Capital Bhd. The performance of the BMCEWMA base model showed that it is more effective than the original MCEWMA model, based on smaller estimation error, shorter confidence intervals and fewer false alarms. In other words, the hybrid chart reduces variability, as shown by the smaller error and fewer false alarms. It is concluded that the application of BMCEWMA is better than MCEWMA.
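A rough sketch of the general idea, an EWMA monitoring statistic with control limits obtained by bootstrap resampling, is given below. It is not the MCEWMA/BMCEWMA formulation of the paper: the plain i.i.d. bootstrap used here ignores the autocorrelation the paper is designed to handle (a block or model-residual bootstrap would be closer), and the smoothing constant and return series are illustrative.

```python
import numpy as np

def ewma(x, lam=0.2):
    """Exponentially weighted moving average of a return series."""
    z = np.empty_like(x, dtype=float)
    z[0] = x[0]
    for t in range(1, len(x)):
        z[t] = lam * x[t] + (1 - lam) * z[t - 1]
    return z

def bootstrap_limits(returns, lam=0.2, n_boot=1000, alpha=0.0027, seed=0):
    """Control limits for the EWMA statistic from resampled return series."""
    rng = np.random.default_rng(seed)
    extremes = []
    for _ in range(n_boot):
        resampled = rng.choice(returns, size=len(returns), replace=True)
        z = ewma(resampled, lam)
        extremes.append((z.min(), z.max()))
    lows, highs = zip(*extremes)
    return np.quantile(lows, alpha / 2), np.quantile(highs, 1 - alpha / 2)

# Illustrative daily returns (stand-in for the sukuk series).
rng = np.random.default_rng(42)
returns = rng.normal(0.0002, 0.004, 500)
lcl, ucl = bootstrap_limits(returns)
z = ewma(returns)
print(f"LCL = {lcl:.5f}, UCL = {ucl:.5f}, "
      f"out-of-control points = {np.sum((z < lcl) | (z > ucl))}")
```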
On a Calculus-based Statistics Course for Life Science Students
2010-01-01
The choice of pedagogy in statistics should take advantage of the quantitative capabilities and scientific background of the students. In this article, we propose a model for a statistics course that assumes student competency in calculus and a broadening knowledge in biology. We illustrate our methods and practices through examples from the curriculum. PMID:20810962
Combining Statistics and Physics to Improve Climate Downscaling
NASA Astrophysics Data System (ADS)
Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.
2017-12-01
Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.
Data-based Non-Markovian Model Inference
NASA Astrophysics Data System (ADS)
Ghil, Michael
2015-04-01
This talk concentrates on obtaining stable and efficient data-based models for simulation and prediction in the geosciences and life sciences. The proposed model derivation relies on using a multivariate time series of partial observations from a large-dimensional system, and the resulting low-order models are compared with the optimal closures predicted by the non-Markovian Mori-Zwanzig formalism of statistical physics. Multilayer stochastic models (MSMs) are introduced as both a very broad generalization and a time-continuous limit of existing multilevel, regression-based approaches to data-based closure, in particular of empirical model reduction (EMR). We show that the multilayer structure of MSMs can provide a natural Markov approximation to the generalized Langevin equation (GLE) of the Mori-Zwanzig formalism. A simple correlation-based stopping criterion for an EMR-MSM model is derived to assess how well it approximates the GLE solution. Sufficient conditions are given for the nonlinear cross-interactions between the constitutive layers of a given MSM to guarantee the existence of a global random attractor. This existence ensures that no blow-up can occur for a very broad class of MSM applications. The EMR-MSM methodology is first applied to a conceptual, nonlinear, stochastic climate model of coupled slow and fast variables, in which only slow variables are observed. The resulting reduced model with energy-conserving nonlinearities captures the main statistical features of the slow variables, even when there is no formal scale separation and the fast variables are quite energetic. Second, an MSM is shown to successfully reproduce the statistics of a partially observed, generalized Lotka-Volterra model of population dynamics in its chaotic regime. The positivity constraint on the solutions' components replaces here the quadratic-energy-preserving constraint of fluid-flow problems and it successfully prevents blow-up. This work is based on a close collaboration with M.D. Chekroun, D. Kondrashov, S. Kravtsov and A.W. Robertson.
NASA Technical Reports Server (NTRS)
Tamayo, Tak Chai
1987-01-01
Quality of software is not only vital to the successful operation of the space station; it is also an important factor in establishing testing requirements, time needed for software verification and integration, as well as launch schedules for the space station. Defense of management decisions can be greatly strengthened by combining engineering judgments with statistical analysis. Unlike hardware, software has the characteristics of no wearout and costly redundancies, thus making traditional statistical analysis unsuitable for evaluating the reliability of software. A statistical model was developed to provide a representation of the number as well as the types of failures that occur during software testing and verification. From this model, quantitative measures of software reliability based on failure history during testing are derived. Criteria to terminate testing based on reliability objectives and methods to estimate the expected number of fixes required are also presented.
NASA Astrophysics Data System (ADS)
Boyd-Lee, Ashley; King, Julia
1992-07-01
A discrete statistical model of fatigue crack growth in a nickel base superalloy Waspaloy, which is quantitative from the start of the short crack regime to failure, is presented. Instantaneous crack growth rate distributions and persistence of arrest distributions are used to compute fatigue lives and worst case scenarios without extrapolation. The basis of the model is non-material specific, it provides an improved method of analyzing crack growth rate data. For Waspaloy, the model shows the importance of good bulk fatigue crack growth resistance to resist early short fatigue crack growth and the importance of maximizing crack arrest both by the presence of a proportion of small grains and by maximizing grain boundary corrugation.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
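The core resampling step behind the bootstrap-based inference described above can be illustrated briefly. The sketch below computes a percentile bootstrap confidence interval for a mean concentration; the statistic, the by-product and the values are placeholders rather than data from the cited studies.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(data, size=len(data), replace=True))
            for _ in range(n_boot)]
    return np.percentile(reps, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical concentrations of one disinfection by-product (ug/L) at a plant.
concentrations = np.array([12.1, 9.8, 15.3, 11.0, 13.7, 10.4, 14.2, 12.9])
print(bootstrap_ci(concentrations))
```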
Probabilistic Modeling and Visualization of the Flexibility in Morphable Models
NASA Astrophysics Data System (ADS)
Lüthi, M.; Albrecht, T.; Vetter, T.
Statistical shape models, and in particular morphable models, have gained widespread use in computer vision, computer graphics and medical imaging. Researchers have started to build models of almost any anatomical structure in the human body. While these models provide a useful prior for many image analysis tasks, relatively little information about the shape represented by the morphable model is exploited. We propose a method for computing and visualizing the remaining flexibility, when a part of the shape is fixed. Our method, which is based on Probabilistic PCA, not only leads to an approach for reconstructing the full shape from partial information, but also allows us to investigate and visualize the uncertainty of a reconstruction. To show the feasibility of our approach we performed experiments on a statistical model of the human face and the femur bone. The visualization of the remaining flexibility allows for greater insight into the statistical properties of the shape.
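The reconstruction-from-partial-information step rests on conditioning a Gaussian shape model on the part that is held fixed: the conditional mean gives the reconstruction and the conditional covariance quantifies the remaining flexibility. The sketch below assumes the model is summarized by a mean vector and a low-rank-plus-noise covariance over stacked coordinates, as in probabilistic PCA; the dimensions and values are toy assumptions.

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_fixed, x_fixed):
    """Mean and covariance of the free coordinates given the fixed ones."""
    idx_free = np.setdiff1d(np.arange(len(mu)), idx_fixed)
    S_ff = Sigma[np.ix_(idx_free, idx_free)]
    S_fo = Sigma[np.ix_(idx_free, idx_fixed)]
    S_oo = Sigma[np.ix_(idx_fixed, idx_fixed)]
    gain = S_fo @ np.linalg.solve(S_oo, np.eye(len(idx_fixed)))
    mu_cond = mu[idx_free] + gain @ (x_fixed - mu[idx_fixed])
    Sigma_cond = S_ff - gain @ S_fo.T
    return idx_free, mu_cond, Sigma_cond   # diag(Sigma_cond) maps the flexibility

# Toy 6-dimensional shape model (values illustrative).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
Sigma = A @ A.T + 0.05 * np.eye(6)        # low-rank-plus-noise, as in PPCA
mu = np.zeros(6)
idx_fixed = np.array([0, 1])              # part of the shape that is held fixed
free, m, C = condition_gaussian(mu, Sigma, idx_fixed, x_fixed=np.array([0.5, -0.2]))
print("remaining flexibility (variances):", np.round(np.diag(C), 3))
```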
Statistical alignment: computational properties, homology testing and goodness-of-fit.
Hein, J; Wiuf, C; Knudsen, B; Møller, M B; Wibling, G
2000-09-08
The model of insertions and deletions in biological sequences, first formulated by Thorne, Kishino, and Felsenstein in 1991 (the TKF91 model), provides a basis for performing alignment within a statistical framework. Here we investigate this model. Firstly, we show how to accelerate the statistical alignment algorithms several orders of magnitude. The main innovations are to confine likelihood calculations to a band close to the similarity based alignment, to get good initial guesses of the evolutionary parameters and to apply an efficient numerical optimisation algorithm for finding the maximum likelihood estimate. In addition, the recursions originally presented by Thorne, Kishino and Felsenstein can be simplified. Two proteins, about 1500 amino acids long, can be analysed with this method in less than five seconds on a fast desktop computer, which makes this method practical for actual data analysis. Secondly, we propose a new homology test based on this model, where homology means that an ancestor to a sequence pair can be found finitely far back in time. This test has statistical advantages relative to the traditional shuffle test for proteins. Finally, we describe a goodness-of-fit test that allows testing the proposed insertion-deletion (indel) process inherent to this model and find that real sequences (here globins) probably experience indels longer than one, contrary to what is assumed by the model. Copyright 2000 Academic Press.
NASA Astrophysics Data System (ADS)
Hinckley, Sarah; Parada, Carolina; Horne, John K.; Mazur, Michael; Woillez, Mathieu
2016-10-01
Biophysical individual-based models (IBMs) have been used to study aspects of the early life history of marine fishes, such as recruitment, connectivity of spawning and nursery areas, and marine reserve design. However, there is no consistent approach to validating the spatial outputs of these models. In this study, we hope to rectify this gap. We document additions to an existing individual-based biophysical model for Alaska walleye pollock (Gadus chalcogrammus), some simulations made with this model, and methods that were used to describe and compare spatial output of the model versus field data derived from ichthyoplankton surveys in the Gulf of Alaska. We used visual methods (e.g. distributional centroids with directional ellipses), several indices (such as a Normalized Difference Index (NDI) and an Overlap Coefficient (OC)), and several statistical methods: the Syrjala method, the Getis-Ord Gi* statistic, and a geostatistical method for comparing spatial indices. We assess the utility of these different methods in analyzing spatial output and comparing model output to data, and give recommendations for their appropriate use. Visual methods are useful for initial comparisons of model and data distributions. Metrics such as the NDI and OC give useful measures of co-location and overlap, but care must be taken in discretizing the fields into bins. The Getis-Ord Gi* statistic is useful to determine the patchiness of the fields. The Syrjala method is an easily implemented statistical measure of the difference between the fields, but does not give information on the details of the distributions. Finally, the geostatistical comparison of spatial indices gives good information on the details of the distributions and whether they differ significantly between the model and the data. We conclude that each technique gives quite different information about the model-data distribution comparison, and that some are easy to apply and some more complex. We also give recommendations for a multistep process to validate spatial output from IBMs.
An astronomer's guide to period searching
NASA Astrophysics Data System (ADS)
Schwarzenberg-Czerny, A.
2003-03-01
We concentrate on the analysis of unevenly sampled time series, interrupted by periodic gaps, as often encountered in astronomy. While some of our conclusions may appear surprising, all are based on the classical statistical principles of Fisher and his successors. Except for the discussion of resolution issues, it is best for the reader to forget temporarily about Fourier transforms and to concentrate on problems of fitting a time series with a model curve. According to their statistical content we divide the issues into several sections, consisting of: (ii) statistical and numerical aspects of model fitting, (iii) evaluation of fitted models as hypothesis testing, (iv) the role of orthogonal models in signal detection, (v) conditions for equivalence of periodograms, and (vi) rating sensitivity by test power. An experienced observer working with individual objects would benefit little from a formalized statistical approach. However, we demonstrate the usefulness of this approach in evaluating the performance of periodograms and in the quantitative design of large variability surveys.
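For unevenly sampled series of the kind discussed here, fitting a sinusoid by least squares at each trial frequency is equivalent to computing a Lomb-Scargle periodogram. The sketch below uses astropy on a synthetic, irregularly sampled light curve; the sampling, period and noise level are illustrative.

```python
import numpy as np
from astropy.timeseries import LombScargle

rng = np.random.default_rng(0)

# Unevenly sampled light curve: ~200 epochs over 100 days with a 3.7-day period.
t = np.sort(rng.uniform(0, 100, 200))
period_true = 3.7
y = 0.8 * np.sin(2 * np.pi * t / period_true) + rng.normal(0, 0.3, t.size)

frequency, power = LombScargle(t, y).autopower(maximum_frequency=2.0)
best = frequency[np.argmax(power)]
fap = LombScargle(t, y).false_alarm_probability(power.max())
print(f"best period = {1 / best:.2f} d, false-alarm probability = {fap:.1e}")
```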
QSAR models for anti-malarial activity of 4-aminoquinolines.
Masand, Vijay H; Toropov, Andrey A; Toropova, Alla P; Mahajan, Devidas T
2014-03-01
In the present study, predictive quantitative structure-activity relationship (QSAR) models for the anti-malarial activity of 4-aminoquinolines have been developed. CORAL, which is freely available on the internet (http://www.insilico.eu/coral), has been used as a tool of QSAR analysis to establish statistically robust QSAR models of the anti-malarial activity of 4-aminoquinolines. Six random splits into a visible training sub-system and an invisible validation sub-system were examined. Statistical qualities for these splits vary, but in all these cases, the statistical quality of prediction for anti-malarial activity was quite good. The optimal SMILES-based descriptor was used to derive the single-descriptor QSAR model for a data set of 112 aminoquinolines. All the splits had r(2) > 0.85 and r(2) > 0.78 for subtraining and validation sets, respectively. The three-parameter multilinear regression (MLR) QSAR model has Q(2) = 0.83, R(2) = 0.84 and F = 190.39. The anti-malarial activity has a strong correlation with the presence/absence of nitrogen and oxygen at a topological distance of six.
The theory and programming of statistical tests for evaluating the Real-Time Air-Quality Model (RAM) using the Regional Air Pollution Study (RAPS) data base are fully documented in four report volumes. Moreover, the tests are generally applicable to other model evaluation problem...
Angeler, David G; Viedma, Olga; Moreno, José M
2009-11-01
Time lag analysis (TLA) is a distance-based approach used to study temporal dynamics of ecological communities by measuring community dissimilarity over increasing time lags. Despite its increased use in recent years, its performance in comparison with other more direct methods (i.e., canonical ordination) has not been evaluated. This study fills this gap using extensive simulations and real data sets from experimental temporary ponds (true zooplankton communities) and landscape studies (landscape categories as pseudo-communities) that differ in community structure and anthropogenic stress history. Modeling time with a principal coordinates of neighbour matrices (PCNM) approach, the canonical ordination technique (redundancy analysis; RDA) consistently outperformed the other statistical tests (i.e., TLAs, Mantel test, and RDA based on linear time trends) using all real data. In addition, the RDA-PCNM revealed different patterns of temporal change, and the strength of each individual time pattern, in terms of adjusted variance explained, could be evaluated. It also identified species contributions to these patterns of temporal change. This additional information is not provided by distance-based methods. The simulation study revealed better Type I error properties of the canonical ordination techniques compared with the distance-based approaches when no deterministic component of change was imposed on the communities. The simulation also revealed that strong emphasis on uniform deterministic change and low variability at other temporal scales is needed to result in decreased statistical power of the RDA-PCNM approach relative to the other methods. Based on the statistical performance of and information content provided by RDA-PCNM models, this technique serves ecologists as a powerful tool for modeling temporal change of ecological (pseudo-)communities.
NASA Astrophysics Data System (ADS)
Toth-Tascau, Mirela; Balanean, Flavia; Krepelka, Mircea
2013-10-01
Musculoskeletal impairment of the upper limb can cause difficulties in performing basic daily activities. Three-dimensional motion analyses can provide valuable data for precisely determining arm movement and inter-joint coordination. The purpose of this study was to develop a method to evaluate the degree of impairment based on the influence of shoulder movements on the amplitude of elbow flexion and extension, under the assumption that a lack of motion of the elbow joint will be compensated by increased shoulder activity. In order to develop and validate a statistical model, one healthy young volunteer was involved in the study. The activity of choice simulated blowing the nose, starting from a slight flexion of the elbow, raising the hand until the middle finger touches the tip of the nose, and returning to the start position. Inter-joint coordination between the elbow and shoulder movements showed significant correlation. Statistical regression was used to fit an equation model describing the influence of shoulder movements on elbow mobility. The study provides a brief description of the kinematic analysis protocol and statistical models that may be useful in describing the relation between inter-joint movements of daily activities.
NASA Astrophysics Data System (ADS)
Prasanna, V.
2018-01-01
This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulations with respect to observations is discussed in detail. The non-linear statistical bias correction is a suitable bias correction method for climate change data because it is simple and does not add artificial uncertainties to the impact assessment of climate change scenarios for climate change application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile mapping method gives results similar to the CDF (Weibull)-based quantile mapping method, and the two methods are comparable. The bias correction is applied to temperature and precipitation for the present climate and for future projections, and the corrected data are used in a simple statistical model to understand future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for the Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project changes in agricultural yield over India under the RCP4.5 and RCP8.5 scenarios. The results revealed better convergence of model projections with the bias-corrected data than with the uncorrected data. The study can be extended to localized regional domains aimed at understanding changes in agricultural productivity in the future with an agro-economic or a simple statistical model. The statistical model indicated that total food grain yield will increase over the Indian region in the future: by approximately 50 kg/ha under the RCP4.5 scenario and by approximately 90 kg/ha under the RCP8.5 scenario between 2001 and the end of 2100. Many studies have used bias correction techniques, but this study applies bias correction to future climate scenario data from CMIP5 models and then applies the corrected data to crop statistics to estimate future crop yield changes over the Indian region.
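As a rough illustration of the percentile-based quantile mapping step described above, the following sketch bias-corrects a synthetic GCM series against synthetic observations; the data, variable names, and distribution parameters are invented for illustration, whereas the study itself works with CMIP5 output and observed Indian monsoon records.

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_future):
    """Percentile-based quantile mapping: place each future model value at its
    empirical percentile within the historical model run, then read off the
    observed distribution at that percentile."""
    ranks = np.searchsorted(np.sort(model_hist), model_future) / len(model_hist)
    ranks = np.clip(ranks, 0.0, 1.0)
    return np.quantile(obs_hist, ranks)

# Hypothetical JJAS-mean temperatures (degrees C)
rng = np.random.default_rng(0)
obs = rng.normal(28.0, 1.0, 1000)        # observations, historical period
gcm_hist = rng.normal(30.0, 1.5, 1000)   # GCM, historical period (warm bias)
gcm_rcp = rng.normal(32.0, 1.5, 1000)    # GCM, future scenario
print(quantile_map(gcm_hist, obs, gcm_rcp).mean())  # bias-corrected future mean
```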
Significance testing of rules in rule-based models of human problem solving
NASA Technical Reports Server (NTRS)
Lewis, C. M.; Hammer, J. M.
1986-01-01
Rule-based models of human problem solving have typically not been tested for statistical significance. Three methods of testing rules - analysis of variance, randomization, and contingency tables - are presented. Advantages and disadvantages of the methods are also described.
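Of the three methods listed, the contingency-table approach is the simplest to illustrate; the sketch below tests whether a hypothetical rule's predicted actions are associated with the operator's observed actions, using an invented 2x2 table of counts.

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = rule fired / did not fire,
# columns = operator took the predicted action / did not.
table = [[40, 10],
         [15, 35]]
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.4f}")  # small p suggests the rule is significant
```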
Interactive classification and content-based retrieval of tissue images
NASA Astrophysics Data System (ADS)
Aksoy, Selim; Marchisio, Giovanni B.; Tusk, Carsten; Koperski, Krzysztof
2002-11-01
We describe a system for interactive classification and retrieval of microscopic tissue images. Our system models tissues at the pixel, region, and image levels. Pixel-level features are generated using unsupervised clustering of color and texture values. Region-level features include shape information and statistics of pixel-level feature values. Image-level features include statistics and spatial relationships of regions. To reduce the gap between low-level features and high-level expert knowledge, we define the concept of prototype regions. The system learns the prototype regions in an image collection using model-based clustering and density estimation. Different tissue types are modeled using spatial relationships of these regions. Spatial relationships are represented by fuzzy membership functions. The system automatically selects significant relationships from training data and builds models which can also be updated using user relevance feedback. A Bayesian framework is used to classify tissues based on these models. Preliminary experiments show that the spatial relationship models we developed provide a flexible and powerful framework for classification and retrieval of tissue images.
Probabilistic inversion with graph cuts: Application to the Boise Hydrogeophysical Research Site
NASA Astrophysics Data System (ADS)
Pirot, Guillaume; Linde, Niklas; Mariethoz, Grégoire; Bradford, John H.
2017-02-01
Inversion methods that build on multiple-point statistics tools offer the possibility to obtain model realizations that are not only in agreement with field data, but also with conceptual geological models that are represented by training images. A recent inversion approach based on patch-based geostatistical resimulation using graph cuts outperforms state-of-the-art multiple-point statistics methods when applied to synthetic inversion examples featuring continuous and discontinuous property fields. Applications of multiple-point statistics tools to field data are challenging due to inevitable discrepancies between actual subsurface structure and the assumptions made in deriving the training image. We introduce several amendments to the original graph cut inversion algorithm and present a first-ever field application by addressing porosity estimation at the Boise Hydrogeophysical Research Site, Boise, Idaho. We consider both a classical multi-Gaussian and an outcrop-based prior model (training image) that are in agreement with available porosity data. When conditioning to available crosshole ground-penetrating radar data using Markov chain Monte Carlo, we find that the posterior realizations honor overall both the characteristics of the prior models and the geophysical data. The porosity field is inverted jointly with the measurement error and the petrophysical parameters that link dielectric permittivity to porosity. Even though the multi-Gaussian prior model leads to posterior realizations with higher likelihoods, the outcrop-based prior model shows better convergence. In addition, it offers geologically more realistic posterior realizations and it better preserves the full porosity range of the prior.
Marateb, Hamid Reza; Mansourian, Marjan; Adibi, Peyman; Farina, Dario
2014-01-01
Background: selecting the correct statistical test and data mining method depends strongly on the measurement scale of the data, the type of variables, and the purpose of the analysis. Different measurement scales are studied in detail, and statistical comparison, modeling, and data mining methods are examined using several medical examples. We present two clustering examples with ordinal variables, a more challenging variable type in analysis, using the Wisconsin Breast Cancer Data (WBCD). Ordinal-to-interval scale conversion example: a breast cancer database of nine 10-level ordinal variables for 683 patients was analyzed by two ordinal-scale clustering methods. The performance of the clustering methods was assessed by comparison with the gold standard groups of malignant and benign cases that had been identified by clinical tests. Results: the sensitivity and accuracy of the two clustering methods were 98% and 96%, respectively. Their specificity was comparable. Conclusion: using a clustering algorithm appropriate to the measurement scale of the variables in the study ensures high performance. Moreover, descriptive and inferential statistics, as well as the modeling approach, must be selected based on the scale of the variables. PMID:24672565
Statistical classification of drug incidents due to look-alike sound-alike mix-ups.
Wong, Zoie Shui Yee
2016-06-01
It has been recognised that medication names that look or sound similar are a cause of medication errors. This study builds statistical classifiers for identifying medication incidents due to look-alike sound-alike mix-ups. A total of 227 patient safety incident advisories related to medication were obtained from the Canadian Patient Safety Institute's Global Patient Safety Alerts system. Eight feature selection strategies based on frequent terms, frequent drug terms and constituent terms were performed. Statistical text classifiers based on logistic regression, support vector machines with linear, polynomial, radial-basis and sigmoid kernels, and decision trees were trained and tested. The models developed achieved an average accuracy above 0.8 across all the model settings. The receiver operating characteristic curves indicated that the classifiers performed reasonably well. The results obtained in this study suggest that statistical text classification can be a feasible method for identifying medication incidents due to look-alike sound-alike mix-ups based on a database of advisories from Global Patient Safety Alerts. © The Author(s) 2014.
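A minimal sketch of one of the classifier configurations described above (TF-IDF features with logistic regression), assuming invented incident narratives and labels; the study itself also trained SVMs with several kernels and decision trees on the Global Patient Safety Alerts advisories.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical incident narratives; label 1 = look-alike sound-alike mix-up
texts = ["hydroxyzine dispensed instead of hydralazine",
         "dose omitted during shift change",
         "celebrex confused with celexa at order entry",
         "iv line disconnected overnight"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
print(cross_val_score(clf, texts, labels, cv=2, scoring="accuracy").mean())
```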
Tonkin, Matthew J.; Tiedeman, Claire; Ely, D. Matthew; Hill, Mary C.
2007-01-01
The OPR-PPR program calculates the Observation-Prediction (OPR) and Parameter-Prediction (PPR) statistics that can be used to evaluate the relative importance of various kinds of data to simulated predictions. The data considered fall into three categories: (1) existing observations, (2) potential observations, and (3) potential information about parameters. The first two are addressed by the OPR statistic; the third is addressed by the PPR statistic. The statistics are based on linear theory and measure the leverage of the data, which depends on the location, the type, and possibly the time of the data being considered. For example, in a ground-water system the type of data might be a head measurement at a particular location and time. As a measure of leverage, the statistics do not take into account the value of the measurement. As linear measures, the OPR and PPR statistics require minimal computational effort once sensitivities have been calculated. Sensitivities need to be calculated for only one set of parameter values; commonly these are the values estimated through model calibration. OPR-PPR can calculate the OPR and PPR statistics for any mathematical model that produces the necessary OPR-PPR input files. In this report, OPR-PPR capabilities are presented in the context of using the ground-water model MODFLOW-2000 and the universal inverse program UCODE_2005. The method used to calculate the OPR and PPR statistics is based on the linear equation for prediction standard deviation. Using sensitivities and other information, OPR-PPR calculates (a) the percent increase in the prediction standard deviation that results when one or more existing observations are omitted from the calibration data set; (b) the percent decrease in the prediction standard deviation that results when one or more potential observations are added to the calibration data set; or (c) the percent decrease in the prediction standard deviation that results when potential information on one or more parameters is added.
Variability aware compact model characterization for statistical circuit design optimization
NASA Astrophysics Data System (ADS)
Qiao, Ying; Qian, Kun; Spanos, Costas J.
2012-03-01
Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose an efficient variability-aware compact model characterization methodology based on the linear propagation of variance. Hierarchical spatial variability patterns of selected compact model parameters are directly calculated from transistor array test structures. This methodology has been implemented and tested using transistor I-V measurements and the EKV-EPFL compact model. Calculation results compare well to full-wafer direct model parameter extractions. Further studies are done on the proper selection of both compact model parameters and electrical measurement metrics used in the method.
Applications of spatial statistical network models to stream data
Isaak, Daniel J.; Peterson, Erin E.; Ver Hoef, Jay M.; Wenger, Seth J.; Falke, Jeffrey A.; Torgersen, Christian E.; Sowder, Colin; Steel, E. Ashley; Fortin, Marie-Josée; Jordan, Chris E.; Ruesch, Aaron S.; Som, Nicholas; Monestiez, Pascal
2014-01-01
Streams and rivers host a significant portion of Earth's biodiversity and provide important ecosystem services for human populations. Accurate information regarding the status and trends of stream resources is vital for their effective conservation and management. Most statistical techniques applied to data measured on stream networks were developed for terrestrial applications and are not optimized for streams. A new class of spatial statistical model, based on valid covariance structures for stream networks, can be used with many common types of stream data (e.g., water quality attributes, habitat conditions, biological surveys) through application of appropriate distributions (e.g., Gaussian, binomial, Poisson). The spatial statistical network models account for spatial autocorrelation (i.e., nonindependence) among measurements, which allows their application to databases with clustered measurement locations. Large amounts of stream data exist in many areas where spatial statistical analyses could be used to develop novel insights, improve predictions at unsampled sites, and aid in the design of efficient monitoring strategies at relatively low cost. We review the topic of spatial autocorrelation and its effects on statistical inference, demonstrate the use of spatial statistics with stream datasets relevant to common research and management questions, and discuss additional applications and development potential for spatial statistics on stream networks. Free software for implementing the spatial statistical network models has been developed that enables custom applications with many stream databases.
Diaby, Vakaramoko; Adunlin, Georges; Montero, Alberto J
2014-02-01
Survival modeling techniques are increasingly being used as part of decision modeling for health economic evaluations. As many models are available, it is imperative for interested readers to know about the steps in selecting and using the most suitable ones. The objective of this paper is to propose a tutorial for the application of appropriate survival modeling techniques to estimate transition probabilities, for use in model-based economic evaluations, in the absence of individual patient data (IPD). An illustration of the use of the tutorial is provided based on the final progression-free survival (PFS) analysis of the BOLERO-2 trial in metastatic breast cancer (mBC). An algorithm was adopted from Guyot and colleagues, and was then run in the statistical package R to reconstruct IPD, based on the final PFS analysis of the BOLERO-2 trial. It should be emphasized that the reconstructed IPD represent an approximation of the original data. Afterwards, we fitted parametric models to the reconstructed IPD in the statistical package Stata. Both statistical and graphical tests were conducted to verify the relative and absolute validity of the findings. Finally, the equations for transition probabilities were derived using the general equation for transition probabilities used in model-based economic evaluations, and the parameters were estimated from fitted distributions. The results of the application of the tutorial suggest that the log-logistic model best fits the reconstructed data from the latest published Kaplan-Meier (KM) curves of the BOLERO-2 trial. Results from the regression analyses were confirmed graphically. An equation for transition probabilities was obtained for each arm of the BOLERO-2 trial. In this paper, a tutorial was proposed and used to estimate the transition probabilities for model-based economic evaluation, based on the results of the final PFS analysis of the BOLERO-2 trial in mBC. The results of our study can serve as a basis for any model (Markov) that needs the parameterization of transition probabilities, and only has summary KM plots available.
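The last step described above, turning a fitted parametric survival curve into cycle-specific transition probabilities, can be sketched as follows; the log-logistic parameterization, the parameter values, and the one-month cycle length are assumptions for illustration and are not the BOLERO-2 estimates.

```python
import numpy as np

def loglogistic_survival(t, lam, gamma):
    """Log-logistic survival function S(t) = 1 / (1 + (lam * t)**gamma)."""
    return 1.0 / (1.0 + (lam * t) ** gamma)

def transition_probability(t, cycle, lam, gamma):
    """Probability of progressing during (t, t + cycle], given progression-free
    at t: tp(t) = 1 - S(t + cycle) / S(t)."""
    return 1.0 - loglogistic_survival(t + cycle, lam, gamma) / loglogistic_survival(t, lam, gamma)

lam, gamma, cycle = 0.08, 1.5, 1.0   # hypothetical values, monthly cycle
for t in np.arange(0.0, 12.0, 3.0):
    print(t, round(transition_probability(t, cycle, lam, gamma), 4))
```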
Martin, Lisa; Watanabe, Sharon; Fainsinger, Robin; Lau, Francis; Ghosh, Sunita; Quan, Hue; Atkins, Marlis; Fassbender, Konrad; Downing, G Michael; Baracos, Vickie
2010-10-01
To determine whether elements of a standard nutritional screening assessment are independently prognostic of survival in patients with advanced cancer. A prospective nested cohort of patients with metastatic cancer was accrued from different units of a Regional Palliative Care Program. Patients completed a nutritional screen on admission. Data included age, sex, cancer site, height, weight history, dietary intake, 13 nutrition impact symptoms, and patient- and physician-reported performance status (PS). Univariate and multivariate survival analyses were conducted. Concordance statistics (c-statistics) were used to test the predictive accuracy of models based on training and validation sets; a c-statistic of 0.5 indicates the model predicts the outcome as well as chance, and perfect prediction has a c-statistic of 1.0. A training set of patients in palliative home care (n = 1,164) was used to identify prognostic variables. Primary disease site, PS, short-term weight change (either gain or loss), dietary intake, and dysphagia predicted survival in multivariate analysis (P < .05). A model including only disease site and PS had high c-statistics between predicted and observed survival in the training set (0.90) and the validation set (0.88; n = 603). The addition of weight change, dietary intake, and dysphagia did not further improve the c-statistic of the model. The c-statistic was also not altered by substituting physician-rated palliative PS for patient-reported PS. We demonstrate a high probability of concordance between predicted and observed survival for patients in distinct palliative care settings (home care, tertiary inpatient, ambulatory outpatient) based on patient-reported information.
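For readers less familiar with the c-statistic used above, the sketch below computes a simplified Harrell-type concordance index for right-censored survival data; the risk scores, follow-up times, and event indicators are invented, and tied times are ignored for brevity.

```python
import numpy as np

def c_statistic(risk_score, survival_time, event):
    """Simplified Harrell-type concordance: among usable pairs (the patient with
    the shorter observed time had an event), the fraction in which that patient
    also had the higher risk score."""
    concordant, usable = 0, 0
    n = len(risk_score)
    for i in range(n):
        for j in range(n):
            if survival_time[i] < survival_time[j] and event[i] == 1:
                usable += 1
                concordant += risk_score[i] > risk_score[j]
    return concordant / usable

print(c_statistic(np.array([0.9, 0.4, 0.7, 0.2]),    # hypothetical risk scores
                  np.array([3.0, 14.0, 6.0, 20.0]),  # follow-up (months)
                  np.array([1, 1, 1, 0])))           # 1 = death observed
```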
DOT National Transportation Integrated Search
1976-03-01
This introductory portion of a system science for transportation planning, which is based on the statistical physics of ensembles, lays the foundations for how statistical mechanics, equilibrium thermodynamics, and near-equilibrium thermodynamics can be u...
A Stochastic Fractional Dynamics Model of Rainfall Statistics
NASA Astrophysics Data System (ADS)
Kundu, Prasun; Travis, James
2013-04-01
Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is designed to faithfully reflect the scale dependence and is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. The main restriction is the assumption that the statistics of the precipitation field are spatially homogeneous and isotropic and stationary in time. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of the radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment. Some data sets containing periods of non-stationary behavior that involve occasional anomalously correlated rain events present a challenge for the model.
Model averaging techniques for quantifying conceptual model uncertainty.
Singh, Abhishek; Mishra, Srikanta; Ruskauff, Greg
2010-01-01
In recent years a growing understanding has emerged regarding the need to expand the modeling paradigm to include conceptual model uncertainty for groundwater models. Conceptual model uncertainty is typically addressed by formulating alternative model conceptualizations and assessing their relative likelihoods using statistical model averaging approaches. Several model averaging techniques and likelihood measures have been proposed in the recent literature for this purpose with two broad categories--Monte Carlo-based techniques such as Generalized Likelihood Uncertainty Estimation or GLUE (Beven and Binley 1992) and criterion-based techniques that use metrics such as the Bayesian and Kashyap Information Criteria (e.g., the Maximum Likelihood Bayesian Model Averaging or MLBMA approach proposed by Neuman 2003) and Akaike Information Criterion-based model averaging (AICMA) (Poeter and Anderson 2005). These different techniques can often lead to significantly different relative model weights and ranks because of differences in the underlying statistical assumptions about the nature of model uncertainty. This paper provides a comparative assessment of the four model averaging techniques (GLUE, MLBMA with KIC, MLBMA with BIC, and AIC-based model averaging) mentioned above for the purpose of quantifying the impacts of model uncertainty on groundwater model predictions. Pros and cons of each model averaging technique are examined from a practitioner's perspective using two groundwater modeling case studies. Recommendations are provided regarding the use of these techniques in groundwater modeling practice.
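For the criterion-based techniques mentioned above, posterior model weights are typically obtained from differences in the chosen information criterion; the sketch below applies the usual exp(-delta/2) conversion to invented criterion values for four alternative conceptual models (the same formula applies whether the criterion is AIC, BIC, or KIC, although the resulting weights differ).

```python
import numpy as np

def ic_weights(ic_values):
    """Convert information-criterion values into model weights via
    w_i proportional to exp(-0.5 * delta_i), delta_i = IC_i - min(IC)."""
    ic = np.asarray(ic_values, dtype=float)
    delta = ic - ic.min()
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical criterion values for four alternative conceptual models
print(ic_weights([210.4, 212.1, 215.8, 209.9]).round(3))
```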
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data simulated to fit the Rasch model for 25-item dichotomous scales, with sample sizes ranging from N = 50 to N = 2500, were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
Bladder cancer mapping in Libya based on standardized morbidity ratio and log-normal model
NASA Astrophysics Data System (ADS)
Alhdiri, Maryam Ahmed; Samat, Nor Azah; Mohamed, Zulkifley
2017-05-01
Disease mapping comprises a set of statistical techniques that produce maps of rates based on estimated mortality, morbidity, and prevalence. A traditional approach to measuring the relative risk of a disease is the Standardized Morbidity Ratio (SMR), the ratio of the observed to the expected number of cases in an area; it has the greatest uncertainty when the disease is rare or the geographical area is small. Therefore, Bayesian models, or statistical smoothing based on the log-normal model, are introduced to address this problem with the SMR. This study estimates the relative risk for bladder cancer incidence in Libya from 2006 to 2007 based on the SMR and the log-normal model, which were fitted to the data using WinBUGS software. The study starts with a brief review of these models, beginning with the SMR method and followed by the log-normal model, which is then applied to bladder cancer incidence in Libya. All results are compared using maps and tables. The study concludes that the log-normal model gives better relative risk estimates than the classical method; the log-normal model can overcome the SMR problem when there are no observed bladder cancer cases in an area.
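A toy numerical illustration of the SMR defined above; the observed and expected counts are invented, and the district with zero observed cases shows the instability that motivates the log-normal smoothing fitted with WinBUGS in the study.

```python
import numpy as np

# Hypothetical observed and indirectly standardized expected counts per district
observed = np.array([4, 0, 12, 7])
expected = np.array([3.2, 1.1, 9.5, 6.8])

smr = observed / expected
print(smr.round(2))  # the zero-count district gets SMR = 0 regardless of its true risk,
                     # which is the problem the log-normal model is meant to overcome
```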
A statistical framework for protein quantitation in bottom-up MS-based proteomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karpievitch, Yuliya; Stanley, Jeffrey R.; Taverner, Thomas
2009-08-15
Motivation: Quantitative mass spectrometry-based proteomics requires protein-level estimates and confidence measures. Challenges include the presence of low-quality or incorrectly identified peptides and widespread, informative, missing data. Furthermore, models are required for rolling peptide-level information up to the protein level. Results: We present a statistical model for protein abundance in terms of peptide peak intensities, applicable to both label-based and label-free quantitation experiments. The model allows for both random and censoring missingness mechanisms and provides naturally for protein-level estimates and confidence measures. The model is also used to derive automated filtering and imputation routines. Three LC-MS datasets are used to illustrate the methods. Availability: The software has been made available in the open-source proteomics platform DAnTE (Polpitiya et al. (2008)) (http://omics.pnl.gov/software/). Contact: adabney@stat.tamu.edu
NASA Astrophysics Data System (ADS)
Barbaro, Alethea
2015-03-01
Agent-based models have been widely applied in theoretical ecology to explain migrations and other collective animal movements [2,5,8]. As D'Orsogna and Perc have expertly highlighted in [6], the recent emergence of crime modeling has opened another interesting avenue for mathematical investigation. The area of crime modeling is particularly suited to agent-based models, because these models offer a great deal of flexibility within the model and also ease of communication among criminologists, law enforcement, and modelers.
Testing homogeneity in Weibull-regression models.
Bolfarine, Heleno; Valença, Dione M
2005-10-01
In survival studies with families or geographical units it may be of interest to test whether such groups are homogeneous for given explanatory variables. In this paper we consider score-type tests for group homogeneity based on a mixing model in which the group effect is modelled as a random variable. As opposed to hazard-based frailty models, this model presents survival times that, conditioned on the random effect, have an accelerated failure time representation. The test statistic requires only estimation of the conventional regression model without the random effect and does not require specifying the distribution of the random effect. The tests are derived for a Weibull regression model and, in the uncensored situation, a closed form is obtained for the test statistic. A simulation study is used for comparing the power of the tests. The proposed tests are applied to real data sets with censored data.
Nevers, Meredith; Byappanahalli, Muruleedhara; Phanikumar, Mantha S.; Whitman, Richard L.
2016-01-01
Mathematical models have been widely applied to surface waters to estimate rates of settling, resuspension, flow, dispersion, and advection in order to calculate movement of particles that influence water quality. Of particular interest are the movement, survival, and persistence of microbial pathogens or their surrogates, which may contaminate recreational water, drinking water, or shellfish. Most models devoted to microbial water quality have been focused on fecal indicator organisms (FIO), which act as a surrogate for pathogens and viruses. Process-based modeling and statistical modeling have been used to track contamination events to source and to predict future events. The use of these two types of models requires different levels of expertise and input; process-based models rely on theoretical physical constructs to explain present conditions and biological distribution, while data-based, statistical models use extant paired data to do the same. The selection of the appropriate model and interpretation of results is critical to proper use of these tools in microbial source tracking. Integration of the modeling approaches could provide insight for tracking and predicting contamination events in real time. A review of modeling efforts reveals that process-based modeling has great promise for microbial source tracking efforts; further, combining the understanding of physical processes influencing FIO contamination developed with process-based models and molecular characterization of the population by gene-based (i.e., biological) or chemical markers may be an effective approach for locating sources and remediating contamination in order to protect human health better.
Adapt-Mix: learning local genetic correlation structure improves summary statistics-based analyses
Park, Danny S.; Brown, Brielin; Eng, Celeste; Huntsman, Scott; Hu, Donglei; Torgerson, Dara G.; Burchard, Esteban G.; Zaitlen, Noah
2015-01-01
Motivation: Approaches to identifying new risk loci, training risk prediction models, imputing untyped variants and fine-mapping causal variants from summary statistics of genome-wide association studies are playing an increasingly important role in the human genetics community. Current summary statistics-based methods rely on global ‘best guess’ reference panels to model the genetic correlation structure of the dataset being studied. This approach, especially in admixed populations, has the potential to produce misleading results, ignores variation in local structure and is not feasible when appropriate reference panels are missing or small. Here, we develop a method, Adapt-Mix, that combines information across all available reference panels to produce estimates of local genetic correlation structure for summary statistics-based methods in arbitrary populations. Results: We applied Adapt-Mix to estimate the genetic correlation structure of both admixed and non-admixed individuals using simulated and real data. We evaluated our method by measuring the performance of two summary statistics-based methods: imputation and joint-testing. When using our method as opposed to the current standard of ‘best guess’ reference panels, we observed a 28% decrease in mean-squared error for imputation and a 73.7% decrease in mean-squared error for joint-testing. Availability and implementation: Our method is publicly available in a software package called ADAPT-Mix available at https://github.com/dpark27/adapt_mix. Contact: noah.zaitlen@ucsf.edu PMID:26072481
Rainfall runoff modelling of the Upper Ganga and Brahmaputra basins using PERSiST.
Futter, M N; Whitehead, P G; Sarkar, S; Rodda, H; Crossman, J
2015-06-01
There are ongoing discussions about the appropriate level of complexity and sources of uncertainty in rainfall runoff models. Simulations for operational hydrology, flood forecasting or nutrient transport all warrant different levels of complexity in the modelling approach. More complex model structures are appropriate for simulations of land-cover dependent nutrient transport, while more parsimonious model structures may be adequate for runoff simulation. The appropriate level of complexity is also dependent on data availability. Here, we use PERSiST, a simple, semi-distributed dynamic rainfall-runoff modelling toolkit, to simulate flows in the Upper Ganges and Brahmaputra rivers. We present two sets of simulations driven by single time series of daily precipitation and temperature using simple (A) and complex (B) model structures based on uniform and hydrochemically relevant land covers, respectively. Models were compared based on ensembles of Bayesian Information Criterion (BIC) statistics. Equifinality was observed for parameters but not for model structures. Model performance was better for the more complex (B) structural representations than for the parsimonious model structures. The results show that structural uncertainty is more important than parameter uncertainty. The ensembles of BIC statistics suggested that neither structural representation was preferable in a statistical sense. Simulations presented here confirm that relatively simple models with limited data requirements can be used to credibly simulate flows and water balance components needed for nutrient flux modelling in large, data-poor basins.
NASA Astrophysics Data System (ADS)
Ren, W. X.; Lin, Y. Q.; Fang, S. E.
2011-11-01
One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.
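A minimal sketch of a Mahalanobis-distance damage index of the kind described above, assuming a baseline population of feature vectors from the undamaged state; the synthetic features stand in for the normalized system-matrix features obtained from stochastic subspace identification.

```python
import numpy as np

def mahalanobis_index(baseline_features, new_feature):
    """Statistics-based damage index: Mahalanobis distance of a new feature
    vector from the baseline (undamaged) feature population."""
    mu = baseline_features.mean(axis=0)
    cov = np.cov(baseline_features, rowvar=False)
    diff = new_feature - mu
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
baseline = rng.normal(size=(200, 5))                     # undamaged-state features
healthy = rng.normal(size=5)
damaged = healthy + np.array([0.0, 3.0, 0.0, 0.0, 0.0])  # shifted feature mimicking damage
print(mahalanobis_index(baseline, healthy), mahalanobis_index(baseline, damaged))
```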
NASA Astrophysics Data System (ADS)
Zaichik, Leonid I.; Alipchenkov, Vladimir M.
2009-10-01
The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids 15 1776-87; 2007 Phys. Fluids 19 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.
NASA Technical Reports Server (NTRS)
Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.
1992-01-01
The current controversy regarding regularity versus clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of these nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis of cumulus cloud field spatial distributions is carried out based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.
Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods
2010-06-01
Gontscharuk, Veronika; Landwehr, Sandra; Finner, Helmut
2015-01-01
The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov-Smirnov statistic, has a long history, dating back to the mid seventies. Originally, HC statistics were used in connection with goodness of fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high dimensional data. The continuing interest for HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov-Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence it is favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency in the asymptotic and finite behavior of the HC statistic prompts us to provide a new HC test that has better finite properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
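For reference, one common form of the HC statistic can be computed as in the sketch below; the p-values are simulated, and both the search range alpha0 and the exact normalization vary across the literature.

```python
import numpy as np

def higher_criticism(pvals, alpha0=0.5):
    """One common form of the HC statistic:
    HC = max over i <= alpha0*n of sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return hc[:max(1, int(alpha0 * n))].max()

rng = np.random.default_rng(2)
null_p = rng.uniform(size=1000)                                # global null: uniform p-values
sparse_p = np.concatenate([rng.uniform(size=990),
                           rng.uniform(0.0, 1e-4, size=10)])   # sparse signal
print(higher_criticism(null_p), higher_criticism(sparse_p))
```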
A Statistical Analysis of Brain Morphology Using Wild Bootstrapping
Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.
2008-01-01
Methods for the analysis of brain morphology, including voxel-based morphometry and surface-based morphometry, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
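A minimal sketch of the wild bootstrap idea described above, applied to testing a single regression slope under heteroscedastic errors using Rademacher multipliers on the null residuals; the data are simulated, and the paper's actual procedure is applied voxel-wise to morphometric measures with family-wise error control.

```python
import numpy as np

def wild_bootstrap_pvalue(x, y, n_boot=2000, seed=3):
    """Wild-bootstrap p-value for the slope in y = a + b*x + e, allowing
    heteroscedastic errors (Rademacher multipliers on residuals under b = 0)."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones_like(x), x])
    b_hat = np.linalg.lstsq(X, y, rcond=None)[0][1]   # observed slope
    y_null = np.full_like(y, y.mean())                # fitted values under H0: b = 0
    resid = y - y_null
    exceed = 0
    for _ in range(n_boot):
        y_star = y_null + rng.choice([-1.0, 1.0], size=y.size) * resid
        exceed += abs(np.linalg.lstsq(X, y_star, rcond=None)[0][1]) >= abs(b_hat)
    return exceed / n_boot

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 0.4 * x + rng.normal(size=100) * (1.0 + np.abs(x))  # heteroscedastic noise
print(wild_bootstrap_pvalue(x, y))
```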
Optimization of Analytical Potentials for Coarse-Grained Biopolymer Models.
Mereghetti, Paolo; Maccari, Giuseppe; Spampinato, Giulia Lia Beatrice; Tozzini, Valentina
2016-08-25
The increasing trend in the recent literature on coarse-grained (CG) models testifies to their impact in the study of complex systems. However, the CG model landscape is variegated: even considering a given resolution level, the force fields are very heterogeneous and optimized with very different parametrization procedures. Along the road to standardization of CG models for biopolymers, here we describe a strategy to aid the building and optimization of statistics-based analytical force fields and its implementation in the software package AsParaGS (Assisted Parameterization platform for coarse Grained modelS). Our method is based on the use and optimization of analytical potentials, optimized by targeting the statistical distributions of internal variables by means of a combination of different algorithms (i.e., relative entropy driven stochastic exploration of the parameter space and iterative Boltzmann inversion). This allows designing a custom model that endows the force field terms with a physically sound meaning. Furthermore, the level of transferability and accuracy can be tuned through the choice of the statistical data set composition. The method, illustrated by means of applications to helical polypeptides, also involves the analysis of two- and three-variable distributions, and allows handling issues related to the FF term correlations. AsParaGS is interfaced with general-purpose molecular dynamics codes and currently implements the "minimalist" subclass of CG models (i.e., one bead per amino acid, Cα based). Extensions to nucleic acids and different levels of coarse graining are in progress.
A Statistical-Physics Approach to Language Acquisition and Language Change
NASA Astrophysics Data System (ADS)
Cassandro, Marzio; Collet, Pierre; Galves, Antonio; Galves, Charlotte
1999-02-01
The aim of this paper is to explain why Statistical Physics can help in understanding two related linguistic questions. The first question is how to model first language acquisition by a child. The second question is how language change proceeds in time. Our approach is based on a Gibbsian model for the interface between syntax and prosody. We also present a simulated annealing model of language acquisition, which extends the Triggering Learning Algorithm recently introduced in the linguistic literature.
Estimating liver cancer deaths in Thailand based on verbal autopsy study.
Waeto, Salwa; Pipatjaturon, Nattakit; Tongkumchum, Phattrawan; Choonpradub, Chamnein; Saelim, Rattikan; Makaje, Nifatamah
2014-01-01
Liver cancer mortality is high in Thailand, but the utility of the related vital statistics is limited because national vital registration (VR) data under-report specific causes of death. Accurate methodologies and reliable supplementary data are needed to provide sound national vital statistics. This study aimed to model liver cancer deaths based on a verbal autopsy (VA) study conducted in 2005, in order to provide more accurate estimates of liver cancer deaths than those reported. The results were used to estimate the number of liver cancer deaths during 2000-2009. The VA study carried out in 2005 was based on a sample of 9,644 deaths from nine provinces and provided reliable information on causes of death by gender, age group, location of death in or outside hospital, and the causes of death recorded in the VR database. Logistic regression was used to model liver cancer deaths in terms of these variables. The estimated probabilities from the model were applied to liver cancer deaths in the VR database, 2000-2009, yielding more accurate VA-estimated numbers of liver cancer deaths. The model fits the data quite well, with sensitivity 0.64. The confidence intervals from the statistical model provide the estimates and their precisions. The VA-estimated numbers of liver cancer deaths were higher than those in the corresponding VR database, with inflation factors of 1.56 for males and 1.64 for females. The statistical methods used in this study can be applied to available mortality data in developing countries where the national vital registration data are of low quality and reliable supplementary data are available.
Statistical Models of Fracture Relevant to Nuclear-Grade Graphite: Review and Recommendations
NASA Technical Reports Server (NTRS)
Nemeth, Noel N.; Bratton, Robert L.
2011-01-01
The nuclear-grade (low-impurity) graphite needed for the fuel element and moderator material for next-generation (Gen IV) reactors displays large scatter in strength and a nonlinear stress-strain response from damage accumulation. This response can be characterized as quasi-brittle. In this expanded review, relevant statistical failure models for various brittle and quasi-brittle material systems are discussed with regard to strength distribution, size effect, multiaxial strength, and damage accumulation. This includes descriptions of the Weibull, Batdorf, and Burchell models as well as models that describe the strength response of composite materials, which involves distributed damage. Results from lattice simulations are included for a physics-based description of material breakdown. Consideration is given to the predicted transition between brittle and quasi-brittle damage behavior versus the density of damage (level of disorder) within the material system. The literature indicates that weakest-link-based failure modeling approaches appear to be reasonably robust in that they can be applied to materials that display distributed damage, provided that the level of disorder in the material is not too large. The Weibull distribution is argued to be the most appropriate statistical distribution to model the stochastic-strength response of graphite.
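As a reference point for the Weibull model recommended above, the sketch below evaluates the standard two-parameter weakest-link failure probability with a volume-based size effect; the characteristic strength, Weibull modulus, and volumes are hypothetical and are not graphite data.

```python
import numpy as np

def weibull_failure_probability(stress, sigma0, m, volume, volume0=1.0):
    """Two-parameter Weibull weakest-link model with size effect:
    Pf = 1 - exp(-(V / V0) * (stress / sigma0)**m), m = Weibull modulus."""
    return 1.0 - np.exp(-(volume / volume0) * (stress / sigma0) ** m)

# Hypothetical parameters: characteristic strength 25 MPa, modulus 10, V = 2*V0
for stress in (15.0, 20.0, 25.0, 30.0):
    print(stress, round(weibull_failure_probability(stress, 25.0, 10.0, volume=2.0), 3))
```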
LaBudde, Robert A; Harnly, James M
2012-01-01
A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. The report describes the development and validation of studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those of the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single collaborator and multicollaborative study examples are given.
NASA Astrophysics Data System (ADS)
Moslehi, M.; de Barros, F.; Rajagopal, R.
2014-12-01
Hydrogeological models that represent flow and transport in subsurface domains are usually large scale, with excessive computational complexity and uncertain characteristics. Uncertainty quantification for predicting flow and transport in heterogeneous formations often entails utilizing a numerical Monte Carlo framework, which repeatedly simulates the model according to a random field representing hydrogeological characteristics of the field. The physical resolution (e.g. grid resolution associated with the physical space) for the simulation is customarily chosen based on recommendations in the literature, independent of the number of Monte Carlo realizations. This practice may lead to either excessive computational burden or inaccurate solutions. We propose an optimization-based methodology that considers the trade-off between the following conflicting objectives: time associated with computational costs, statistical convergence of the model predictions, and physical errors corresponding to numerical grid resolution. In this research, we optimally allocate computational resources by developing a modeling framework for the overall error based on a joint statistical and numerical analysis and optimizing the error model subject to a given computational constraint. The derived expression for the overall error explicitly takes into account the joint dependence between the discretization error of the physical space and the statistical error associated with Monte Carlo realizations. The accuracy of the proposed framework is verified in this study by applying it to several computationally extensive examples. Having this framework at hand helps hydrogeologists achieve the optimum physical and statistical resolutions that minimize the error for a given computational budget. Moreover, the influence of the available computational resources and the geometric properties of the contaminant source zone on the optimum resolutions is investigated. We conclude that the computational cost associated with optimal allocation can be substantially reduced compared with prevalent recommendations in the literature.
Statistical prediction with Kanerva's sparse distributed memory
NASA Technical Reports Server (NTRS)
Rogers, David
1989-01-01
A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Confidence Intervals from Realizations of Simulated Nuclear Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Younes, W.; Ratkiewicz, A.; Ressler, J. J.
2017-09-28
Various statistical techniques are discussed that can be used to assign a level of confidence in the prediction of models that depend on input data with known uncertainties and correlations. The particular techniques reviewed in this paper are: 1) random realizations of the input data using Monte-Carlo methods, 2) the construction of confidence intervals to assess the reliability of model predictions, and 3) resampling techniques to impose statistical constraints on the input data based on additional information. These techniques are illustrated with a calculation of the keff value, based on the 235U(n, f) and 239Pu (n, f) cross sections.
Hao, Chen; LiJun, Chen; Albright, Thomas P.
2007-01-01
Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. © Science in China Press 2007.
Equilibrium statistical-thermal models in high-energy physics
NASA Astrophysics Data System (ADS)
Tawfik, Abdel Nasser
2014-05-01
We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We discovered that Heinz Koppe formulated, in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it the simplifying assumptions about the matrix element of the interaction process that likely reflects many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analyzed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns out quite often that many-particle systems can be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. The strange particles might be an exception, as they are suppressed at lower beam energies; however, their relative yields fulfill statistical equilibrium as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher order moments of multiplicity have been discussed; they offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials and the compressible bag model.
Onisko, Agnieszka; Druzdzel, Marek J; Austin, R Marshall
2016-01-01
Classical statistics is a well-established approach in the analysis of medical data. While the medical community seems to be familiar with the concept of a statistical analysis and its interpretation, the Bayesian approach, argued by many of its proponents to be superior to the classical frequentist approach, is still not well recognized in the analysis of medical data. The goal of this study is to encourage data analysts to use the Bayesian approach, such as modeling with graphical probabilistic networks, as an insightful alternative to classical statistical analysis of medical data. This paper offers a comparison of two approaches to the analysis of medical time series data: (1) the classical statistical approach, represented by the Kaplan-Meier estimator and the Cox proportional hazards regression model, and (2) dynamic Bayesian network modeling. Our comparison is based on time series cervical cancer screening data collected at Magee-Womens Hospital, University of Pittsburgh Medical Center, over 10 years. The main outcomes of our comparison are the cervical cancer risk assessments produced by the three models. Our analysis also discusses several other aspects of the comparison, such as modeling assumptions, model building, dealing with incomplete data, individualized risk assessment, interpretation of results, and model validation. Our study shows that the Bayesian approach (1) is much more flexible in terms of modeling effort, and (2) offers an individualized risk assessment, which is more cumbersome for classical statistical approaches.
The theory and programming of statistical tests for evaluating the Real-Time Air-Quality Model (RAM) using the Regional Air Pollution Study (RAPS) data base are fully documented in four volumes. Moreover, the tests are generally applicable to other model evaluation problems. Volu...
Ozaki, Vitor A.; Ghosh, Sujit K.; Goodwin, Barry K.; Shirota, Ricardo
2009-01-01
This article presents a statistical model of agricultural yield data based on a set of hierarchical Bayesian models that allows joint modeling of temporal and spatial autocorrelation. This method captures a comprehensive range of the various uncertainties involved in predicting crop insurance premium rates as opposed to the more traditional ad hoc, two-stage methods that are typically based on independent estimation and prediction. A panel data set of county-average yield data was analyzed for 290 counties in the State of Paraná (Brazil) for the period of 1990 through 2002. Posterior predictive criteria are used to evaluate different model specifications. This article provides substantial improvements in the statistical and actuarial methods often applied to the calculation of insurance premium rates. These improvements are especially relevant to situations where data are limited. PMID:19890450
Statistically Modeling I-V Characteristics of CNT-FET with LASSO
NASA Astrophysics Data System (ADS)
Ma, Dongsheng; Ye, Zuochang; Wang, Yan
2017-08-01
With the advent of the internet of things (IoT), the need to study new materials and devices for various applications is increasing. Traditionally, compact models for transistors are built on the basis of physics, but physical models are expensive to develop and need a long time to adjust for non-ideal effects. When the envisioned application of a novel device is uncertain or the manufacturing process is immature, deriving generalized, accurate physical models is very strenuous, whereas statistical modeling is becoming a potential alternative because of its data-oriented nature and fast implementation. In this paper, a classical statistical regression method, LASSO, is used to model the I-V characteristics of a CNT-FET, and a pseudo-PMOS inverter simulation based on the trained model is implemented in Cadence. The normalized relative mean square prediction error of the trained model against experimental sample data and the simulation results show that the model is acceptable for static digital circuit simulation, and the modeling methodology extends to devices in general.
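The following sketch shows the general idea under stated assumptions: a LASSO regression fitted to polynomial features of the terminal voltages to approximate I-V curves. The toy current generator, the polynomial degree, and the regularization strength are assumptions, not the paper's setup.

```python
# Hedged sketch: LASSO on polynomial features of (Vgs, Vds) as a
# data-driven I-V model; the "measured" current is synthetic.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
vgs = rng.uniform(0.0, 1.2, 400)   # gate-source voltage sweep (V)
vds = rng.uniform(0.0, 1.2, 400)   # drain-source voltage sweep (V)
# Toy stand-in for a measured drain current (A); real work would use lab data
ids = 1e-6 * np.maximum(vgs - 0.4, 0) ** 2 * np.tanh(3 * vds)
ids += rng.normal(scale=1e-9, size=ids.shape)

X = np.column_stack([vgs, vds])
model = make_pipeline(PolynomialFeatures(degree=5),
                      StandardScaler(),
                      Lasso(alpha=1e-4, max_iter=50_000))
model.fit(X, ids / 1e-6)           # fit in microamps for conditioning
pred = model.predict(X) * 1e-6
nrmse = np.sqrt(np.mean((pred - ids) ** 2)) / np.ptp(ids)
print(f"normalized RMS prediction error: {nrmse:.3%}")
```

LASSO's sparsity prunes most of the polynomial terms, which is why it is a natural fit when the functional form of the device is unknown.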
SIG-VISA: Signal-based Vertically Integrated Seismic Monitoring
NASA Astrophysics Data System (ADS)
Moore, D.; Mayeda, K. M.; Myers, S. C.; Russell, S.
2013-12-01
Traditional seismic monitoring systems rely on discrete detections produced by station processing software; however, while such detections may constitute a useful summary of station activity, they discard large amounts of information present in the original recorded signal. We present SIG-VISA (Signal-based Vertically Integrated Seismic Analysis), a system for seismic monitoring through Bayesian inference on seismic signals. By directly modeling the recorded signal, our approach incorporates additional information unavailable to detection-based methods, enabling higher sensitivity and more accurate localization using techniques such as waveform matching. SIG-VISA's Bayesian forward model of seismic signal envelopes includes physically derived models of travel times and source characteristics as well as Gaussian process (kriging) statistical models of signal properties that combine interpolation of historical data with extrapolation of learned physical trends. Applying Bayesian inference, we evaluate the model on earthquakes as well as the 2009 DPRK test event, demonstrating a waveform matching effect as part of the probabilistic inference, along with results on event localization and sensitivity. In particular, we demonstrate increased sensitivity from signal-based modeling, in which the SIG-VISA signal model finds statistical evidence for arrivals even at stations for which the IMS station processing failed to register any detection.
NASA Astrophysics Data System (ADS)
El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.
2006-11-01
Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.
10 CFR 431.17 - Determination of efficiency.
Code of Federal Regulations, 2014 CFR
2014-01-01
... characteristics of that basic model, and (ii) Based on engineering or statistical analysis, computer simulation or... simulation or modeling, and other analytic evaluation of performance data on which the AEDM is based... applied. (iii) If requested by the Department, the manufacturer shall conduct simulations to predict the...
10 CFR 431.17 - Determination of efficiency.
Code of Federal Regulations, 2012 CFR
2012-01-01
... characteristics of that basic model, and (ii) Based on engineering or statistical analysis, computer simulation or... simulation or modeling, and other analytic evaluation of performance data on which the AEDM is based... applied. (iii) If requested by the Department, the manufacturer shall conduct simulations to predict the...
Nonlinear Wave Chaos and the Random Coupling Model
NASA Astrophysics Data System (ADS)
Zhou, Min; Ott, Edward; Antonsen, Thomas M.; Anlage, Steven
The Random Coupling Model (RCM) has been shown to successfully predict the statistical properties of linear wave chaotic cavities in the highly over-moded regime. It is of interest to extend the RCM to strongly nonlinear systems. To introduce nonlinearity, an active nonlinear circuit is connected to two ports of the wave chaotic 1/4-bowtie cavity. The active nonlinear circuit consists of a frequency multiplier, an amplifier and several passive filters. It acts to double the input frequency in the range from 3.5 GHz to 5 GHz, and operates for microwaves traveling in only one direction. Measurements are taken between two additional ports of the cavity, and we measure the statistics of the second-harmonic voltage over an ensemble of realizations of the scattering system. We developed an RCM-based model of this system as two chaotic cavities coupled by means of a nonlinear transfer function. The harmonics received at the output are predicted to be the product of three statistical quantities describing the three elements respectively. Statistical results from simulation, RCM-based modeling, and direct experimental measurements will be compared. Supported by ONR under Grant No. N000141512134, AFOSR under COE Grant FA9550-15-1-0171, and the Maryland Center for Nanophysics and Advanced Materials.
Statistical use of argonaute expression and RISC assembly in microRNA target identification.
Stanhope, Stephen A; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A
2009-09-01
MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expressions alone, and lead to verifications of computational target pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target pair relationships, especially in conjunction with high-throughput expression studies.
Cortical Surround Interactions and Perceptual Salience via Natural Scene Statistics
Coen-Cagli, Ruben; Dayan, Peter; Schwartz, Odelia
2012-01-01
Spatial context in images induces perceptual phenomena associated with salience and modulates the responses of neurons in primary visual cortex (V1). However, the computational and ecological principles underlying contextual effects are incompletely understood. We introduce a model of natural images that includes grouping and segmentation of neighboring features based on their joint statistics, and we interpret the firing rates of V1 neurons as performing optimal recognition in this model. We show that this leads to a substantial generalization of divisive normalization, a computation that is ubiquitous in many neural areas and systems. A main novelty in our model is that the influence of the context on a target stimulus is determined by their degree of statistical dependence. We optimized the parameters of the model on natural image patches, and then simulated neural and perceptual responses on stimuli used in classical experiments. The model reproduces some rich and complex response patterns observed in V1, such as the contrast dependence, orientation tuning and spatial asymmetry of surround suppression, while also allowing for surround facilitation under conditions of weak stimulation. It also mimics the perceptual salience produced by simple displays, and leads to readily testable predictions. Our results provide a principled account of orientation-based contextual modulation in early vision and its sensitivity to the homogeneity and spatial arrangement of inputs, and lends statistical support to the theory that V1 computes visual salience. PMID:22396635
Logical reasoning versus information processing in the dual-strategy model of reasoning.
Markovits, Henry; Brisson, Janie; de Chantal, Pier-Luc
2017-01-01
One of the major debates concerning the nature of inferential reasoning is between counterexample-based strategies such as mental model theory and statistical strategies underlying probabilistic models. The dual-strategy model, proposed by Verschueren, Schaeken, and d'Ydewalle (2005a, 2005b), which suggests that people might have access to both kinds of strategy, has been supported by several recent studies. These have shown that statistical reasoners make inferences based on using information about premises in order to generate a likelihood estimate of conclusion probability. However, while results concerning counterexample reasoners are consistent with a counterexample detection model, these results could equally be interpreted as indicating a greater sensitivity to logical form. In order to distinguish these two interpretations, in Studies 1 and 2 we presented reasoners with modus ponens (MP) inferences with statistical information about premise strength, and in Studies 3 and 4, naturalistic MP inferences with premises having many disabling conditions. Statistical reasoners accepted the MP inference more often than counterexample reasoners in Studies 1 and 2, while the opposite pattern was observed in Studies 3 and 4. The results show that these strategies must be defined in terms of information processing, with no clear relation to "logical" reasoning. These results have additional implications for the underlying debate about the nature of human reasoning. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Chen, Greg; Weikart, Lynne A.
2008-01-01
This study develops and tests a school disorder and student achievement model based upon the school climate framework. The model was fitted to 212 New York City middle schools using structural equation modeling. The analysis shows that the model fits the data well based upon test statistics and goodness-of-fit indices. The…
Statistical modeling of 4D respiratory lung motion using diffeomorphic image registration.
Ehrhardt, Jan; Werner, René; Schmidt-Richberg, Alexander; Handels, Heinz
2011-02-01
Modeling of respiratory motion has become increasingly important in various applications of medical imaging (e.g., radiation therapy of lung cancer). Current modeling approaches are usually confined to intra-patient registration of 3D image data representing the individual patient's anatomy at different breathing phases. We propose an approach to generate a mean motion model of the lung based on thoracic 4D computed tomography (CT) data of different patients to extend the motion modeling capabilities. Our modeling process consists of three steps: an intra-subject registration to generate subject-specific motion models, the generation of an average shape and intensity atlas of the lung as anatomical reference frame, and the registration of the subject-specific motion models to the atlas in order to build a statistical 4D mean motion model (4D-MMM). Furthermore, we present methods to adapt the 4D mean motion model to a patient-specific lung geometry. In all steps, a symmetric diffeomorphic nonlinear intensity-based registration method was employed. The Log-Euclidean framework was used to compute statistics on the diffeomorphic transformations. The presented methods are then used to build a mean motion model of respiratory lung motion using thoracic 4D CT data sets of 17 patients. We evaluate the model by applying it to estimate the respiratory motion of ten lung cancer patients. The prediction is evaluated with respect to landmark and tumor motion, and the quantitative analysis results in a mean target registration error (TRE) of 3.3 ± 1.6 mm if lung dynamics are not impaired by large lung tumors or other lung disorders (e.g., emphysema). With regard to lung tumor motion, we show that prediction accuracy is independent of tumor size and tumor motion amplitude in the considered data set. However, tumors adhering to non-lung structures degrade local lung dynamics significantly, and the model-based prediction accuracy is lower in these cases. The statistical respiratory motion model is capable of providing valuable prior knowledge in many fields of application. We present two examples of possible applications in radiation therapy and image-guided diagnosis.
Does rational selection of training and test sets improve the outcome of QSAR modeling?
Martin, Todd M; Harten, Paul; Young, Douglas M; Muratov, Eugene N; Golbraikh, Alexander; Zhu, Hao; Tropsha, Alexander
2012-10-22
Prior to using a quantitative structure activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform its statistical external validation. In statistical external validation, the overall data set is divided into training and test sets. Commonly, this splitting is performed using random division. Rational splitting methods can divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models compared to random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set) using random division. The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set) using rational division methods and by using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
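A minimal sketch of one of the rational division methods named above, the Kennard-Stone algorithm, is given below; it is a generic maximin implementation on a toy descriptor matrix, not the authors' code.

```python
# Kennard-Stone rational split: greedily pick training points that
# span descriptor space (maximin selection).
import numpy as np

def kennard_stone(X: np.ndarray, n_train: int) -> np.ndarray:
    """Return indices of a training set chosen to span descriptor space."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Start from the two most distant points
    selected = list(np.unravel_index(np.argmax(d), d.shape))
    remaining = [i for i in range(len(X)) if i not in selected]
    while len(selected) < n_train:
        # Pick the point farthest from its nearest already-selected neighbor
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining.pop(int(np.argmax(nearest))))
    return np.array(selected)

X = np.random.default_rng(2).normal(size=(100, 4))  # toy descriptor matrix
train_idx = kennard_stone(X, n_train=80)
test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
print(len(train_idx), "training and", len(test_idx), "test compounds")
```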
Statistical Modeling of Retinal Optical Coherence Tomography.
Amini, Zahra; Rabbani, Hossein
2016-06-01
In this paper, a new model for retinal Optical Coherence Tomography (OCT) images is proposed. This statistical model is based on introducing a nonlinear Gaussianization transform to convert the probability distribution function (pdf) of each OCT intra-retinal layer to a Gaussian distribution. The retina is a layered structure, and in OCT each of these layers has a specific pdf which is corrupted by speckle noise; therefore, a mixture model for statistical modeling of OCT images is proposed. A Normal-Laplace distribution, which is a convolution of a Laplace pdf and Gaussian noise, is proposed as the distribution of each component of this model. The reason for choosing the Laplace pdf is the monotonically decaying behavior of OCT intensities in each layer for healthy cases. After fitting a mixture model to the data, each component is Gaussianized and all of them are combined by the Averaged Maximum A Posteriori (AMAP) method. To demonstrate the ability of this method, a new contrast enhancement method based on this statistical model is proposed and tested on thirteen healthy 3D OCTs acquired with a Topcon 3D OCT scanner and five 3D OCTs from Age-related Macular Degeneration (AMD) patients acquired with a Zeiss Cirrus HD-OCT scanner. Comparing the results with two contending techniques, the superiority of the proposed method is demonstrated both visually and numerically. Furthermore, to prove the efficacy of the proposed method for a more direct and specific purpose, an improvement in the segmentation of intra-retinal layers using the proposed contrast enhancement method as a preprocessing step is demonstrated.
Studies of oceanic tectonics based on GEOS-3 satellite altimetry
NASA Technical Reports Server (NTRS)
Poehls, K. A.; Kaula, W. M.; Schubert, G.; Sandwell, D.
1979-01-01
Using statistical analysis, geoidal admittance (the relationship between the ocean geoid and seafloor topography) obtained from GEOS-3 altimetry was compared to various model admittances. Analysis of several altimetry tracks in the Pacific Ocean demonstrated a low coherence between altimetry and seafloor topography except where the track crosses active or recent tectonic features. However, global statistical studies using the much larger data base of all available gravimetry showed a positive correlation of oceanic gravity with topography. The oceanic lithosphere was modeled by simultaneously inverting surface wave dispersion, topography, and gravity data. Efforts to incorporate geoid data into the inversion showed that the base of the subchannel can be better resolved with geoid rather than gravity data. Thermomechanical models of seafloor spreading taking into account differing plate velocities, heat source distributions, and rock rheologies were discussed.
Statistical Compression of Wind Speed Data
NASA Astrophysics Data System (ADS)
Tagle, F.; Castruccio, S.; Crippa, P.; Genton, M.
2017-12-01
In this work we introduce a lossy compression approach that utilizes a stochastic wind generator based on a non-Gaussian distribution to reproduce the internal climate variability of daily wind speed as represented by the CESM Large Ensemble over Saudi Arabia. Stochastic wind generators, and stochastic weather generators more generally, are statistical models that aim to match certain statistical properties of the data on which they are trained. They have been used extensively in applications ranging from agricultural models to climate impact studies. In this novel context, the parameters of the fitted model can be interpreted as encoding the information contained in the original uncompressed data. The statistical model is fit to only 3 of the 30 ensemble members, and it adequately captures the variability of the ensemble in terms of the seasonal and interannual variability of daily wind speed. To deal with such a large spatial domain, it is partitioned into 9 regions, and the model is fit independently to each of these. We further discuss a recent refinement of the model, which relaxes this assumption of regional independence by introducing a large-scale component that interacts with the fine-scale regional effects.
Performance of Reclassification Statistics in Comparing Risk Prediction Models
Paynter, Nina P.
2012-01-01
Concerns have been raised about the use of traditional measures of model fit in evaluating risk prediction models for clinical use, and reclassification tables have been suggested as an alternative means of assessing the clinical utility of a model. Several measures based on the table have been proposed, including the reclassification calibration (RC) statistic, the net reclassification improvement (NRI), and the integrated discrimination improvement (IDI), but the performance of these in practical settings has not been fully examined. We used simulations to estimate the type I error and power for these statistics in a number of scenarios, as well as the impact of the number and type of categories, when adding a new marker to an established or reference model. The type I error was found to be reasonable in most settings, and power was highest for the IDI, which was similar to the test of association. The relative power of the RC statistic, a test of calibration, and the NRI, a test of discrimination, varied depending on the model assumptions. These tools provide unique but complementary information. PMID:21294152
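For concreteness, a small sketch of the category-based net reclassification improvement (NRI) described above follows; the risk categories, the simulated predictions, and the outcome prevalence are toy assumptions, not the simulation settings of the paper.

```python
# Category-based NRI for adding a marker to a reference risk model:
# events should move up in risk category, nonevents should move down.
import numpy as np

def nri(p_ref, p_new, y, cuts=(0.05, 0.10, 0.20)):
    cat_ref = np.digitize(p_ref, cuts)
    cat_new = np.digitize(p_new, cuts)
    up, down = cat_new > cat_ref, cat_new < cat_ref
    events, nonevents = y == 1, y == 0
    nri_events = up[events].mean() - down[events].mean()
    nri_nonevents = down[nonevents].mean() - up[nonevents].mean()
    return nri_events + nri_nonevents

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.1, 2000)
p_ref = np.clip(0.10 + 0.05 * y + rng.normal(0, 0.05, 2000), 0.001, 0.999)
p_new = np.clip(p_ref + 0.03 * (2 * y - 1) + rng.normal(0, 0.01, 2000),
                0.001, 0.999)
print(f"NRI = {nri(p_ref, p_new, y):.3f}")
```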
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zheng Guoyan
2010-04-15
Purpose: The aim of this article is to investigate the feasibility of using a statistical shape model (SSM)-based reconstruction technique to derive a scaled, patient-specific surface model of the pelvis from a single standard anteroposterior (AP) x-ray radiograph and the feasibility of estimating the scale of the reconstructed surface model by performing a surface-based 3D/3D matching. Methods: Data sets of 14 pelvises (one plastic bone, 12 cadavers, and one patient) were used to validate the single-image based reconstruction technique. This reconstruction technique is based on a hybrid 2D/3D deformable registration process combining a landmark-to-ray registration with a SSM-based 2D/3D reconstruction. The landmark-to-ray registration was used to find an initial scale and an initial rigid transformation between the x-ray image and the SSM. The estimated scale and rigid transformation were used to initialize the SSM-based 2D/3D reconstruction. The optimal reconstruction was then achieved in three stages by iteratively matching the projections of the apparent contours extracted from a 3D model derived from the SSM to the image contours extracted from the x-ray radiograph: iterative affine registration, statistical instantiation, and iterative regularized shape deformation. The image contours are first detected by using a semiautomatic segmentation tool based on the Livewire algorithm and then approximated by a set of sparse dominant points that are adaptively sampled from the detected contours. The unknown scales of the reconstructed models were estimated by performing a surface-based 3D/3D matching between the reconstructed models and the associated ground truth models that were derived from a CT-based reconstruction method. Such a matching also allowed for computing the errors between the reconstructed models and the associated ground truth models. Results: The technique could reconstruct the surface models of all 14 pelvises directly from the landmark-based initialization. Depending on the surface-based matching techniques, the reconstruction errors were slightly different. When a surface-based iterative affine registration was used, an average reconstruction error of 1.6 mm was observed. This error increased to 1.9 mm when a surface-based iterative scaled rigid registration was used. Conclusions: It is feasible to reconstruct a scaled, patient-specific surface model of the pelvis from a single standard AP x-ray radiograph using the present approach. The unknown scale of the reconstructed model can be estimated by performing a surface-based 3D/3D matching.
NASA Astrophysics Data System (ADS)
Wang, Dong
2016-03-01
Gears are the most commonly used components in mechanical transmission systems. Their failures may cause transmission system breakdown and result in economic loss. Identification of different gear crack levels is important to prevent any unexpected gear failure, because gear cracks lead to gear tooth breakage. Signal-processing-based methods mainly require expertise to interpret gear fault signatures, which is usually not easy for ordinary users. In order to identify different gear crack levels automatically, intelligent gear crack identification methods should be developed. Previous case studies experimentally proved that K-nearest neighbors based methods exhibit high prediction accuracies for identification of 3 different gear crack levels under different motor speeds and loads. In this short communication, to further enhance the prediction accuracies of existing K-nearest neighbors based methods and to extend identification from 3 to 5 different gear crack levels, redundant statistical features are constructed by using the Daubechies 44 (db44) binary wavelet packet transform at different wavelet decomposition levels, prior to the use of a K-nearest neighbors method. The dimensionality of the redundant statistical features is 620, which provides richer gear fault signatures. Since many of these statistical features are redundant and highly correlated with each other, dimensionality reduction is conducted to obtain new significant statistical features. At last, the K-nearest neighbors method is used to identify 5 different gear crack levels under different motor speeds and loads. A case study including 3 experiments is investigated to demonstrate that the developed method provides higher prediction accuracies than the existing K-nearest neighbors based methods for recognizing different gear crack levels under different motor speeds and loads. Based on the new significant statistical features, some other popular statistical models, including linear discriminant analysis, quadratic discriminant analysis, classification and regression tree and naive Bayes classifier, are compared with the developed method. The results show that the developed method has the highest prediction accuracies among these statistical models. Additionally, selection of the number of new significant features and parameter selection of K-nearest neighbors are thoroughly investigated.
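A rough sketch of the feature pipeline described above, under stated assumptions: wavelet packet statistical features feeding a K-nearest neighbors classifier. The signals are synthetic, the feature set is far smaller than the paper's 620 dimensions, and db20 stands in for db44, which PyWavelets does not provide.

```python
# Wavelet-packet statistical features + KNN for simulated crack levels.
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def wp_features(signal, wavelet="db20", level=3):
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    feats = []
    for node in wp.get_level(level, order="natural"):
        c = node.data
        # Simple statistics per sub-band: mean, std, peak, energy
        feats += [c.mean(), c.std(), np.abs(c).max(), (c ** 2).sum()]
    return feats

rng = np.random.default_rng(4)
X, y = [], []
for crack_level in range(5):              # 5 simulated crack severities
    for _ in range(40):
        t = np.arange(1024)
        sig = np.sin(0.05 * t) + 0.1 * crack_level * np.sin(0.4 * t)
        sig += rng.normal(scale=0.2, size=t.size)
        X.append(wp_features(sig))
        y.append(crack_level)

clf = KNeighborsClassifier(n_neighbors=5)
print("CV accuracy:", cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```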
Control Theory and Statistical Generalizations.
ERIC Educational Resources Information Center
Powers, William T.
1990-01-01
Contrasts modeling methods in control theory to the methods of statistical generalizations in empirical studies of human or animal behavior. Presents a computer simulation that predicts behavior based on variables (effort and rewards) determined by the invariable (desired reward). Argues that control theory methods better reflect relationships to…
Falgreen, Steffen; Laursen, Maria Bach; Bødker, Julie Støve; Kjeldsen, Malene Krag; Schmitz, Alexander; Nyegaard, Mette; Johnsen, Hans Erik; Dybkær, Karen; Bøgsted, Martin
2014-06-05
In vitro generated dose-response curves of human cancer cell lines are widely used to develop new therapeutics. The curves are summarised by simplified statistics that ignore the conventionally used dose-response curves' dependency on drug exposure time and growth kinetics. This may lead to suboptimal exploitation of data and biased conclusions on the potential of the drug in question. Therefore we set out to improve the dose-response assessments by eliminating the impact of time dependency. First, a mathematical model for drug induced cell growth inhibition was formulated and used to derive novel dose-response curves and improved summary statistics that are independent of time under the proposed model. Next, a statistical analysis workflow for estimating the improved statistics was suggested consisting of 1) nonlinear regression models for estimation of cell counts and doubling times, 2) isotonic regression for modelling the suggested dose-response curves, and 3) resampling based method for assessing variation of the novel summary statistics. We document that conventionally used summary statistics for dose-response experiments depend on time so that fast growing cell lines compared to slowly growing ones are considered overly sensitive. The adequacy of the mathematical model is tested for doxorubicin and found to fit real data to an acceptable degree. Dose-response data from the NCI60 drug screen were used to illustrate the time dependency and demonstrate an adjustment correcting for it. The applicability of the workflow was illustrated by simulation and application on a doxorubicin growth inhibition screen. The simulations show that under the proposed mathematical model the suggested statistical workflow results in unbiased estimates of the time independent summary statistics. Variance estimates of the novel summary statistics are used to conclude that the doxorubicin screen covers a significant diverse range of responses ensuring it is useful for biological interpretations. Time independent summary statistics may aid the understanding of drugs' action mechanism on tumour cells and potentially renew previous drug sensitivity evaluation studies.
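A minimal sketch of step 2 of the workflow, fitting a monotone dose-response curve with isotonic regression; the doses, responses, and the crude 50%-response summary are invented for illustration.

```python
# Monotone (isotonic) dose-response fit for growth-inhibition data.
import numpy as np
from sklearn.isotonic import IsotonicRegression

dose = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0])  # assumed doses (µM)
viability = np.array([0.98, 0.95, 0.85, 0.60, 0.35, 0.20, 0.15])

# Viability should not increase with dose, so fit a decreasing step function
iso = IsotonicRegression(increasing=False)
fitted = iso.fit_transform(np.log10(dose), viability)

# Crude summary: dose at which the fitted curve crosses 50% response
gi50 = 10 ** np.interp(0.5, fitted[::-1], np.log10(dose)[::-1])
print(f"fitted curve: {np.round(fitted, 2)}, dose at 50% response ≈ {gi50:.2f}")
```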
Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.
Yin, Guosheng; Ma, Yanyuan
2013-01-01
The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.
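The following sketch illustrates the procedure for a normal model with unknown mean and variance: the MLE comes from a bootstrap sample, while the bin counts are taken on the original data. The bin count and the data generator are assumptions.

```python
# Bootstrap-modified Pearson goodness-of-fit test for a normal model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=300)  # observed data
k = 10                                         # number of bins

# MLE computed from a bootstrap sample, not from the original data
boot = rng.choice(x, size=x.size, replace=True)
mu_b, sigma_b = boot.mean(), boot.std()        # std() with ddof=0 is the MLE

# Equiprobable bins under the bootstrap-sample MLE
edges = stats.norm.ppf(np.linspace(0, 1, k + 1), loc=mu_b, scale=sigma_b)
observed, _ = np.histogram(x, bins=edges)
expected = x.size / k
chi2 = ((observed - expected) ** 2 / expected).sum()
pval = stats.chi2.sf(chi2, df=k - 1)           # chi-squared law is recovered
print(f"bootstrap Pearson chi2 = {chi2:.2f}, p = {pval:.3f}")
```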
NASA Astrophysics Data System (ADS)
Lomakina, N. Ya.
2017-11-01
The work presents the results of an applied climatic division of the Siberian region into districts, based on a methodology of objective classification of atmospheric-boundary-layer climates by the "temperature-moisture-wind" complex, realized using the method of principal components and special similarity criteria for average profiles and the eigenvalues of correlation matrices. On the territory of Siberia, 14 homogeneous regions were identified for the winter season and 10 regions for the summer. Local statistical models were constructed for each region. These include vertical profiles of mean values, mean square deviations, and matrices of interlevel correlation of temperature, specific humidity, and zonal and meridional wind velocity. The advantage of the obtained local statistical models over the regional models is shown.
AGENT-BASED MODELS IN EMPIRICAL SOCIAL RESEARCH*
Bruch, Elizabeth; Atwell, Jon
2014-01-01
Agent-based modeling has become increasingly popular in recent years, but there is still no codified set of recommendations or practices for how to use these models within a program of empirical research. This article provides ideas and practical guidelines drawn from sociology, biology, computer science, epidemiology, and statistics. We first discuss the motivations for using agent-based models in both basic science and policy-oriented social research. Next, we provide an overview of methods and strategies for incorporating data on behavior and populations into agent-based models, and review techniques for validating and testing the sensitivity of agent-based models. We close with suggested directions for future research. PMID:25983351
NASA Astrophysics Data System (ADS)
Mechlem, Korbinian; Ehn, Sebastian; Sellerer, Thorsten; Pfeiffer, Franz; Noël, Peter B.
2017-03-01
In spectral computed tomography (spectral CT), the additional information about the energy dependence of attenuation coefficients can be exploited to generate material selective images. These images have found applications in various areas such as artifact reduction, quantitative imaging or clinical diagnosis. However, significant noise amplification on material decomposed images remains a fundamental problem of spectral CT. Most spectral CT algorithms separate the process of material decomposition and image reconstruction. Separating these steps is suboptimal because the full statistical information contained in the spectral tomographic measurements cannot be exploited. Statistical iterative reconstruction (SIR) techniques provide an alternative, mathematically elegant approach to obtaining material selective images with improved tradeoffs between noise and resolution. Furthermore, image reconstruction and material decomposition can be performed jointly. This is accomplished by a forward model which directly connects the (expected) spectral projection measurements and the material selective images. To obtain this forward model, detailed knowledge of the different photon energy spectra and the detector response was assumed in previous work. However, accurately determining the spectrum is often difficult in practice. In this work, a new algorithm for statistical iterative material decomposition is presented. It uses a semi-empirical forward model which relies on simple calibration measurements. Furthermore, an efficient optimization algorithm based on separable surrogate functions is employed. This partially negates one of the major shortcomings of SIR, namely high computational cost and long reconstruction times. Numerical simulations and real experiments show strongly improved image quality and reduced statistical bias compared to projection-based material decomposition.
Network Polymers Formed Under Nonideal Conditions.
1986-12-01
the system or the limited ability of the statistical model to account for stochastic correlations. The viscosity of the reacting system was measured as...based on competing reactions (ring, chain) and employs equilibrium chain statistics. The work thus far has been limited to single cycle growth on an...polymerizations, because a large number of differential equations must be solved. The Markovian approach (sometimes referred to as the statistical or
Cavallari, Stefano; Panzeri, Stefano; Mazzoni, Alberto
2014-01-01
Models of networks of Leaky Integrate-and-Fire (LIF) neurons are a widely used tool for theoretical investigations of brain function. These models have been used both with current- and conductance-based synapses. However, the differences in the dynamics expressed by these two approaches have been so far mainly studied at the single neuron level. To investigate how these synaptic models affect network activity, we compared the single neuron and neural population dynamics of conductance-based networks (COBNs) and current-based networks (CUBNs) of LIF neurons. These networks were endowed with sparse excitatory and inhibitory recurrent connections, and were tested in conditions including both low- and high-conductance states. We developed a novel procedure to obtain comparable networks by properly tuning the synaptic parameters not shared by the models. The so defined comparable networks displayed an excellent and robust match of first order statistics (average single neuron firing rates and average frequency spectrum of network activity). However, these comparable networks showed profound differences in the second order statistics of neural population interactions and in the modulation of these properties by external inputs. The correlation between inhibitory and excitatory synaptic currents and the cross-neuron correlation between synaptic inputs, membrane potentials and spike trains were stronger and more stimulus-modulated in the COBN. Because of these properties, the spike train correlation carried more information about the strength of the input in the COBN, although the firing rates were equally informative in both network models. Moreover, the network activity of COBN showed stronger synchronization in the gamma band, and spectral information about the input higher and spread over a broader range of frequencies. These results suggest that the second order statistics of network dynamics depend strongly on the choice of synaptic model. PMID:24634645
A statistical method to estimate low-energy hadronic cross sections
NASA Astrophysics Data System (ADS)
Balassa, Gábor; Kovács, Péter; Wolf, György
2018-02-01
In this article we propose a model based on the Statistical Bootstrap approach to estimate the cross sections of different hadronic reactions up to a few GeV in c.m.s. energy. The method is based on the idea that, when two particles collide, a so-called fireball is formed, which after a short time decays statistically into a specific final state. To calculate the probabilities we use a phase-space description extended with quark combinatorial factors and the possibility of more than one fireball forming. In a few simple cases the probability of a specific final state can be calculated analytically, and we show that the model is able to reproduce the ratios of the considered cross sections. We also show that the model is able to describe proton-antiproton annihilation at rest; in this case we used a numerical method to calculate the more complicated final-state probabilities. Additionally, we examined the formation of strange and charmed mesons, where we used existing data to fit the relevant model parameters.
A powerful score-based test statistic for detecting gene-gene co-association.
Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun
2016-01-29
The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex disease. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, not only due to the traditional interaction under nearly independent conditions but also due to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and specific disease-associated loci will often be highly correlated (e.g., single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we propose a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including single-SNP-based and principal component analysis (PCA)-based logistic regression models, statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. A real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.
a Statistical Theory of the Epilepsies.
NASA Astrophysics Data System (ADS)
Thomas, Kuryan
1988-12-01
A new physical and mathematical model for the epilepsies is proposed, based on the theory of bond percolation on finite lattices. Within this model, the onset of seizures in the brain is identified with the appearance of spanning clusters of neurons engaged in the spurious and uncontrollable electrical activity characteristic of seizures. It is proposed that the fraction of excitatory to inhibitory synapses can be identified with a bond probability, and that the bond probability is a randomly varying quantity displaying Gaussian statistics. The consequences of the proposed model for the treatment of the epilepsies are explored. The nature of the data on the epilepsies which can be acquired in a clinical setting is described. It is shown that such data can be analyzed to provide preliminary support for the bond percolation hypothesis, and to quantify the efficacy of anti-epileptic drugs in a treatment program. The results of a battery of statistical tests on seizure distributions are discussed. The physical theory of the electroencephalogram (EEG) is described, and extant models of the electrical activity measured by the EEG are discussed, with an emphasis on their physical behavior. A proposal is made to explain the difference between the power spectra of electrical activity measured with cranial probes and with the EEG. Statistical tests on the characteristic EEG manifestations of epileptic activity are conducted, and their results described. Computer simulations of a correlated bond percolating system are constructed. It is shown that the statistical properties of the results of such a simulation are strongly suggestive of the statistical properties of clinical data. The study finds no contradictions between the predictions of the bond percolation model and the observed properties of the available data. Suggestions are made for further research and for techniques based on the proposed model which may be used for tuning the effects of anti-epileptic drugs.
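As a toy illustration of the percolation picture, the sketch below uses site percolation on a square lattice as a simpler stand-in for the bond percolation of the thesis, with a Gaussian-varying occupation probability, and counts how often a spanning cluster (the model's seizure analogue) appears; all parameters are illustrative assumptions.

```python
# Site-percolation stand-in for the bond-percolation seizure model:
# a spanning cluster from top to bottom plays the role of seizure onset.
import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(6)
n, trials, seizures = 64, 200, 0
for _ in range(trials):
    p = np.clip(rng.normal(0.55, 0.05), 0, 1)   # Gaussian-varying probability
    occupied = rng.random((n, n)) < p
    labels, _ = label(occupied)                  # 4-connected clusters
    top = set(labels[0][labels[0] > 0])
    bottom = set(labels[-1][labels[-1] > 0])
    seizures += bool(top & bottom)               # cluster spans the lattice
print(f"fraction of trials with a spanning cluster: {seizures / trials:.2f}")
```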
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics, and so the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data, in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
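A minimal sketch of the recommended remedy, a mixed effects model with a random intercept per host for repeated parasitaemia measurements; the simulated data frame and effect sizes are assumptions, not data from any real infection study.

```python
# Random-intercept mixed model for repeated (within-host correlated) measures.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
hosts, days = 20, 10
df = pd.DataFrame({
    "host": np.repeat(np.arange(hosts), days),
    "day": np.tile(np.arange(days), hosts),
})
host_effect = rng.normal(0, 1, hosts)[df["host"]]  # host-level variation
df["log_parasitaemia"] = (2 + 0.3 * df["day"] + host_effect
                          + rng.normal(0, 0.5, len(df)))

# The random intercept per host absorbs the within-host correlation that
# a simple regression would wrongly leave in the residuals.
model = smf.mixedlm("log_parasitaemia ~ day", df, groups=df["host"]).fit()
print(model.summary())
```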
Analog-Based Postprocessing of Navigation-Related Hydrological Ensemble Forecasts
NASA Astrophysics Data System (ADS)
Hemri, S.; Klein, B.
2017-11-01
Inland waterway transport benefits from probabilistic forecasts of water levels, as they allow ship loads to be optimized and, hence, transport costs to be minimized. State-of-the-art probabilistic hydrologic ensemble forecasts inherit biases and dispersion errors from the atmospheric ensemble forecasts they are driven with. The use of statistical postprocessing techniques like ensemble model output statistics (EMOS) allows for a reduction of these systematic errors by fitting a statistical model based on training data. In this study, training periods for EMOS are selected based on forecast analogs, i.e., historical forecasts that are similar to the forecast to be verified. Due to the strong autocorrelation of water levels, forecast analogs have to be selected based on entire forecast hydrographs in order to guarantee similar hydrograph shapes. Custom-tailored measures of similarity for forecast hydrographs comprise hydrological series distance (SD), the hydrological matching algorithm (HMA), and dynamic time warping (DTW). Verification against observations reveals that EMOS forecasts for water level at three gauges along the river Rhine, with training periods selected based on SD, HMA, and DTW, compare favorably with reference EMOS forecasts, which are based on either seasonal training periods or on training periods obtained by dividing the hydrological forecast trajectories into runoff regimes.
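Of the three similarity measures named above, dynamic time warping is the simplest to sketch; the following minimal implementation compares two invented forecast hydrographs and is not the operational code of the study.

```python
# Minimal dynamic time warping (DTW) distance between two hydrographs.
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]

h1 = np.array([2.1, 2.3, 2.9, 3.5, 3.4, 3.0, 2.6])  # forecast water levels (m)
h2 = np.array([2.0, 2.2, 2.4, 3.1, 3.6, 3.3, 2.8])  # a candidate analog
print(f"DTW distance between hydrographs: {dtw(h1, h2):.2f}")
```

Because DTW aligns the two series in time before accumulating costs, it tolerates small timing shifts of a flood peak that a pointwise distance would penalize heavily.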
SOCR: Statistics Online Computational Resource
Dinov, Ivo D.
2011-01-01
The need for hands-on computer laboratory experience in undergraduate and graduate statistics education has been firmly established in the past decade. As a result, a number of attempts have been undertaken to develop novel approaches for problem-driven statistical thinking, data analysis and result interpretation. In this paper we describe an integrated educational web-based framework for interactive distribution modeling, virtual online probability experimentation, statistical data analysis, visualization and integration. Following years of experience in statistical teaching at all college levels using established licensed statistical software packages, like STATA, S-PLUS, R, SPSS, SAS, Systat, etc., we have attempted to engineer a new statistics education environment, the Statistics Online Computational Resource (SOCR). This resource performs many of the standard types of statistical analysis, much like other classical tools. In addition, it is designed in a plug-in object-oriented architecture and is completely platform independent, web-based, interactive, extensible and secure. Over the past 4 years we have tested, fine-tuned and reanalyzed the SOCR framework in many of our undergraduate and graduate probability and statistics courses and have evidence that SOCR resources build students' intuition and enhance their learning. PMID:21451741
The Analysis of Organizational Diagnosis on Based Six Box Model in Universities
ERIC Educational Resources Information Center
Hamid, Rahimi; Siadat, Sayyed Ali; Reza, Hoveida; Arash, Shahin; Ali, Nasrabadi Hasan; Azizollah, Arbabisarjou
2011-01-01
Purpose: To analyze organizational diagnosis based on the six-box model at universities. Research method: Descriptive survey. The statistical population consisted of 1544 faculty members of universities, from which 218 persons were chosen as the sample through stratified random sampling. The research instrument was an organizational…
2013-08-01
…in Sequential Design Optimization with Concurrent Calibration-Based Model Validation
Drignei, Dorin (Mathematics and Statistics Department); Mourelatos, Zissimos; Pandey, Vijitashwa
Probabilistic models for reactive behaviour in heterogeneous condensed phase media
NASA Astrophysics Data System (ADS)
Baer, M. R.; Gartling, D. K.; DesJardin, P. E.
2012-02-01
This work presents statistically-based models to describe reactive behaviour in heterogeneous energetic materials. Mesoscale effects are incorporated in continuum-level reactive flow descriptions using probability density functions (pdfs) that are associated with thermodynamic and mechanical states. A generalised approach is presented that includes multimaterial behaviour by treating the volume fraction as a random kinematic variable. Model simplifications are then sought to reduce the complexity of the description without compromising the statistical approach. Reactive behaviour is first considered for non-deformable media having a random temperature field as an initial state. A pdf transport relationship is derived and an approximate moment approach is incorporated in finite element analysis to model an example application whereby a heated fragment impacts a reactive heterogeneous material which leads to a delayed cook-off event. Modelling is then extended to include deformation effects associated with shock loading of a heterogeneous medium whereby random variables of strain, strain-rate and temperature are considered. A demonstrative mesoscale simulation of a non-ideal explosive is discussed that illustrates the joint statistical nature of the strain and temperature fields during shock loading to motivate the probabilistic approach. This modelling is derived in a Lagrangian framework that can be incorporated in continuum-level shock physics analysis. Future work will consider particle-based methods for a numerical implementation of this modelling approach.
Wang, Ming; Long, Qi
2016-09-01
Prediction models for disease risk and prognosis play an important role in biomedical research, and evaluating their predictive accuracy in the presence of censored data is of substantial interest. The standard concordance (c) statistic has been extended to provide a summary measure of predictive accuracy for survival models. Motivated by a prostate cancer study, we address several issues associated with evaluating survival prediction models based on the c-statistic, with a focus on estimators using the technique of inverse probability of censoring weighting (IPCW). Compared to the existing work, we provide complete results on the asymptotic properties of the IPCW estimators under the assumption of coarsening at random (CAR), and propose a sensitivity analysis under the mechanism of noncoarsening at random (NCAR). In addition, we extend the IPCW approach as well as the sensitivity analysis to high-dimensional settings. The predictive accuracy of prediction models for cancer recurrence after prostatectomy is assessed by applying the proposed approaches. We find that the estimated predictive accuracy for the models under consideration is sensitive to the NCAR assumption, and thus identify the best predictive model. Finally, we further evaluate the performance of the proposed methods in both low-dimensional and high-dimensional settings under CAR and NCAR through simulations. © 2016, The International Biometric Society.
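For readers unfamiliar with the IPCW idea, the sketch below is a minimal, non-authoritative illustration of an Uno-style inverse-probability-of-censoring-weighted c-statistic in Python; the Kaplan-Meier censoring estimate, the handling of ties and the function names are my own simplifications, not the estimators or sensitivity analyses developed in the paper.

```python
import numpy as np

def censoring_survival(time, event):
    """Kaplan-Meier estimate G(t) of the censoring survival function
    (censoring, event == 0, is treated as the event of interest)."""
    order = np.argsort(time)
    t, e = time[order], event[order]
    n = len(t)
    at_risk = n - np.arange(n)
    G = np.cumprod(1.0 - (e == 0) / at_risk)
    def G_at(s):
        idx = np.searchsorted(t, s, side="right") - 1
        return G[idx] if idx >= 0 else 1.0
    return G_at

def ipcw_c_statistic(time, event, risk, tau):
    """Uno-style IPCW concordance: each pair (i, j) with an observed event at
    time[i] < min(time[j], tau) is weighted by 1 / G(time[i])^2 (ties ignored)."""
    G = censoring_survival(time, event)
    num = den = 0.0
    for i in range(len(time)):
        if event[i] == 1 and time[i] < tau:
            w = 1.0 / G(time[i]) ** 2
            later = time > time[i]
            num += w * np.sum(later & (risk[i] > risk))
            den += w * np.sum(later)
    return num / den
```

In practice one would rely on a vetted survival-analysis package implementation rather than this sketch.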
NASA Astrophysics Data System (ADS)
Natividad, Gina May R.; Cawiding, Olive R.; Addawe, Rizavel C.
2017-11-01
The increase in the merchandise exports of the country offers information about the Philippines' trading role within the global economy. Merchandise exports statistics are used to monitor the country's overall production that is consumed overseas. This paper compares two models obtained by a) clustering the commodity groups into two based on their proportional contribution to the total exports, and b) treating only the total exports. Different seasonal autoregressive integrated moving average (SARIMA) models were then developed for the clustered commodities and for the total exports based on the monthly merchandise exports of the Philippines from 2011 to 2016. The data set used in this study was retrieved from the Philippine Statistics Authority (PSA), which is the central statistical authority in the country responsible for primary data collection. A test for significance of the difference between means at the 0.05 level of significance was then performed on the forecasts produced. The result indicates that there is a significant difference between the means of the forecasts of the two models. Moreover, upon a comparison of the root mean square error (RMSE) and mean absolute error (MAE) of the models, it was found that the models used for the clustered groups outperform the model for the total exports.
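As a rough illustration of the modeling step only (not the authors' selected orders or data), a SARIMA fit in Python with statsmodels might look as follows; the synthetic series stands in for the PSA export data and the (1,1,1)(0,1,1)_12 orders are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Synthetic stand-in for the monthly merchandise-export series (2011-2016);
# the real analysis would use the PSA data instead.
idx = pd.date_range("2011-01-01", periods=72, freq="MS")
rng = np.random.default_rng(0)
exports = pd.Series(100 + 0.5 * np.arange(72)
                    + 10 * np.sin(2 * np.pi * np.arange(72) / 12)
                    + rng.normal(0, 2, 72), index=idx)

# Illustrative SARIMA(1,1,1)(0,1,1)_12 specification; in the paper the orders
# were selected separately for each clustered series and for the total.
fit = SARIMAX(exports, order=(1, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(disp=False)
forecast = fit.forecast(steps=12)   # 12-month-ahead forecasts

# RMSE / MAE against held-out observations would then be compared between models.
```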
Molenaar, Peter C M
2008-01-01
It is argued that general mathematical-statistical theorems imply that standard statistical analysis techniques of inter-individual variation are invalid for investigating developmental processes. Developmental processes have to be analyzed at the level of individual subjects, using time series data characterizing the patterns of intra-individual variation. It is shown that standard statistical techniques based on the analysis of inter-individual variation appear to be insensitive to the presence of arbitrarily large degrees of inter-individual heterogeneity in the population. An important class of nonlinear epigenetic models of neural growth is described which can explain the occurrence of such heterogeneity in brain structures and behavior. Links with models of developmental instability are discussed. A simulation study based on a chaotic growth model illustrates the invalidity of standard analysis of inter-individual variation, whereas time series analysis of intra-individual variation is able to recover the true state of affairs. (c) 2007 Wiley Periodicals, Inc.
Multivariate statistical model for 3D image segmentation with application to medical images.
John, Nigel M; Kabuka, Mansur R; Ibrahim, Mohamed O
2003-12-01
In this article we describe a statistical model that was developed to segment brain magnetic resonance images. The statistical segmentation algorithm was applied after a pre-processing stage involving the use of a 3D anisotropic filter along with histogram equalization techniques. The segmentation algorithm makes use of prior knowledge and a probability-based multivariate model designed to semi-automate the process of segmentation. The algorithm was applied to images obtained from the Center for Morphometric Analysis at Massachusetts General Hospital as part of the Internet Brain Segmentation Repository (IBSR). The developed algorithm showed improved accuracy over the k-means, adaptive maximum a posteriori probability (MAP), biased MAP, and other algorithms. Experimental results showing the segmentation and the results of comparisons with other algorithms are provided. Results are based on an overlap criterion against expertly segmented images from the IBSR. The algorithm produced average results of approximately 80% overlap with the expertly segmented images (compared with 85% for manual segmentation and 55% for other algorithms).
A statistical approach to the brittle fracture of a multi-phase solid
NASA Technical Reports Server (NTRS)
Liu, W. K.; Lua, Y. I.; Belytschko, T.
1991-01-01
A stochastic damage model is proposed to quantify the inherent statistical distribution of the fracture toughness of a brittle, multi-phase solid. The model, based on the macrocrack-microcrack interaction, incorporates uncertainties in locations and orientations of microcracks. Due to the high concentration of microcracks near the macro-tip, a higher order analysis based on traction boundary integral equations is formulated first for an arbitrary array of cracks. The effects of uncertainties in locations and orientations of microcracks at a macro-tip are analyzed quantitatively by using the boundary integral equations method in conjunction with the computer simulation of the random microcrack array. The short range interactions resulting from surrounding microcracks closest to the main crack tip are investigated. The effects of the microcrack density parameter are also explored in the present study. The validity of the present model is demonstrated by comparing its statistical output with the Neville distribution function, which gives correct fits to sets of experimental data from multi-phase solids.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Apte, A; Veeraraghavan, H; Oh, J
Purpose: To present an open source and free platform to facilitate radiomics research — the “Radiomics toolbox” in CERR. Method: There is a scarcity of open source tools that support end-to-end modeling of image features to predict patient outcomes. The “Radiomics toolbox” strives to fill the need for such a software platform. The platform supports (1) import of various image modalities such as CT, PET, MR, SPECT and US; (2) contouring tools to delineate structures of interest; (3) extraction and storage of image-based features such as first-order statistics, gray-scale co-occurrence and zone-size matrix based texture features, and shape features; and (4) statistical analysis. Statistical analysis of the extracted features is supported with basic functionality that includes univariate correlations and Kaplan-Meier curves, and advanced functionality that includes feature reduction and multivariate modeling. The graphical user interface and the data management are implemented in Matlab for ease of development and readability of code for a wide audience. Open-source software developed with other programming languages is integrated to enhance various components of this toolbox, for example Java-based DCM4CHE for import of DICOM and R for statistical analysis. Results: The Radiomics toolbox will be distributed as open source software under a GNU license. The toolbox was prototyped for modeling an oropharyngeal PET dataset at MSKCC. The analysis will be presented in a separate paper. Conclusion: The Radiomics Toolbox provides an extensible platform for extracting and modeling image features. To emphasize new uses of CERR for radiomics and image-based research, we have changed the name from the “Computational Environment for Radiotherapy Research” to the “Computational Environment for Radiological Research”.
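The toolbox itself is Matlab-based; purely as an illustration of what first-order statistics extraction over a delineated structure involves, here is a small Python sketch (the function name, feature list and bin count are my own choices, not CERR's).

```python
import numpy as np
from scipy import stats

def first_order_features(image, mask):
    """First-order intensity statistics over a delineated region of interest.

    image: 3-D NumPy array of voxel intensities; mask: same-shaped binary ROI.
    """
    roi = image[mask > 0].astype(float)
    hist, _ = np.histogram(roi, bins=64)
    return {
        "mean": roi.mean(),
        "std": roi.std(ddof=1),
        "min": roi.min(),
        "max": roi.max(),
        "skewness": stats.skew(roi),
        "kurtosis": stats.kurtosis(roi),
        "entropy": stats.entropy(hist + 1e-12),
        "energy": np.sum(roi ** 2),
    }
```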
VMT-based traffic impact assessment : development of a trip length model.
DOT National Transportation Integrated Search
2010-06-01
This report develops models that relate the trip-lengths to the land-use characteristics at the trip-ends (both production- and attraction-ends). Separate models were developed by trip purpose. The results indicate several statistically significa...
A Stochastic Fractional Dynamics Model of Space-time Variability of Rain
NASA Technical Reports Server (NTRS)
Kundu, Prasun K.; Travis, James E.
2013-01-01
Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate that allows a concise description of the second moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and in Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to the second moment statistics of radar data. The model predictions are then found to fit the second moment statistics of the gauge data reasonably well without any further adjustment.
Statistical methods and neural network approaches for classification of data from multiple sources
NASA Technical Reports Server (NTRS)
Benediktsson, Jon Atli; Swain, Philip H.
1990-01-01
Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A general problem with using conventional multivariate statistical approaches for classification of multisource data is that a single multivariate distribution cannot be assumed for the classes across the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability, but most statistical classification methods do not have a mechanism for this. This research first focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Second, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.
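As a toy illustration of the consensus-theory idea of weighting sources by reliability (a linear opinion pool; the weights and posteriors below are invented, and the paper's statistical multisource analysis is more elaborate):

```python
import numpy as np

def linear_opinion_pool(posteriors, weights):
    """Combine per-source class posteriors with reliability weights.

    posteriors: list of (n_pixels, n_classes) arrays, one per data source
    weights:    reliability weight per source, summing to 1
    """
    combined = sum(w * p for w, p in zip(weights, posteriors))
    return np.argmax(combined, axis=1)   # consensus class labels

# Hypothetical example: an optical source deemed more reliable than a radar source.
p_optical = np.array([[0.7, 0.2, 0.1], [0.2, 0.5, 0.3]])
p_radar   = np.array([[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]])
labels = linear_opinion_pool([p_optical, p_radar], weights=[0.7, 0.3])
```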
Modeling Ka-band low elevation angle propagation statistics
NASA Technical Reports Server (NTRS)
Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.
1995-01-01
The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates of combined-effects statistics are presented for the Washington, D.C. area at 20 GHz and a 5 deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.
Hussain, Faraz; Jha, Sumit K; Jha, Susmit; Langmead, Christopher J
2014-01-01
Stochastic models are increasingly used to study the behaviour of biochemical systems. While the structure of such models is often readily available from first principles, unknown quantitative features are incorporated into the model as parameters. Algorithmic discovery of parameter values from experimentally observed facts remains a challenge for the computational systems biology community. We present a new parameter discovery algorithm that uses simulated annealing, sequential hypothesis testing, and statistical model checking to learn the parameters in a stochastic model. We apply our technique to a model of glucose and insulin metabolism used for in-silico validation of artificial pancreata and demonstrate its effectiveness by developing a parallel CUDA-based implementation for parameter synthesis in this model.
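A minimal sketch of the outer search loop, assuming placeholder functions for the stochastic simulation, the behavioural property and the parameter perturbation; the paper's algorithm additionally uses sequential hypothesis testing to terminate each statistical-model-checking step early, which is not reproduced here.

```python
import math
import random

# --- hypothetical stand-ins for the stochastic biochemical model ---------------
def simulate(params):
    # placeholder stochastic simulation: returns a noisy trajectory summary
    return params["k"] + random.gauss(0, 0.1)

def property_satisfied(trajectory):
    # placeholder behavioural property (e.g. "glucose stays in range")
    return 0.9 < trajectory < 1.1

def perturb(params):
    return {"k": params["k"] + random.gauss(0, 0.05)}
# --------------------------------------------------------------------------------

def estimate_satisfaction(params, n_runs=200):
    """Statistical model checking by plain Monte Carlo: fraction of stochastic
    runs satisfying the property (fixed sample size instead of sequential testing)."""
    return sum(property_satisfied(simulate(params)) for _ in range(n_runs)) / n_runs

def anneal(params, n_iter=500, temp0=1.0):
    score = estimate_satisfaction(params)
    for k in range(n_iter):
        temp = max(temp0 * (1 - k / n_iter), 1e-9)   # linear cooling schedule
        cand = perturb(params)                       # small random move in parameter space
        cand_score = estimate_satisfaction(cand)
        # accept improvements always, worse moves with Boltzmann probability
        if cand_score >= score or random.random() < math.exp((cand_score - score) / temp):
            params, score = cand, cand_score
    return params, score

best_params, best_score = anneal({"k": 0.5})
```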
Vieira, Rute; McDonald, Suzanne; Araújo-Soares, Vera; Sniehotta, Falko F; Henderson, Robin
2017-09-01
N-of-1 studies are based on repeated observations within an individual or unit over time and are acknowledged as an important research method for generating scientific evidence about the health or behaviour of an individual. Statistical analyses of n-of-1 data require accurate modelling of the outcome while accounting for its distribution, time-related trend and error structures (e.g., autocorrelation) as well as reporting readily usable contextualised effect sizes for decision-making. A number of statistical approaches have been documented but no consensus exists on which method is most appropriate for which type of n-of-1 design. We discuss the statistical considerations for analysing n-of-1 studies and briefly review some currently used methodologies. We describe dynamic regression modelling as a flexible and powerful approach, adaptable to different types of outcomes and capable of dealing with the different challenges inherent to n-of-1 statistical modelling. Dynamic modelling borrows ideas from longitudinal and event history methodologies which explicitly incorporate the role of time and the influence of past on future. We also present an illustrative example of the use of dynamic regression on monitoring physical activity during the retirement transition. Dynamic modelling has the potential to expand researchers' access to robust and user-friendly statistical methods for individualised studies.
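A minimal sketch of the dynamic-regression idea on invented data (a lagged outcome plus a time trend around a hypothetical retirement transition); the paper's modelling framework is richer, so this only shows the general shape of such an analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical n-of-1 series: daily activity around a retirement transition at day 60.
rng = np.random.default_rng(1)
n = 120
df = pd.DataFrame({"t": np.arange(n), "retired": (np.arange(n) >= 60).astype(int)})
y = np.zeros(n)
for i in range(1, n):   # AR(1)-type dependence on the immediate past
    y[i] = 0.6 * y[i - 1] + 0.01 * df.t[i] + 0.8 * df.retired[i] + rng.normal(0, 0.5)
df["activity"] = y
df["activity_lag1"] = df["activity"].shift(1)

# Dynamic regression: outcome depends on time, the transition and its own past,
# so the residuals are approximately uncorrelated.
model = smf.ols("activity ~ t + retired + activity_lag1", data=df.dropna()).fit()
print(model.summary())
```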
Can air temperature be used to project influences of climate change on stream temperature?
Ivan Arismendi; Mohammad Safeeq; Jason B Dunham; Sherri L Johnson
2014-01-01
Worldwide, lack of data on stream temperature has motivated the use of regression-based statistical models to predict stream temperatures based on more widely available data on air temperatures. Such models have been widely applied to project responses of stream temperatures under climate change, but the performance of these models has not been fully evaluated. To...
Federal and state agencies responsible for protecting water quality rely mainly on statistically-based methods to assess and manage risks to the nation's streams, lakes and estuaries. Although statistical approaches provide valuable information on current trends in water quality...
NASA Astrophysics Data System (ADS)
Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.
1995-06-01
A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms, and associated performance models, that are based on data assumed to obey multivariate Gaussian probability density functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
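As a one-band illustration of the Gaussianization step only (not the joint multivariate estimation derived in the paper), a Box-Cox transform with a maximum-likelihood lambda can be applied before estimating the Gaussian statistics used by a detector; the data below are invented.

```python
import numpy as np
from scipy import stats

# Hypothetical non-Gaussian (gamma-distributed) band data for one channel.
rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=3.0, size=5000)   # Box-Cox requires positive data

# Box-Cox power-law transform; lambda chosen by maximum likelihood.
x_gauss, lam = stats.boxcox(x)

# Gaussian statistics are then estimated on the transformed data (a single band
# here, so just mean and variance) and fed to a detector whose performance
# model assumes Gaussianity.
mu, sigma2 = x_gauss.mean(), x_gauss.var(ddof=1)
print(lam, mu, sigma2)
```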
Predicting future protection of respirator users: Statistical approaches and practical implications.
Hu, Chengcheng; Harber, Philip; Su, Jing
2016-01-01
The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.
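The sketch below illustrates the kind of linear mixed-effects model underlying such predictions, on invented repeated fit-test data with a random intercept per worker; the study's actual model structure (separate short-term and longer-term variance components) is more detailed.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical repeated quantitative fit-test results (log10 fit factors):
# two tests at session 0 and two at a follow-up session ~6 months later.
rng = np.random.default_rng(2)
workers = np.repeat(np.arange(50), 4)
session = np.tile([0, 0, 1, 1], 50)
worker_effect = rng.normal(0, 0.3, 50)[workers]       # between-worker variability
log_ff = 2.2 + worker_effect - 0.1 * session + rng.normal(0, 0.2, 200)
df = pd.DataFrame({"worker": workers, "session": session, "log_ff": log_ff})

# A random intercept per worker induces the within-subject correlation that
# predictions of future fit factors condition on.
mixed = smf.mixedlm("log_ff ~ session", df, groups=df["worker"]).fit()
print(mixed.summary())
```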
Statistical wave climate projections for coastal impact assessments
NASA Astrophysics Data System (ADS)
Camus, P.; Losada, I. J.; Izaguirre, C.; Espejo, A.; Menéndez, M.; Pérez, J.
2017-09-01
Global multimodel wave climate projections are obtained at 1.0° × 1.0° scale from 30 Coupled Model Intercomparison Project Phase 5 (CMIP5) global circulation model (GCM) realizations. A semi-supervised weather-typing approach based on a characterization of the ocean wave generation areas and the historical wave information from the recent GOW2 database is used to train the statistical model. This framework is also applied to obtain high resolution projections of coastal wave climate and coastal impacts such as port operability and coastal flooding. Regional projections are estimated using the collection of weather types at a spacing of 1.0°. This assumption is feasible because the predictor is defined based on the wave generation area and the classification is guided by the local wave climate. The assessment of future changes in coastal impacts is based on direct downscaling of indicators defined by empirical formulations (total water level for coastal flooding and number of hours per year with overtopping for port operability). Global multimodel projections of the significant wave height and peak period are consistent with changes obtained in previous studies. Statistical confidence of expected changes is obtained due to the large number of GCMs used to construct the ensemble. The proposed methodology proves flexible for projecting wave climate at different spatial scales. Regional changes of additional variables such as wave direction or other statistics can be estimated from the future empirical distribution with extreme values restricted to high percentiles (i.e., 95th, 99th percentiles). The statistical framework can also be applied to evaluate regional coastal impacts integrating changes in storminess and sea level rise.
Effectiveness of feature and classifier algorithms in character recognition systems
NASA Astrophysics Data System (ADS)
Wilson, Charles L.
1993-04-01
At the first Census Optical Character Recognition Systems Conference, NIST generated accuracy data for more than character recognition systems. Most systems were tested on the recognition of isolated digits and upper and lower case alphabetic characters. The recognition experiments were performed on sample sizes of 58,000 digits, and 12,000 upper and lower case alphabetic characters. The algorithms used by the 26 conference participants included rule-based methods, image-based methods, statistical methods, and neural networks. The neural network methods included Multi-Layer Perceptrons, Learned Vector Quantization, Neocognitrons, and cascaded neural networks. In this paper 11 different systems are compared using correlations between the answers of different systems, comparing the decrease in error rate as a function of confidence of recognition, and comparing the writer dependence of recognition. This comparison shows that methods that used different algorithms for feature extraction and recognition performed with very high levels of correlation. This is true for neural network systems, hybrid systems, and statistically based systems, and leads to the conclusion that neural networks have not yet demonstrated a clear superiority to more conventional statistical methods. Comparison of these results with the models of Vapnik (for estimation problems), MacKay (for Bayesian statistical models), Moody (for effective parameterization), and Boltzmann models (for information content) demonstrates that as the limits of training data variance are approached, all classifier systems have similar statistical properties. The limiting condition can only be approached for sufficiently rich feature sets because the accuracy limit is controlled by the available information content of the training set, which must pass through the feature extraction process prior to classification.
Spatial Accessibility and Availability Measures and Statistical Properties in the Food Environment
Van Meter, E.; Lawson, A.B.; Colabianchi, N.; Nichols, M.; Hibbert, J.; Porter, D.; Liese, A.D.
2010-01-01
Spatial accessibility is of increasing interest in the health sciences. This paper addresses the statistical use of spatial accessibility and availability indices. These measures are evaluated via an extensive simulation based on cluster models for local food outlet density. We derived Monte Carlo critical values for several statistical tests based on the indices. In particular we are interested in the ability to make inferential comparisons between different study areas where indices of accessibility and availability are to be calculated. We derive tests of mean difference as well as tests for differences in Moran's I for spatial correlation for each of the accessibility and availability indices. We also apply these new statistical tests to a data example based on two counties in South Carolina for various accessibility and availability measures calculated for food outlets, stores, and restaurants. PMID:21499528
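As an illustration of deriving Monte Carlo critical values for a spatial statistic, the sketch below computes Moran's I for a hypothetical accessibility index together with a permutation-based reference distribution; the paper's simulations use cluster models for outlet density rather than this simple relabelling.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I for values x under a spatial weight matrix W."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

def permutation_test(x, W, n_perm=999, rng=None):
    """Monte Carlo reference distribution for Moran's I under random relabelling
    (one-sided test for positive spatial autocorrelation)."""
    rng = rng or np.random.default_rng()
    observed = morans_i(x, W)
    perms = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    p = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p, np.quantile(perms, 0.95)   # observed I, p-value, MC critical value

# Hypothetical example: x is an accessibility index per tract, W a distance-based
# binary weight matrix.
rng = np.random.default_rng(3)
n = 30
coords = rng.uniform(size=(n, 2))
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
W = (d < 0.3).astype(float)
np.fill_diagonal(W, 0)
x = rng.gamma(2.0, 1.0, n)
print(permutation_test(x, W))
```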
Reevaluation of a walleye (Sander vitreus) bioenergetics model
Madenjian, Charles P.; Wang, Chunfang
2013-01-01
Walleye (Sander vitreus) is an important sport fish throughout much of North America, and walleye populations support valuable commercial fisheries in certain lakes as well. Using a corrected algorithm for balancing the energy budget, we reevaluated the performance of the Wisconsin bioenergetics model for walleye in the laboratory. Walleyes were fed rainbow smelt (Osmerus mordax) in four laboratory tanks each day during a 126-day experiment. Feeding rates ranged from 1.4 to 1.7% of walleye body weight per day. Based on a statistical comparison of bioenergetics model predictions of monthly consumption with observed monthly consumption, we concluded that the bioenergetics model estimated food consumption by walleye without any significant bias. Similarly, based on a statistical comparison of bioenergetics model predictions of weight at the end of the monthly test period with observed weight, we concluded that the bioenergetics model predicted walleye growth without any detectable bias. In addition, the bioenergetics model predictions of cumulative consumption over the 126-day experiment differed from observed cumulative consumption by less than 10%. Although additional laboratory and field testing will be needed to fully evaluate model performance, based on our laboratory results, the Wisconsin bioenergetics model for walleye appears to be providing unbiased predictions of food consumption.
A comparison of linear and nonlinear statistical techniques in performance attribution.
Chan, N H; Genovese, C R
2001-01-01
Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks using factors derived from some commonly used cross sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on standard linear multifactor model and three nonlinear techniques-model selection, additive models, and neural networks-are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.
Landau's statistical mechanics for quasi-particle models
NASA Astrophysics Data System (ADS)
Bannur, Vishnu M.
2014-04-01
Landau's formalism of statistical mechanics [following L. D. Landau and E. M. Lifshitz, Statistical Physics (Pergamon Press, Oxford, 1980)] is applied to the quasi-particle model of quark-gluon plasma. Here, one starts from the expression for pressure and develops all thermodynamics. It is a general formalism and consistent with our earlier studies [V. M. Bannur, Phys. Lett. B647, 271 (2007)] based on Pathria's formalism [following R. K. Pathria, Statistical Mechanics (Butterworth-Heinemann, Oxford, 1977)]. In Pathria's formalism, one starts from the expression for energy density and develops thermodynamics. Both formalisms are consistent with thermodynamics and statistical mechanics. Under certain conditions, which are wrongly called the thermodynamic consistency relation, we recover another formalism of the quasi-particle system, such as that in M. I. Gorenstein and S. N. Yang, Phys. Rev. D52, 5206 (1995), widely studied in the quark-gluon plasma.
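For orientation, the two starting points can be summarized by standard thermodynamic identities (written here for a simple system; the quasi-particle-specific ingredients of the paper are not shown):

```latex
% Landau-style formalism: start from the pressure P(T,\mu) and derive the rest.
s = \left(\frac{\partial P}{\partial T}\right)_{\mu}, \qquad
n = \left(\frac{\partial P}{\partial \mu}\right)_{T}, \qquad
\varepsilon = T s + \mu n - P .

% Pathria-style formalism: start instead from the energy density \varepsilon(T)
% (here at \mu = 0) and recover the pressure by integration,
\frac{\partial}{\partial T}\!\left(\frac{P}{T}\right) = \frac{\varepsilon}{T^{2}}
\quad\Longrightarrow\quad
P(T) = T \int^{T} \frac{\varepsilon(T')}{T'^{2}}\, dT' .
```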
An Algebraic Implicitization and Specialization of Minimum KL-Divergence Models
NASA Astrophysics Data System (ADS)
Dukkipati, Ambedkar; Manathara, Joel George
In this paper we study the representation of KL-divergence minimization, in cases where integer sufficient statistics exist, using tools from polynomial algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. In particular, we also study the case of the Kullback-Csiszár iteration scheme. We present implicit descriptions of these models and show that implicitization preserves specialization of the prior distribution. This result leads us to a Gröbner bases method to compute an implicit representation of minimum KL-divergence models.
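In notation of my own choosing (the paper's precise setup may differ), the algebraic structure arises roughly as follows: minimizing KL-divergence from a prior q subject to moment constraints yields an exponential family, and integer sufficient statistics make it monomial in the transformed parameters.

```latex
% Minimum KL-divergence from a prior q subject to moment constraints
% E_p[T_j] = t_j gives an exponential family
p_\theta(x) \;\propto\; q(x)\,\exp\Big(\sum_j \lambda_j T_j(x)\Big)
          \;=\; q(x)\prod_j \theta_j^{\,T_j(x)}, \qquad \theta_j = e^{\lambda_j}.

% When the sufficient statistics T_j(x) take integer values, p_\theta(x) is a
% monomial in \theta, and the normalization plus moment conditions
\sum_x q(x)\prod_j \theta_j^{\,T_j(x)}\,\big(T_k(x) - t_k\big) \;=\; 0,
\qquad k = 1,\dots,m,
% form a system of polynomial equations in \theta, amenable to implicitization
% and Groebner-basis computations.
```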
Yong, Alan K.; Hough, Susan E.; Iwahashi, Junko; Braverman, Amy
2012-01-01
We present an approach based on geomorphometry to predict material properties and characterize site conditions using the VS30 parameter (time‐averaged shear‐wave velocity to a depth of 30 m). Our framework consists of an automated terrain classification scheme based on taxonomic criteria (slope gradient, local convexity, and surface texture) that systematically identifies 16 terrain types from 1‐km spatial resolution (30 arcsec) Shuttle Radar Topography Mission digital elevation models (SRTM DEMs). Using 853 VS30 values from California, we apply a simulation‐based statistical method to determine the mean VS30 for each terrain type in California. We then compare the VS30 values with models based on individual proxies, such as mapped surface geology and topographic slope, and show that our systematic terrain‐based approach consistently performs better than semiempirical estimates based on individual proxies. To further evaluate our model, we apply our California‐based estimates to terrains of the contiguous United States. Comparisons of our estimates with 325 VS30 measurements outside of California, as well as estimates based on the topographic slope model, indicate our method to be statistically robust and more accurate. Our approach thus provides an objective and robust method for extending estimates of VS30 for regions where in situ measurements are sparse or not readily available.
NASA Astrophysics Data System (ADS)
Liu, Deyang; An, Ping; Ma, Ran; Yang, Chao; Shen, Liquan; Li, Kai
2016-07-01
Three-dimensional (3-D) holoscopic imaging, also known as integral imaging, light field imaging, or plenoptic imaging, can provide natural and fatigue-free 3-D visualization. However, a large amount of data is required to represent the 3-D holoscopic content. Therefore, efficient coding schemes for this particular type of image are needed. A 3-D holoscopic image coding scheme with kernel-based minimum mean square error (MMSE) estimation is proposed. In the proposed scheme, the coding block is predicted by an MMSE estimator under statistical modeling. In order to obtain the signal statistical behavior, kernel density estimation (KDE) is utilized to estimate the probability density function used in the statistical modeling. As bandwidth estimation (BE) is a key issue in the KDE problem, we also propose a BE method based on the kernel trick. The experimental results demonstrate that the proposed scheme can achieve a better rate-distortion performance and a better visual rendering quality.
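The MMSE estimator is the conditional mean, and with a KDE plug-in this reduces to kernel (Nadaraya-Watson) regression, where the bandwidth plays the role addressed by the paper's BE method. Below is a one-dimensional Python sketch with Silverman's rule as a stand-in bandwidth estimator rather than the proposed kernel-trick method; the data are invented, not holoscopic coding blocks.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """KDE plug-in MMSE estimate E[Y | X = x]: a Gaussian-kernel weighted mean."""
    d = (x_query[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)                    # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)

def silverman_bandwidth(x):
    """Simple rule-of-thumb bandwidth (stand-in for the paper's BE method)."""
    return 1.06 * x.std(ddof=1) * len(x) ** (-1 / 5)

rng = np.random.default_rng(4)
x = rng.uniform(0, 2 * np.pi, 300)
y = np.sin(x) + rng.normal(0, 0.2, 300)          # noisy signal to be predicted
x0 = np.linspace(0, 2 * np.pi, 50)
y_hat = nadaraya_watson(x, y, x0, silverman_bandwidth(x))
```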
New Developments in the Embedded Statistical Coupling Method: Atomistic/Continuum Crack Propagation
NASA Technical Reports Server (NTRS)
Saether, E.; Yamakov, V.; Glaessgen, E.
2008-01-01
A concurrent multiscale modeling methodology that embeds a molecular dynamics (MD) region within a finite element (FEM) domain has been enhanced. The concurrent MD-FEM coupling methodology uses statistical averaging of the deformation of the atomistic MD domain to provide interface displacement boundary conditions to the surrounding continuum FEM region, which, in turn, generates interface reaction forces that are applied as piecewise constant traction boundary conditions to the MD domain. The enhancement is based on the addition of molecular dynamics-based cohesive zone model (CZM) elements near the MD-FEM interface. The CZM elements are a continuum interpretation of the traction-displacement relationships taken from MD simulations using Cohesive Zone Volume Elements (CZVE). The addition of CZM elements to the concurrent MD-FEM analysis provides a consistent set of atomistically-based cohesive properties within the finite element region near the growing crack. Another set of CZVEs are then used to extract revised CZM relationships from the enhanced embedded statistical coupling method (ESCM) simulation of an edge crack under uniaxial loading.
Application of Hierarchy Theory to Cross-Scale Hydrologic Modeling of Nutrient Loads
We describe a model called Regional Hydrologic Modeling for Environmental Evaluation (RHyME2) for quantifying annual nutrient loads in stream networks and watersheds. RHyME2 is a cross-scale statistical and process-based water-quality model. The model ...
Schulz, W.H.; Lidke, D.J.; Godt, J.W.
2008-01-01
Landslides in partially saturated colluvium on Seattle, WA, hillslopes have resulted in property damage and human casualties. We developed statistical models of colluvium and shallow-groundwater distributions to aid landslide hazard assessments. The models were developed using a geographic information system, digital geologic maps, digital topography, subsurface exploration results, the groundwater flow modeling software VS2DI and regression analyses. Input to the colluvium model includes slope, distance to a hillslope-crest escarpment, and escarpment slope and height. We developed different statistical relations for thickness of colluvium on four landforms. Groundwater model input includes colluvium basal slope and distance from the Fraser aquifer. This distance was used to estimate hydraulic conductivity based on the assumption that addition of finer-grained material from down-section would result in lower conductivity. Colluvial groundwater is perched so we estimated its saturated thickness. We used VS2DI to establish relations between saturated thickness and the hydraulic conductivity and basal slope of the colluvium. We developed different statistical relations for three groundwater flow regimes. All model results were validated using observational data that were excluded from calibration. Eighty percent of colluvium thickness predictions were within 25% of observed values and 88% of saturated thickness predictions were within 20% of observed values. The models are based on conditions common to many areas, so our method can provide accurate results for similar regions; relations in our statistical models require calibration for new regions. Our results suggest that Seattle landslides occur in native deposits and colluvium, ultimately in response to surface-water erosion of hillslope toes. Regional groundwater conditions do not appear to strongly affect the general distribution of Seattle landslides; historical landslides were equally dispersed within and outside of the area potentially affected by regional groundwater conditions.
Ter Braak, Cajo J F; Peres-Neto, Pedro; Dray, Stéphane
2017-01-01
Statistical testing of trait-environment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based, and by assessing significance using the larger of the two p-values (the pmax test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimicked the pmax test using GLM instead of fourth-corner. By simulation using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set at 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based pmax test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time.
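For a single quantitative trait-environment pair, the pmax idea can be sketched as follows (my own simplified implementation of a fourth-corner-style correlation with separate site and species permutations; the paper's GLM-based variants are not shown):

```python
import numpy as np

def fourth_corner_r(L, e, t):
    """Fourth-corner correlation between a site variable e and a species trait t,
    mediated by the site-by-species abundance table L."""
    w = L / L.sum()
    wi, wj = w.sum(axis=1), w.sum(axis=0)          # site and species weights
    ze = e - np.sum(wi * e)
    zt = t - np.sum(wj * t)
    return (ze @ w @ zt) / np.sqrt(np.sum(wi * ze ** 2) * np.sum(wj * zt ** 2))

def p_max_test(L, e, t, n_perm=999, rng=None):
    """Two separate permutation tests (sites and species); report the larger p."""
    rng = rng or np.random.default_rng()
    obs = abs(fourth_corner_r(L, e, t))
    p_site = (1 + sum(abs(fourth_corner_r(L, rng.permutation(e), t)) >= obs
                      for _ in range(n_perm))) / (n_perm + 1)
    p_spec = (1 + sum(abs(fourth_corner_r(L, e, rng.permutation(t))) >= obs
                      for _ in range(n_perm))) / (n_perm + 1)
    return max(p_site, p_spec)

# L: (n_sites, n_species) abundances; e: (n_sites,) environment; t: (n_species,) trait.
```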
NASA Astrophysics Data System (ADS)
Tsutsumi, Morito; Seya, Hajime
2009-12-01
This study discusses the theoretical foundation of the application of spatial hedonic approaches—the hedonic approach employing spatial econometrics and/or spatial statistics—to benefits evaluation. The study highlights the limitations of the spatial econometrics approach since it uses a spatial weight matrix that is not employed by the spatial statistics approach. Further, the study presents empirical analyses by applying the Spatial Autoregressive Error Model (SAEM), which is based on the spatial econometrics approach, and the Spatial Process Model (SPM), which is based on the spatial statistics approach. SPMs are conducted based on both isotropy and anisotropy and applied to different mesh sizes. The empirical analysis reveals that the estimated benefits are quite different, especially between isotropic and anisotropic SPM and between isotropic SPM and SAEM; the estimated benefits are similar for SAEM and anisotropic SPM. The study demonstrates that the mesh size does not affect the estimated amount of benefits. Finally, the study provides a confidence interval for the estimated benefits and raises an issue with regard to benefit evaluation.
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.
Li, Xuelong; Guo, Qun; Lu, Xiaoqiang
2016-05-13
It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has the inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
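A rough sketch of the pipeline shape only: block-wise 3-D DCT of a grayscale video volume, simple pooled coefficient statistics as features, and a linear SVR mapping features to subjective scores. The block size, the particular statistics and the regressor settings are assumptions, not the NVS features of the paper.

```python
import numpy as np
from scipy.fft import dctn
from scipy.stats import kurtosis
from sklearn.svm import SVR

def dct3d_features(video, block=(8, 8, 8)):
    """Pooled statistics of 3-D DCT coefficients over non-overlapping blocks of
    a grayscale video volume with shape (frames, height, width)."""
    f, h, w = (s - s % b for s, b in zip(video.shape, block))
    feats = []
    for i in range(0, f, block[0]):
        for j in range(0, h, block[1]):
            for k in range(0, w, block[2]):
                c = dctn(video[i:i+block[0], j:j+block[1], k:k+block[2]], norm="ortho")
                ac = np.abs(c.ravel()[1:])            # AC coefficients only
                feats.append([np.log1p(ac).mean(), ac.std(), kurtosis(ac)])
    return np.asarray(feats).mean(axis=0)

# A regressor from features to mean opinion scores would then be trained on a
# VQA database: X = np.vstack([dct3d_features(v) for v in videos]); y = mos.
quality_model = SVR(kernel="linear")
```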
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Investigating market efficiency through a forecasting model based on differential equations
NASA Astrophysics Data System (ADS)
de Resende, Charlene C.; Pereira, Adriano C. M.; Cardoso, Rodrigo T. N.; de Magalhães, A. R. Bosco
2017-05-01
A new differential-equation-based model for stock price trend forecasting is proposed as a tool to investigate efficiency in an emerging market. Its predictive power was shown to be statistically higher than that of a completely random model, signaling the presence of arbitrage opportunities. Conditions for accuracy to be enhanced are investigated, and application of the model as part of a trading strategy is discussed.
Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation
NASA Technical Reports Server (NTRS)
Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.
1998-01-01
The all rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three dimensional analysis, there are no first order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inject diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all rocket mode of operation.
Numerical and Qualitative Contrasts of Two Statistical Models ...
Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and products. This study provided an empirical and qualitative comparison of both models using 29 years of data for two discrete time series of chlorophyll-a (chl-a) in the Patuxent River estuary. Empirical descriptions of each model were based on predictive performance against the observed data, ability to reproduce flow-normalized trends with simulated data, and comparisons of performance with validation datasets. Between-model differences were apparent but minor and both models had comparable abilities to remove flow effects from simulated time series. Both models similarly predicted observations for missing data with different characteristics. Trends from each model revealed distinct mainstem influences of the Chesapeake Bay with both models predicting a roughly 65% increase in chl-a over time in the lower estuary, whereas flow-normalized predictions for the upper estuary showed a more dynamic pattern, with a nearly 100% increase in chl-a in the last 10 years. Qualitative comparisons highlighted important differences in the statistical structure, available products, and characteristics of the data and desired analysis. This manuscript describes a quantitative comparison of two recently-
A Hybrid Multi-Scale Model of Crystal Plasticity for Handling Stress Concentrations
Sun, Shang; Ramazani, Ali; Sundararaghavan, Veera
2017-09-04
Microstructural effects become important at regions of stress concentrators such as notches, cracks and contact surfaces. A multiscale model is presented that efficiently captures microstructural details at such critical regions. The approach is based on a multiresolution mesh that includes an explicit microstructure representation at critical regions where stresses are localized. At regions farther away from the stress concentration, a reduced order model that statistically captures the effect of the microstructure is employed. The statistical model is based on a finite element representation of the orientation distribution function (ODF). As an illustrative example, we have applied the multiscaling method to compute the stress intensity factor K_I around the crack tip in a wedge-opening load specimen. The approach is verified with an analytical solution within linear elasticity approximation and is then extended to allow modeling of microstructural effects on crack tip plasticity.
Predicting perceptual quality of images in realistic scenario using deep filter banks
NASA Astrophysics Data System (ADS)
Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang
2018-03-01
Classical image perceptual quality assessment models usually resort to natural scene statistics methods, which are based on an assumption that certain reliable statistical regularities hold for undistorted images and will be corrupted by introduced distortions. However, these models usually fail to accurately predict degradation severity of images in realistic scenarios since complex, multiple, and interactive authentic distortions usually appear in them. We propose a quality prediction model based on a convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation and, finally, a linear support vector regression model is trained to map the image representation to subjective perceptual quality scores. The experimental results on benchmark databases present the effectiveness and generalizability of the proposed model.
Probing the exchange statistics of one-dimensional anyon models
NASA Astrophysics Data System (ADS)
Greschner, Sebastian; Cardarelli, Lorenzo; Santos, Luis
2018-05-01
We propose feasible scenarios for revealing the modified exchange statistics in one-dimensional anyon models in optical lattices based on an extension of the multicolor lattice-depth modulation scheme introduced in [Phys. Rev. A 94, 023615 (2016), 10.1103/PhysRevA.94.023615]. We show that the fast modulation of a two-component fermionic lattice gas in the presence of a magnetic field gradient, in combination with additional resonant microwave fields, allows for the quantum simulation of hardcore anyon models with periodic boundary conditions. Such a semisynthetic ring setup allows for realizing an interferometric arrangement sensitive to the anyonic statistics. Moreover, we show that simple expansion experiments may reveal the formation of anomalously bound pairs resulting from the anyonic exchange.
New statistical scission-point model to predict fission fragment observables
NASA Astrophysics Data System (ADS)
Lemaître, Jean-François; Panebianco, Stefano; Sida, Jean-Luc; Hilaire, Stéphane; Heinrich, Sophie
2015-09-01
The development of high performance computing facilities makes possible a massive production of nuclear data in a full microscopic framework. Taking advantage of the individual potential calculations of more than 7000 nuclei, a new statistical scission-point model, called SPY, has been developed. It gives access to the absolute available energy at the scission point, which allows the use of a parameter-free microcanonical statistical description to calculate the distributions and the mean values of all fission observables. SPY uses the richness of microscopy in a rather simple theoretical framework, without any parameter except the scission-point definition, to draw clear answers based on perfect knowledge of the ingredients involved in the model, with very limited computing cost.
Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses
Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy
2015-01-01
Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579
Predicting trauma patient mortality: ICD [or ICD-10-AM] versus AIS based approaches.
Willis, Cameron D; Gabbe, Belinda J; Jolley, Damien; Harrison, James E; Cameron, Peter A
2010-11-01
The International Classification of Diseases Injury Severity Score (ICISS) has been proposed as an International Classification of Diseases (ICD)-10-based alternative to mortality prediction tools that use Abbreviated Injury Scale (AIS) data, including the Trauma and Injury Severity Score (TRISS). To date, studies have not examined the performance of ICISS using Australian trauma registry data. This study aimed to compare the performance of ICISS with other mortality prediction tools in an Australian trauma registry. This was a retrospective review of prospectively collected data from the Victorian State Trauma Registry. A training dataset was created for model development and a validation dataset for evaluation. The multiplicative ICISS model was compared with a worst injury ICISS approach, Victorian TRISS (V-TRISS, using local coefficients), maximum AIS severity and a multivariable model including ICD-10-AM codes as predictors. Models were investigated for discrimination (C-statistic) and calibration (Hosmer-Lemeshow statistic). The multivariable approach had the highest level of discrimination (C-statistic 0.90) and calibration (H-L 7.65, P= 0.468). Worst injury ICISS, V-TRISS and maximum AIS had similar performance. The multiplicative ICISS produced the lowest level of discrimination (C-statistic 0.80) and poorest calibration (H-L 50.23, P < 0.001). The performance of ICISS may be affected by the data used to develop estimates, the ICD version employed, the methods for deriving estimates and the inclusion of covariates. In this analysis, a multivariable approach using ICD-10-AM codes was the best-performing method. A multivariable ICISS approach may therefore be a useful alternative to AIS-based methods and may have comparable predictive performance to locally derived TRISS models. © 2010 The Authors. ANZ Journal of Surgery © 2010 Royal Australasian College of Surgeons.
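The multiplicative ICISS construction itself is simple to state: survival risk ratios (SRRs) are estimated per ICD code from a training dataset and multiplied across a patient's recorded injury codes. A hedged Python sketch with hypothetical column names (the registry-specific handling of code versions and covariates is omitted):

```python
import pandas as pd

def fit_srrs(train):
    """Survival risk ratio per ICD code: proportion of patients carrying that
    code who survived, estimated from a training dataset.

    train: DataFrame with one row per (patient, code) and columns
           'icd_code' and 'survived' (0/1).
    """
    return train.groupby("icd_code")["survived"].mean()

def multiplicative_iciss(codes, srrs, default=1.0):
    """ICISS for one patient = product of the SRRs of all recorded injury codes;
    codes unseen in training fall back to a neutral default."""
    score = 1.0
    for c in codes:
        score *= srrs.get(c, default)
    return score

# A worst-injury variant would instead take min(srrs.get(c, default) for c in codes).
```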
Choi, Se Y; Ahn, Seung H; Choi, Jae D; Kim, Jung H; Lee, Byoung-Il; Kim, Jeong-In
2016-01-01
Objective: The purpose of this study was to compare CT image quality for evaluating urolithiasis using filtered back projection (FBP), statistical iterative reconstruction (IR) and knowledge-based iterative model reconstruction (IMR) according to various scan parameters and radiation doses. Methods: A 5 × 5 × 5 mm³ uric acid stone was placed in a physical human phantom at the level of the pelvis. 3 tube voltages (120, 100 and 80 kV) and 4 current–time products (100, 70, 30 and 15 mAs) were implemented in 12 scans. Each scan was reconstructed with FBP, statistical IR (Levels 5–7) and knowledge-based IMR (soft-tissue Levels 1–3). The radiation dose, objective image quality and signal-to-noise ratio (SNR) were evaluated, and subjective assessments were performed. Results: The effective doses ranged from 0.095 to 2.621 mSv. Knowledge-based IMR showed better objective image noise and SNR than did FBP and statistical IR. The subjective image noise of FBP was worse than that of statistical IR and knowledge-based IMR. The subjective assessment scores deteriorated after a break point of 100 kV and 30 mAs. Conclusion: At the setting of 100 kV and 30 mAs, the radiation dose can be decreased by approximately 84% while maintaining the subjective image assessment. Advances in knowledge: Patients with urolithiasis can be evaluated with ultralow-dose non-enhanced CT using a knowledge-based IMR algorithm at a substantially reduced radiation dose with the imaging quality preserved, thereby minimizing the risks of radiation exposure while providing clinically relevant diagnostic benefits for patients. PMID:26577542
Ground-Based Telescope Parametric Cost Model
NASA Technical Reports Server (NTRS)
Stahl, H. Philip; Rowell, Ginger Holmes
2004-01-01
A parametric cost model for ground-based telescopes is developed using multi-variable statistical analysis. The model includes both engineering and performance parameters. While diameter continues to be the dominant cost driver, other significant factors include primary mirror radius of curvature and diffraction-limited wavelength. The model includes an explicit factor for primary mirror segmentation and/or duplication (i.e., multi-telescope phased-array systems). Additionally, single-variable models based on aperture diameter are derived. This analysis indicates that recent mirror technology advances have indeed reduced the historical telescope cost curve.
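A hedged sketch of the single-variable flavor of such a model, a power law cost ≈ a·D^b fitted in log-log space, follows; the diameters and costs below are invented for illustration and are not the paper's data or coefficients.

```python
# Sketch of a single-variable cost model of the form cost = a * D^b,
# fitted by ordinary least squares in log-log space. The diameters and costs
# are made-up illustrations, not data from the paper.
import numpy as np

D = np.array([2.0, 4.0, 8.0, 10.0, 30.0])        # aperture diameter (m)
cost = np.array([5., 25., 120., 210., 2500.])    # cost (arbitrary units)

b, log_a = np.polyfit(np.log(D), np.log(cost), 1)
a = np.exp(log_a)
print(f"cost ~ {a:.2f} * D^{b:.2f}")             # power-law exponent estimate
```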
Incorporating signal-dependent noise for hyperspectral target detection
NASA Astrophysics Data System (ADS)
Morman, Christopher J.; Meola, Joseph
2015-05-01
The majority of hyperspectral target detection algorithms are developed from statistical data models employing stationary background statistics or white Gaussian noise models. Stationary background models are inaccurate as a result of two separate physical processes. First, varying background classes often exist in the imagery that possess different clutter statistics. Many algorithms can account for this variability through the use of subspaces or clustering techniques. The second physical process, which is often ignored, is a signal-dependent sensor noise term. For photon counting sensors that are often used in hyperspectral imaging systems, sensor noise increases as the measured signal level increases as a result of Poisson random processes. This work investigates the impact of this sensor noise on target detection performance. A linear noise model is developed describing sensor noise variance as a linear function of signal level. The linear noise model is then incorporated for detection of targets using data collected at Wright Patterson Air Force Base.
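The linear noise model lends itself to a short illustration: with repeated frames of a static scene, per-pixel temporal means and variances can be regressed to recover var ≈ a + b·signal. This sketch uses synthetic Poisson-plus-read-noise data, not the collected imagery.

```python
# Sketch: estimating a linear signal-dependent noise model, var(n) = a + b*mu,
# from repeated frames of the same scene. Variable names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
mu_true = rng.uniform(100, 4000, size=(64, 64))          # per-pixel signal level
frames = rng.poisson(mu_true, size=(50, 64, 64)) + \
         rng.normal(0, 5, size=(50, 64, 64))             # shot + read noise

mu_hat = frames.mean(axis=0).ravel()
var_hat = frames.var(axis=0, ddof=1).ravel()

b, a = np.polyfit(mu_hat, var_hat, 1)                    # var ~ a + b*mu
print(f"var ~ {a:.1f} + {b:.3f} * signal")               # b ~ 1 for Poisson noise
```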
Multi-Parent Clustering Algorithms from Stochastic Grammar Data Models
NASA Technical Reports Server (NTRS)
Mjolsness, Eric; Castano, Rebecca; Gray, Alexander
1999-01-01
We introduce a statistical data model and an associated optimization-based clustering algorithm which allows data vectors to belong to zero, one or several "parent" clusters. For each data vector the algorithm makes a discrete decision among these alternatives. Thus, a recursive version of this algorithm would place data clusters in a Directed Acyclic Graph rather than a tree. We test the algorithm with synthetic data generated according to the statistical data model. We also illustrate the algorithm using real data from large-scale gene expression assays.
Statistical Properties of Echosignal Obtained from Human Dermis In Vivo
NASA Astrophysics Data System (ADS)
Piotrzkowska, Hanna; Litniewski, Jerzy; Nowicki, Andrzej; Szymańska, Elżbieta
The paper presents the classification of healthy skin and skin lesions (basal cell carcinoma and actinic keratosis) based on the statistical parameters of the envelope of ultrasonic echoes. The envelope was modeled using Rayleigh and non-Rayleigh (K-distribution) statistics. Furthermore, the characteristic parameter of the K-distribution, the effective number of scatterers, was investigated. The attenuation coefficient was also used for skin lesion assessment.
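A hedged sketch of the kind of envelope statistics involved: a Rayleigh fit, plus a common moment-based estimate of the K-distribution shape parameter (interpreted as an effective number of scatterers), assuming the standard intensity-moment relation E[I²]/E[I]² = 2(1 + 1/α). The data are synthetic and the estimator is illustrative, not the authors' exact procedure.

```python
# Sketch: Rayleigh fit of an echo envelope and a moment-based estimate of the
# K-distribution shape parameter alpha. The envelope here is synthetic fully
# developed speckle, for which the Rayleigh limit (alpha -> infinity) holds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
A = stats.rayleigh.rvs(scale=1.0, size=20000, random_state=rng)

loc, scale = stats.rayleigh.fit(A, floc=0)
print("Rayleigh scale:", scale)

I = A**2                                         # intensity
R = (I**2).mean() / I.mean()**2                  # second/first intensity moment ratio
alpha = 2.0 / (R - 2.0) if R > 2.0 else np.inf   # Rayleigh limit: R -> 2
print("effective number of scatterers (alpha):", alpha)
```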
Cancer Related-Knowledge - Small Area Estimates
These model-based estimates are produced using statistical models that combine data from the Health Information National Trends Survey with auxiliary variables obtained from relevant sources, borrowing strength from other areas with similar characteristics.
Mota Navarro, Roberto; Larralde, Hernán
2017-01-01
We present an agent-based model of a single-asset financial market that is capable of replicating most of the non-trivial statistical properties observed in real financial markets, generically referred to as stylized facts. In our model, agents employ strategies inspired by those used in real markets, and a realistic trade mechanism based on a double-auction order book. We study the role of the distinct types of trader on the return statistics: specifically, correlation properties (or lack thereof), volatility clustering, heavy tails, and the degree to which the distribution can be described by a log-normal. Further, by introducing the practice of "profit taking", our model is also capable of replicating the stylized fact related to an asymmetry in the distribution of losses and gains. PMID:28245251
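Two of the stylized facts mentioned can be checked with a few lines of code: returns should be nearly uncorrelated while absolute returns stay correlated (volatility clustering), and kurtosis should exceed the Gaussian value. The toy GARCH-like series below stands in for the model's simulated returns.

```python
# Sketch: stylized-fact checks on a return series. Near-zero autocorrelation of
# returns vs. persistent autocorrelation of |returns|, plus excess kurtosis.
import numpy as np

def autocorr(x, lag):
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).mean() / x.var()

rng = np.random.default_rng(2)
n, sig2 = 50000, 1e-4
r = np.empty(n)
for t in range(n):                                # toy volatility-clustered returns
    r[t] = rng.normal(0, np.sqrt(sig2))
    sig2 = 1e-5 + 0.09 * r[t]**2 + 0.90 * sig2

print("ACF of returns, lag 1:  ", autocorr(r, 1))          # ~ 0
print("ACF of |returns|, lag 1:", autocorr(np.abs(r), 1))   # clearly positive
print("excess kurtosis:        ", ((r - r.mean())**4).mean() / r.var()**2 - 3)
```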
Diagnosis of students' ability in a statistical course based on Rasch probabilistic outcome
NASA Astrophysics Data System (ADS)
Mahmud, Zamalia; Ramli, Wan Syahira Wan; Sapri, Shamsiah; Ahmad, Sanizah
2017-06-01
Measuring students' ability and performance is important in assessing how well students have learned and mastered statistical courses. Any improvement in learning will depend on the students' approaches to learning, which relate to factors such as the assessment methods used: quizzes, tests, assignments and a final examination. This study attempted an alternative approach to measuring students' ability in an undergraduate statistical course based on the Rasch probabilistic model. Firstly, this study aims to explore the learning outcome patterns of students in a statistics course (Applied Probability and Statistics) based on an Entrance-Exit survey. This is followed by investigating students' perceived learning ability based on four Course Learning Outcomes (CLOs) and students' actual learning ability based on their final examination scores. Rasch analysis revealed that students perceived themselves as lacking the ability to understand about 95% of the statistics concepts at the beginning of the class, but eventually they had a good understanding at the end of the 14-week class. In terms of students' performance in the final examination, their ability to understand the topics varies, with different probability values given the ability of the students and the difficulty of the questions. The majority found the probability and counting rules topic to be the most difficult to learn.
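For concreteness, the dichotomous Rasch model underlying such an analysis can be written in a few lines; the ability and difficulty values below are illustrative.

```python
# Sketch of the dichotomous Rasch model: the probability that a student with
# ability theta answers an item of difficulty b correctly.
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A weak student (theta = -1) facing a hard topic (b = +2, e.g. counting
# rules) vs. an easy topic (b = -1):
print(rasch_p(-1.0, 2.0))   # ~0.05: success unlikely
print(rasch_p(-1.0, -1.0))  # 0.5: ability matches difficulty
```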
Paris, Alan; Atia, George K; Vosoughi, Azadeh; Berman, Stephen A
2017-08-01
A characteristic of neurological signal processing is high levels of noise, from subcellular ion channels up to whole-brain processes. In this paper, we propose a new model of electroencephalogram (EEG) background periodograms, based on a family of functions which we call generalized van der Ziel-McWhorter (GVZM) power spectral densities (PSDs). To the best of our knowledge, the GVZM PSD function is the only EEG noise model that has relatively few parameters, matches recorded EEG PSDs with high accuracy from 0 to over 30 Hz, and has approximately 1/f^θ behavior in the midfrequencies without infinities. We validate this model using three approaches. First, we show how GVZM PSDs can arise in a population of ion channels at maximum entropy equilibrium. Second, we present a class of mixed autoregressive models, which simulate brain background noise and whose periodograms are asymptotic to the GVZM PSD. Third, we present two real-time estimation algorithms for steady-state visual evoked potential (SSVEP) frequencies, and analyze their performance statistically. In pairwise comparisons, the GVZM-based algorithms showed statistically significant accuracy improvement over two well-known and widely used SSVEP estimators. The GVZM noise model can be a useful and reliable technique for EEG signal processing. Understanding EEG noise is essential for EEG-based neurology and applications such as real-time brain-computer interfaces, which must make accurate control decisions from very short data epochs. The GVZM approach represents a successful new paradigm for understanding and managing this neurological noise.
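The GVZM functional form itself is not reproduced here, but the classic van der Ziel/McWhorter construction it generalizes is easy to sketch: a superposition of Lorentzian spectra with log-uniformly distributed relaxation times produces approximately 1/f^θ behavior in the midband with no low-frequency divergence. This is a textbook-style illustration under that assumption, not the paper's model.

```python
# Sketch: a superposition of Lorentzians with log-uniform relaxation times
# yields ~1/f behavior over the mid band, flattening at low f (no infinities).
import numpy as np

f = np.logspace(-2, 3, 200)                     # Hz
taus = np.logspace(-3, 2, 400)                  # log-uniform relaxation times (s)
psd = np.zeros_like(f)
for tau in taus:
    psd += tau / (1.0 + (2 * np.pi * f * tau)**2)   # one Lorentzian per channel
psd /= len(taus)

# Local log-log slope in the midband should be close to -1 (1/f^theta, theta~1)
mid = (f > 0.1) & (f < 100)
slope = np.polyfit(np.log(f[mid]), np.log(psd[mid]), 1)[0]
print("midband spectral exponent:", slope)
```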
Whole vertebral bone segmentation method with a statistical intensity-shape model based approach
NASA Astrophysics Data System (ADS)
Hanaoka, Shouhei; Fritscher, Karl; Schuler, Benedikt; Masutani, Yoshitaka; Hayashi, Naoto; Ohtomo, Kuni; Schubert, Rainer
2011-03-01
An automatic segmentation algorithm for the vertebrae in human body CT images is presented. In particular, we focused on constructing and utilizing four different statistical intensity-shape combined models for the cervical, upper thoracic, lower thoracic and lumbar vertebrae, respectively. For this purpose, two previously reported methods were combined: a deformable model-based initial segmentation method and a statistical shape-intensity model-based precise segmentation method. The former is used as pre-processing to detect the position and orientation of each vertebra, which determines the initial condition for the latter precise segmentation method. The precise segmentation method needs prior knowledge of both the intensities and the shapes of the objects. After principal component analysis (PCA) of such shape-intensity expressions obtained from training image sets, vertebrae were parametrically modeled as a linear combination of the principal component vectors. The segmentation of each target vertebra was performed by fitting this parametric model to the target image via maximum a posteriori estimation, combined with the geodesic active contour method. In an experiment using 10 cases, the initial segmentation was successful in 6 cases and only partially failed in 4 cases (2 in the cervical area and 2 in the lumbo-sacral). In the precise segmentation, the mean error distances were 2.078, 1.416, 0.777 and 0.939 mm for the cervical, upper thoracic, lower thoracic and lumbar spines, respectively. In conclusion, our automatic segmentation algorithm for the vertebrae in human body CT images showed a fair performance for cervical, thoracic and lumbar vertebrae.
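The parametric model at the heart of such methods, instance = mean + Σᵢ bᵢvᵢ over principal components, can be sketched briefly; the training vectors below are random stand-ins for real shape-intensity data.

```python
# Sketch of a PCA-based statistical shape(-intensity) model: training vectors
# are reduced by PCA and any instance is expressed as mean + sum_i b_i * v_i.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 300))            # 20 training vertebrae, 300-dim vectors

mean = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
k = 5                                     # retained principal components
V = Vt[:k]                                # modes of variation (k x 300)

b = rng.normal(size=k) * (s[:k] / np.sqrt(len(X) - 1))  # plausible coefficients
instance = mean + b @ V                   # a new, model-constrained vertebra
print(instance.shape)
```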
Localized Smart-Interpretation
NASA Astrophysics Data System (ADS)
Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom
2014-05-01
The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model, the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that much of the information available from geophysical surveys is unexploited, which is a problem, because the resulting geological model does not fulfill its full potential and hence is less trustworthy. We suggest an approach to geological modeling that (1) allows all geophysical data to be considered when building the geological model, (2) is fast, and (3) allows quantification of the geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as, for example, the depth to the base of a groundwater reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpretation through f(d,m). As the geological expert proceeds with interpreting, the number of interpreted data points from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f(d,m) has successfully been inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied to geological interpretation and densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi-automatic interpretation, everywhere such geophysical data exist, in a manner consistent with the choices made by a geological expert. The benefits of such a statistical model are that (1) it provides a quantification of how a geological expert performs interpretation based on available diverse data, (2) all available geophysical information can be used, and (3) it allows much faster interpretation of large data sets.
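One simple way to realize f(d|m), assuming a linear-Gaussian relation, is sketched below; the features, depths, and `f_d_given_m` helper are hypothetical placeholders rather than the authors' implementation.

```python
# Sketch of a linear-Gaussian realization of f(d|m), fitted to the expert's
# past picks. d = interpreted depth, m = local geophysical features
# (e.g. resistivity values); all numbers are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(4)
M = rng.normal(size=(200, 3))                      # features at interpreted points
d = 40 + M @ np.array([5.0, -2.0, 1.0]) + rng.normal(0, 2, 200)

A = np.column_stack([np.ones(len(M)), M])
coef, *_ = np.linalg.lstsq(A, d, rcond=None)
resid_std = np.std(d - A @ coef, ddof=A.shape[1])

def f_d_given_m(m):
    """Predictive mean and spread of the interpretation at new information m."""
    mu = coef[0] + coef[1:] @ m
    return mu, resid_std                           # Gaussian f(d|m)

print(f_d_given_m(np.array([0.5, -1.0, 0.2])))
```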
NASA Technical Reports Server (NTRS)
Devasirvatham, D. M. J.; Hodge, D. B.
1981-01-01
A model of the microwave and millimeter wave link in the presence of atmospheric turbulence is presented with emphasis on satellite communications systems. The analysis is based on standard methods of statistical theory. The results are directly usable by the design engineer.
Internationalizing Nussbaum's Model of Cosmopolitan Democratic Education
ERIC Educational Resources Information Center
Culp, Julian
2018-01-01
Nussbaum's moral cosmopolitanism informs her capability-based theory of justice, which she uses in order to develop a distinctive model of cosmopolitan democratic education. I characterize Nussbaum's educational model as a 'statist model,' however, because it regards cosmopolitan democratic education as necessary for realizing democratic…
Fault detection and diagnosis using neural network approaches
NASA Technical Reports Server (NTRS)
Kramer, Mark A.
1992-01-01
Neural networks can be used to detect and identify abnormalities in real-time process data. Two basic approaches can be used, the first based on training networks using data representing both normal and abnormal modes of process behavior, and the second based on statistical characterization of the normal mode only. Given data representative of process faults, radial basis function networks can effectively identify failures. This approach is often limited by the lack of fault data, but can be facilitated by process simulation. The second approach employs elliptical and radial basis function neural networks and other models to learn the statistical distributions of process observables under normal conditions. Analytical models of failure modes can then be applied in combination with the neural network models to identify faults. Special methods can be applied to compensate for sensor failures, to produce real-time estimation of missing or failed sensors based on the correlations codified in the neural network.
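A hedged sketch of the second approach: learn a density model of normal-mode observables only and flag low-likelihood readings. A Gaussian mixture is used here as a stand-in for the elliptical basis function networks described; the threshold and data are illustrative.

```python
# Sketch: statistical characterization of the normal mode only, with new
# observations flagged as faults when their likelihood falls below a
# threshold calibrated to a chosen false-alarm rate.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
normal_data = rng.normal(0, 1, size=(2000, 4))        # normal-mode sensor vectors

gm = GaussianMixture(n_components=3, random_state=0).fit(normal_data)
threshold = np.quantile(gm.score_samples(normal_data), 0.01)  # 1% false alarms

new_obs = np.array([[4.0, -3.5, 0.2, 5.1]])           # a suspicious reading
is_fault = gm.score_samples(new_obs)[0] < threshold
print("fault detected:", is_fault)
```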
An Intelligent Model for Pairs Trading Using Genetic Algorithms.
Huang, Chien-Feng; Hsu, Chi-Jen; Chen, Chi-Chung; Chang, Bao Rong; Li, Chen-An
2015-01-01
Pairs trading is an important and challenging research area in computational finance, in which pairs of stocks are bought and sold in pair combinations for arbitrage opportunities. Traditional methods that solve this set of problems mostly rely on statistical methods such as regression. In contrast to the statistical approaches, recent advances in computational intelligence (CI) are leading to promising opportunities for solving problems in financial applications more effectively. In this paper, we present a novel methodology for pairs trading using genetic algorithms (GA). Our results showed that the GA-based models are able to significantly outperform the benchmark and our proposed method is capable of generating robust models to tackle the dynamic characteristics in the financial application studied. Based upon the promising results obtained, we expect this GA-based method to advance the research in computational intelligence for finance and provide an effective solution to pairs trading for investment in practice. PMID:26339236
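As background, the statistical baseline that GA-based pairs trading typically builds on can be sketched quickly: a hedge-ratio regression followed by z-scored spread thresholds. This is not the authors' GA method, only the conventional statistical approach they contrast with.

```python
# Sketch of a regression-based pairs trading baseline: hedge-ratio regression
# of one stock on its pair, then trades triggered at +/- 2 sigma of the
# z-scored spread. Prices are synthetic; thresholds illustrative.
import numpy as np

rng = np.random.default_rng(6)
x = np.cumsum(rng.normal(0, 1, 1000)) + 100          # price of stock X
y = 1.5 * x + rng.normal(0, 2, 1000)                 # cointegrated partner Y

beta = np.polyfit(x, y, 1)[0]                        # hedge ratio
spread = y - beta * x
z = (spread - spread.mean()) / spread.std()

signal = np.where(z > 2, -1, np.where(z < -2, 1, 0)) # short/long the spread
print("days in a trade:", np.count_nonzero(signal))
```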
Directional Statistics for Polarization Observations of Individual Pulses from Radio Pulsars
NASA Astrophysics Data System (ADS)
McKinnon, M. M.
2010-10-01
Radio polarimetry is a three-dimensional statistical problem. The three-dimensional aspect of the problem arises from the Stokes parameters Q, U, and V, which completely describe the polarization of electromagnetic radiation and conceptually define the orientation of a polarization vector in the Poincaré sphere. The statistical aspect of the problem arises from the random fluctuations in the source-intrinsic polarization and the instrumental noise. A simple model for the polarization of pulsar radio emission has been used to derive the three-dimensional statistics of radio polarimetry. The model is based upon the proposition that the observed polarization is due to the incoherent superposition of two, highly polarized, orthogonal modes. The directional statistics derived from the model follow the Bingham-Mardia and Fisher family of distributions. The model assumptions are supported by the qualitative agreement between the statistics derived from it and those measured with polarization observations of the individual pulses from pulsars. The orthogonal modes are thought to be the natural modes of radio wave propagation in the pulsar magnetosphere. The intensities of the modes become statistically independent when generalized Faraday rotation (GFR) in the magnetosphere causes the difference in their phases to be large. A stochastic version of GFR occurs when fluctuations in the phase difference are also large, and may be responsible for the more complicated polarization patterns observed in pulsar radio emission.
Statistics for X-chromosome associations.
Özbek, Umut; Lin, Hui-Min; Lin, Yan; Weeks, Daniel E; Chen, Wei; Shaffer, John R; Purcell, Shaun M; Feingold, Eleanor
2018-06-13
In a genome-wide association study (GWAS), association between genotype and phenotype at autosomal loci is generally tested by regression models. However, X-chromosome data are often excluded from published analyses of autosomes because of the difference between males and females in number of X chromosomes. Failure to analyze X-chromosome data at all is obviously less than ideal, and can lead to missed discoveries. Even when X-chromosome data are included, they are often analyzed with suboptimal statistics. Several mathematically sensible statistics for X-chromosome association have been proposed. The optimality of these statistics, however, is based on very specific simple genetic models. In addition, while previous simulation studies of these statistics have been informative, they have focused on single-marker tests and have not considered the types of error that occur even under the null hypothesis when the entire X chromosome is scanned. In this study, we comprehensively tested several X-chromosome association statistics using simulation studies that include the entire chromosome. We also considered a wide range of trait models for sex differences and phenotypic effects of X inactivation. We found that models that do not incorporate a sex effect can have large type I error in some cases. We also found that many of the best statistics perform well even when there are modest deviations, such as trait variance differences between the sexes or small sex differences in allele frequencies, from assumptions. © 2018 WILEY PERIODICALS, INC.
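One commonly proposed X-chromosome statistic is easy to sketch: code females 0/1/2 and males 0/2 (an X-inactivation assumption) and include sex as a covariate. The simulation below is illustrative and does not reproduce any specific test from the study.

```python
# Sketch of an X-chromosome association test with XCI-style genotype coding
# and a sex covariate. Data are simulated; effect sizes are arbitrary.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, maf = 2000, 0.3
sex = rng.integers(0, 2, n)                            # 0 = male, 1 = female
g = np.where(sex == 1,
             rng.binomial(2, maf, n),                  # females: two X copies
             2 * rng.binomial(1, maf, n))              # males: 0 or 2 under XCI
y = 0.2 * g + 0.5 * sex + rng.normal(size=n)           # trait with a sex effect

X = sm.add_constant(np.column_stack([g, sex]))
fit = sm.OLS(y, X).fit()
print(fit.pvalues[1])                                  # genotype association p-value
```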
Han, L. F; Plummer, Niel
2016-01-01
Numerous methods have been proposed to estimate the pre-nuclear-detonation 14C content of dissolved inorganic carbon (DIC) recharged to groundwater that has been corrected/adjusted for geochemical processes in the absence of radioactive decay (14C0) - a quantity that is essential for estimation of the radiocarbon age of DIC in groundwater. The models/approaches most commonly used are grouped as follows: (1) single-sample-based models, (2) a statistical approach based on the observed (curved) relationship between 14C and δ13C data for the aquifer, and (3) the geochemical mass-balance approach that constructs adjustment models accounting for all the geochemical reactions known to occur along a groundwater flow path. This review discusses first the geochemical processes behind each of the single-sample-based models, followed by discussions of the statistical approach and the geochemical mass-balance approach. Finally, the applications, advantages and limitations of the three groups of models/approaches are discussed. The single-sample-based models constitute the prevailing use of 14C data in hydrogeology and hydrological studies. This is in part because the models are applied to an individual water sample to estimate the 14C age, and therefore the measurement data are easily available. These models have been shown to provide realistic radiocarbon ages in many studies. However, they usually are limited to simple carbonate aquifers, and the selection of model may have significant effects on 14C0, often resulting in a wide range of estimates of 14C ages. Of the single-sample-based models, four are recommended for the estimation of 14C0 of DIC in groundwater: Pearson's model (Ingerson and Pearson, 1964; Pearson and White, 1967), Han & Plummer's model (Han and Plummer, 2013), the IAEA model (Gonfiantini, 1972; Salem et al., 1980), and Oeschger's model (Geyh, 2000). These four models include all processes considered in single-sample-based models, and can be used in different ranges of δ13C values. In contrast to the single-sample-based models, the extended Gonfiantini & Zuppi model (Gonfiantini and Zuppi, 2003; Han et al., 2014) is a statistical approach. This approach can be used to estimate 14C ages when a curved relationship between the 14C and δ13C values of the DIC data is observed. In addition to estimation of groundwater ages, the relationship between 14C and δ13C data can be used to interpret hydrogeological characteristics of the aquifer, e.g. estimating apparent rates of geochemical reactions and revealing the complexity of the geochemical environment, and to identify samples that are not affected by the same set of reactions/processes as the rest of the dataset. The investigated water samples may have a wide range of ages, and for waters with very low values of 14C, the model based on statistics may give more reliable age estimates than those obtained from single-sample-based models. In the extended Gonfiantini & Zuppi model, a representative system-wide value of the initial 14C content is derived from the 14C and δ13C data of DIC and can differ from that used in single-sample-based models. Therefore, the extended Gonfiantini & Zuppi model usually avoids the effect of modern water components which might retain 'bomb' pulse signatures. The geochemical mass-balance approach constructs an adjustment model that accounts for all the geochemical reactions known to occur along an aquifer flow path (Plummer et al., 1983; Wigley et al., 1978; Plummer et al., 1994; Plummer and Glynn, 2013), and includes, in addition to DIC, dissolved organic carbon (DOC) and methane (CH4). If sufficient chemical, mineralogical and isotopic data are available, the geochemical mass-balance method can yield the most accurate estimates of the adjusted radiocarbon age. The main limitation of this approach is that complete information on chemical, mineralogical and isotopic data is necessary, and these data are often limited. Failure to recognize the limitations and underlying assumptions on which the various models and approaches are based can result in a wide range of estimates of 14C0 and limit the usefulness of radiocarbon as a dating tool for groundwater. In each of the three generalized approaches (single-sample-based models, statistical approach, and geochemical mass-balance approach), successful application depends on scrutiny of the isotopic (14C and δ13C) and chemical data to conceptualize the reactions and processes that affect the 14C content of DIC in aquifers. The recently developed graphical analysis method is shown to aid in determining which approach is most appropriate for the isotopic and chemical data from a groundwater system.
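As a worked illustration of a single-sample-based adjustment, a hedged sketch of Pearson's δ13C mixing correction follows; the endmember values and half-life convention are user choices, and the numbers are invented.

```python
# Sketch of Pearson's single-sample adjustment: the fraction of DIC carbon
# derived from soil-gas CO2 is estimated from a d13C mixing relation, giving
# the initial activity 14C0, from which an adjusted age follows. Endmember
# values are site-specific assumptions, not universal constants.
import math

def pearson_age(a14c_meas, d13c_dic, d13c_soil=-23.0, d13c_carb=0.0,
                a14c_soil=100.0, half_life=5730.0):
    q = (d13c_dic - d13c_carb) / (d13c_soil - d13c_carb)  # soil-derived fraction
    a0 = q * a14c_soil                                    # adjusted initial 14C (pmC)
    return (half_life / math.log(2)) * math.log(a0 / a14c_meas)

# A sample at 30 pmC with d13C of -12 permil: roughly half the DIC comes from
# carbonate dissolution, so the unadjusted age would be biased old.
print(round(pearson_age(30.0, -12.0)), "years")
```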
A statistical mechanics model for free-for-all airplane passenger boarding
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steffen, Jason H.; /Fermilab
2008-08-01
I discuss a model for free-for-all passenger boarding which is employed by some discount air carriers. The model is based on the principles of statistical mechanics, where each seat in the aircraft has an associated energy which reflects the preferences of travelers. As each passenger enters the airplane they select their seats using Boltzmann statistics, proceed to that location, load their luggage, sit down, and the partition function seen by remaining passengers is modified to reflect this fact. I discuss the various model parameters and make qualitative comparisons of this passenger boarding model with those that involve assigned seats. The model can be used to predict the probability that certain seats will be occupied at different times during the boarding process. These results might provide a useful description of this boarding method. The model is a relatively unusual application of undergraduate level physics and describes a situation familiar to many students and faculty.
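The model's core step can be sketched directly: assign each seat an energy, then let each entering passenger sample an empty seat with Boltzmann probability, updating the partition function as seats fill. The energies and temperature below are illustrative, not the paper's calibrated values.

```python
# Sketch of the boarding model's core step: each empty seat has an energy
# encoding traveler preferences, and an entering passenger picks a seat with
# Boltzmann probability p(s) ~ exp(-E_s / T).
import numpy as np

rng = np.random.default_rng(8)
rows, cols = 30, 6
# Lower energy (more desirable): front rows, window and aisle seats.
row_pref = np.arange(rows)[:, None] * 0.1
col_pref = np.array([0.0, 0.3, 0.1, 0.1, 0.3, 0.0])[None, :]
E = (row_pref + col_pref).ravel()
occupied = np.zeros(rows * cols, dtype=bool)
T = 0.5

for _ in range(rows * cols):                     # board a full flight
    w = np.where(occupied, 0.0, np.exp(-E / T))  # partition over empty seats only
    seat = rng.choice(rows * cols, p=w / w.sum())
    occupied[seat] = True                        # later passengers see updated state
print("all seated:", occupied.all())
```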
Counts-in-cylinders in the Sloan Digital Sky Survey with Comparisons to N-body Simulations
NASA Astrophysics Data System (ADS)
Berrier, Heather D.; Barton, Elizabeth J.; Berrier, Joel C.; Bullock, James S.; Zentner, Andrew R.; Wechsler, Risa H.
2011-01-01
Environmental statistics provide a necessary means of comparing the properties of galaxies in different environments, and a vital test of models of galaxy formation within the prevailing hierarchical cosmological model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a particular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same two-point correlation functions do not necessarily have the same companion count distributions. We use this statistic to examine the environments of galaxies in the Sloan Digital Sky Survey Data Release 4 (SDSS DR4). We also make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body simulations and data from SDSS DR4, to study the utility of the counts-in-cylinders statistic. There is a very large scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that prevalent empirical models of galaxy clustering, that match observed two- and three-point clustering statistics well, fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 h⁻¹ Mpc scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and 6 h⁻¹ Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6 h⁻¹ Mpc cylinder than the galaxies in any of the models we use. Simple phenomenological models that map galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments.
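For concreteness, a minimal brute-force computation of counts-in-cylinders on synthetic positions follows; the cylinder dimensions echo the scales discussed, but the coordinates and redshift-to-distance handling are simplified placeholders.

```python
# Sketch of the counts-in-cylinders statistic: for each galaxy, count
# companions within a projected radius R_p and a line-of-sight window.
import numpy as np

rng = np.random.default_rng(9)
pos = rng.uniform(0, 100, size=(1000, 3))         # x, y transverse; z line of sight
R_p, dz = 3.0, 10.0                               # cylinder dimensions (h^-1 Mpc)

def counts_in_cylinders(pos, R_p, dz):
    d_xy = np.hypot(pos[:, None, 0] - pos[None, :, 0],
                    pos[:, None, 1] - pos[None, :, 1])
    d_z = np.abs(pos[:, None, 2] - pos[None, :, 2])
    inside = (d_xy < R_p) & (d_z < dz)
    return inside.sum(axis=1) - 1                 # exclude the galaxy itself

counts = counts_in_cylinders(pos, R_p, dz)
print("fraction with no companions:", (counts == 0).mean())
```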
NASA Astrophysics Data System (ADS)
Mazzitello, Karina I.; Candia, Julián
2012-12-01
In every country, public and private agencies allocate extensive funding to collect large-scale statistical data, which in turn are studied and analyzed in order to determine local, regional, national, and international policies regarding all aspects relevant to the welfare of society. One important aspect of that process is the visualization of statistical data with embedded geographical information, which most often relies on archaic methods such as maps colored according to graded scales. In this work, we apply nonstandard visualization techniques based on physical principles. We illustrate the method with recent statistics on homicide rates in Brazil and their correlation to other publicly available data. This physics-based approach provides a novel tool that can be used by interdisciplinary teams investigating statistics and model projections in a variety of fields such as economics and gross domestic product research, public health and epidemiology, sociodemographics, political science, business and marketing, and many others.
Atmospheric Visibility Monitoring for planetary optical communications
NASA Technical Reports Server (NTRS)
Cowles, Kelly
1991-01-01
The Atmospheric Visibility Monitoring project endeavors to improve current atmospheric models and generate visibility statistics relevant to prospective earth-satellite optical communications systems. Three autonomous observatories are being used to measure atmospheric conditions on the basis of observed starlight; these data will yield clear-sky and transmission statistics for three sites with high clear-sky probabilities. Ground-based data will be compared with satellite imagery to determine the correlation between satellite data and ground-based observations.
NASA Astrophysics Data System (ADS)
Fripp, Jurgen; Crozier, Stuart; Warfield, Simon K.; Ourselin, Sébastien
2006-03-01
Subdivision surfaces and parameterization are desirable for many algorithms that are commonly used in Medical Image Analysis. However, extracting an accurate surface and parameterization can be difficult for many anatomical objects of interest, due to noisy segmentations and the inherent variability of the object. The thin cartilages of the knee are an example of this, especially after damage is incurred from injuries or conditions like osteoarthritis. As a result, the cartilages can have different topologies or exist in multiple pieces. In this paper we present a topology preserving (genus 0) subdivision-based parametric deformable model that is used to extract the surfaces of the patella and tibial cartilages in the knee. These surfaces have minimal thickness in areas without cartilage. The algorithm inherently incorporates several desirable properties, including: shape based interpolation, sub-division remeshing and parameterization. To illustrate the usefulness of this approach, the surfaces and parameterizations of the patella cartilage are used to generate a 3D statistical shape model.
A data-based conservation planning tool for Florida panthers
Murrow, Jennifer L.; Thatcher, Cindy A.; Van Manen, Frank T.; Clark, Joseph D.
2013-01-01
Habitat loss and fragmentation are the greatest threats to the endangered Florida panther (Puma concolor coryi). We developed a data-based habitat model and user-friendly interface so that land managers can objectively evaluate Florida panther habitat. We used a geographic information system (GIS) and the Mahalanobis distance statistic (D²) to develop a model based on broad-scale landscape characteristics associated with panther home ranges. Variables in our model were Euclidean distance to natural land cover, road density, distance to major roads, human density, amount of natural land cover, amount of semi-natural land cover, amount of permanent or semi-permanent flooded area–open water, and a cost–distance variable. We then developed a Florida Panther Habitat Estimator tool, which automates and replicates the GIS processes used to apply the statistical habitat model. The estimator can be used by persons with moderate GIS skills to quantify effects of land-use changes on panther habitat at local and landscape scales. Example applications of the tool are presented.
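A minimal sketch of the Mahalanobis D² calculation that drives such a habitat model; the landscape variables and values are invented stand-ins for the model's actual GIS layers.

```python
# Sketch of the Mahalanobis distance (D^2) habitat model: D^2 measures how far
# a pixel's landscape variables fall from the multivariate mean of conditions
# inside known panther home ranges.
import numpy as np

rng = np.random.default_rng(10)
home_range = rng.normal([0.2, 1.5, 3.0], [0.1, 0.5, 1.0], size=(500, 3))
# columns: road density, distance to natural cover, natural land (toy stand-ins)

mu = home_range.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(home_range, rowvar=False))

def d2(x):
    diff = x - mu
    return diff @ cov_inv @ diff                 # low D^2 = more panther-like

print(d2(np.array([0.25, 1.4, 3.2])))            # similar to used habitat: small
print(d2(np.array([2.00, 9.0, 0.1])))            # degraded landscape: large
```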
Detection of Cutting Tool Wear using Statistical Analysis and Regression Model
NASA Astrophysics Data System (ADS)
Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin
2010-10-01
This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistics-based method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out using a CNC turning machine Colchester Master Tornado T4 in dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, and each has its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear land (VB) was determined. A regression model was developed from this relationship, and the results of the regression model show that the I-kaz 3D coefficient value decreases as tool wear increases. The results are then used for real-time tool wear monitoring.
NASA Astrophysics Data System (ADS)
Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard
2018-07-01
This work details the analysis of wafer level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.
Communication Dynamics of Blog Networks
NASA Astrophysics Data System (ADS)
Goldberg, Mark; Kelley, Stephen; Magdon-Ismail, Malik; Mertsalov, Konstantin; Wallace, William (Al)
We study the communication dynamics of Blog networks, focusing on the Russian section of LiveJournal as a case study. Communication (blogger-to-blogger links) in such online communication networks is very dynamic: over 60% of the links in the network are new from one week to the next, though the set of bloggers remains approximately constant. Two fundamental questions are: (i) what models adequately describe such dynamic communication behavior; and (ii) how does one detect the phase transitions, i.e. the changes that go beyond the standard high-level dynamics? We approach these questions through the notion of stable statistics. We give strong experimental evidence to the fact that, despite the extreme amount of communication dynamics, several aggregate statistics are remarkably stable. We use stable statistics to test our models of communication dynamics postulating that any good model should produce values for these statistics which are both stable and close to the observed ones. Stable statistics can also be used to identify phase transitions, since any change in a normally stable statistic indicates a substantial change in the nature of the communication dynamics. We describe models of the communication dynamics in large social networks based on the principle of locality of communication: a node's communication energy is spent mostly within its own "social area," the locality of the node.
Statistical emulators of maize, rice, soybean and wheat yields from global gridded crop models
Blanc, Élodie
2017-01-26
This study provides statistical emulators of crop yields based on global gridded crop model simulations from the Inter-Sectoral Impact Model Intercomparison Project Fast Track project. The ensemble of simulations is used to build a panel of annual crop yields from five crop models and corresponding monthly summer weather variables for over a century at the grid cell level globally. This dataset is then used to estimate, for each crop and gridded crop model, the statistical relationship between yields, temperature, precipitation and carbon dioxide. This study considers a new functional form to better capture the non-linear response of yields to weather, especially for extreme temperature and precipitation events, and now accounts for the effect of soil type. In- and out-of-sample validations show that the statistical emulators are able to replicate the spatial patterns of crop yields and their changes over time projected by crop models reasonably well, although the accuracy of the emulators varies by model and by region. This study therefore provides a reliable and accessible alternative to global gridded crop yield models. By emulating crop yields for several models using parsimonious equations, the tools provide a computationally efficient method to account for uncertainty in climate change impact assessments.
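In the same spirit, a toy emulator can be fitted as a regression of simulated yields on weather and CO2 with squared terms for non-linear extremes; the synthetic data and functional form below are illustrative, not the study's specification.

```python
# Sketch of a statistical crop-yield emulator: regression of simulated yields
# on weather and CO2, with squared terms capturing non-linear extremes.
import numpy as np

rng = np.random.default_rng(11)
T = rng.normal(22, 3, 3000)                        # summer temperature (C)
P = rng.gamma(4, 60, 3000)                         # summer precipitation (mm)
CO2 = rng.uniform(350, 700, 3000)                  # ppm
yield_sim = (8 - 0.05*(T - 22)**2 + 0.004*P - 4e-6*(P - 250)**2
             + 0.002*(CO2 - 350) + rng.normal(0, 0.3, 3000))

X = np.column_stack([np.ones_like(T), T, T**2, P, P**2, CO2])
coef, *_ = np.linalg.lstsq(X, yield_sim, rcond=None)
pred = X @ coef
print("in-sample R^2:", 1 - np.var(yield_sim - pred) / np.var(yield_sim))
```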
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply solely with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could severely limit the ability to detect abnormal patterns and to assure that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.
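As one concrete example, the sample size for estimating a nonconforming proportion in a finite lot can be computed with the classic finite population correction; the inputs below are illustrative and are not prescribed by the EDQM guideline.

```python
# Sketch of the finite-population sample size calculation underlying
# "sampling based on the proportion of a finite population".
import math

def sample_size(N, p=0.01, e=0.01, z=1.96):
    """N: lot size; p: expected nonconforming proportion;
    e: margin of error; z: standard-normal quantile for the confidence level."""
    n0 = z**2 * p * (1 - p) / e**2          # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / N))

print(sample_size(N=500))    # components to test from a lot of 500
print(sample_size(N=5000))   # correction matters less for larger lots
```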
NASA Astrophysics Data System (ADS)
Beach, Shaun E.; Semkow, Thomas M.; Remling, David J.; Bradt, Clayton J.
2017-07-01
We have developed accessible methods to demonstrate fundamental statistics in several phenomena, in the context of teaching electronic signal processing in a physics-based college-level curriculum. A relationship between the exponential time-interval distribution and Poisson counting distribution for a Markov process with constant rate is derived in a novel way and demonstrated using nuclear counting. Negative binomial statistics is demonstrated as a model for overdispersion and justified by the effect of electronic noise in nuclear counting. The statistics of digital packets on a computer network are shown to be compatible with the fractal-point stochastic process leading to a power-law as well as generalized inverse Gaussian density distributions of time intervals between packets.
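The exponential-interval/Poisson-count relationship demonstrated with nuclear counting can be reproduced numerically in a few lines; the rate and duration below are arbitrary.

```python
# Sketch of the demonstrated relationship: for a constant-rate Markov process,
# inter-event times are exponential and counts in fixed windows are Poisson.
import numpy as np

rng = np.random.default_rng(12)
rate, T = 5.0, 2000.0                              # events/s, total time (s)
arrivals = np.cumsum(rng.exponential(1 / rate, int(rate * T * 1.2)))
arrivals = arrivals[arrivals < T]

intervals = np.diff(arrivals)
counts = np.histogram(arrivals, bins=np.arange(0, T + 1))[0]  # counts per second

print("mean/std of intervals:", intervals.mean(), intervals.std())  # both ~ 1/rate
print("mean vs var of counts:", counts.mean(), counts.var())        # both ~ rate
```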
Sachindra, D. A.; Perera, B. J. C.
2016-01-01
This paper presents a novel approach to incorporate the non-stationarities characterised in the GCM outputs, into the Predictor-Predictand Relationships (PPRs) in statistical downscaling models. In this approach, a series of 42 PPRs based on multi-linear regression (MLR) technique were determined for each calendar month using a 20-year moving window moved at a 1-year time step on the predictor data obtained from the NCEP/NCAR reanalysis data archive and observations of precipitation at 3 stations located in Victoria, Australia, for the period 1950–2010. Then the relationships between the constants and coefficients in the PPRs and the statistics of reanalysis data of predictors were determined for the period 1950–2010, for each calendar month. Thereafter, using these relationships with the statistics of the past data of HadCM3 GCM pertaining to the predictors, new PPRs were derived for the periods 1950–69, 1970–89 and 1990–99 for each station. This process yielded a non-stationary downscaling model consisting of a PPR per calendar month for each of the above three periods for each station. The non-stationarities in the climate are characterised by the long-term changes in the statistics of the climate variables and above process enabled relating the non-stationarities in the climate to the PPRs. These new PPRs were then used with the past data of HadCM3, to reproduce the observed precipitation. It was found that the non-stationary MLR based downscaling model was able to produce more accurate simulations of observed precipitation more often than conventional stationary downscaling models developed with MLR and Genetic Programming (GP). PMID:27997609
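The moving-window mechanic is easy to sketch: refit the regression in overlapping 20-year windows stepped by one year, so the coefficients become a time series that can be related to predictor statistics. The synthetic data and drifting link below are illustrative.

```python
# Sketch of moving-window multi-linear regression: 20-year windows stepped by
# one year over 1950-2010 yield 42 sets of coefficients per calendar month.
import numpy as np

rng = np.random.default_rng(13)
years = np.arange(1950, 2011)
m = rng.normal(size=(len(years), 2))                       # two predictors
beta_true = np.column_stack([1 + 0.02*(years - 1950),      # slowly drifting link
                             np.full(len(years), -0.5)])
precip = (beta_true * m).sum(axis=1) + rng.normal(0, 0.3, len(years))

window = 20
for start in range(0, len(years) - window + 1):            # 42 windows
    sl = slice(start, start + window)
    A = np.column_stack([np.ones(window), m[sl]])
    coef, *_ = np.linalg.lstsq(A, precip[sl], rcond=None)
    if start % 10 == 0:
        print(years[start], "->", np.round(coef, 2))       # coefficients drift
```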
NASA Astrophysics Data System (ADS)
Pham, M. T.; Vanhaute, W. J.; Vandenberghe, S.; De Baets, B.; Verhoest, N. E. C.
2013-06-01
Of all natural disasters, the economic and environmental consequences of droughts are among the highest because of their longevity and widespread spatial extent. Because of their extreme behaviour, studying droughts generally requires long time series of historical climate data. Rainfall is a very important variable for calculating drought statistics, for quantifying historical droughts and for assessing the impact on other hydrological (e.g. water stage in rivers) or agricultural (e.g. irrigation requirements) variables. Unfortunately, time series of historical observations are often too short for such assessments. To circumvent this, one may rely on synthetic rainfall time series from stochastic point process rainfall models, such as Bartlett-Lewis models. The present study investigates whether drought statistics are preserved when simulating rainfall with Bartlett-Lewis models. Therefore, a 105-year 10 min rainfall time series obtained at Uccle, Belgium is used as a test case. First, drought events were identified on the basis of the Effective Drought Index (EDI), and each event was characterized by two variables, i.e. drought duration (D) and drought severity (S). As both parameters are interdependent, a multivariate distribution function, which makes use of a copula, was fitted. Based on the copula, four types of drought return periods are calculated for observed as well as simulated droughts and are used to evaluate the ability of the rainfall models to simulate drought events with the appropriate characteristics. Overall, all Bartlett-Lewis-type models studied fail to preserve extreme drought statistics, which is attributed to the model structure and to the model stationarity caused by maintaining the same parameter set during the whole simulation period.
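A hedged sketch of one copula-based return period (the "AND" case) follows, using a Gumbel copula as an assumed example; the paper fits its own copula and considers four return period definitions, so the parameter and values here are purely illustrative.

```python
# Sketch of a copula-based joint return period for drought duration D and
# severity S: couple the marginals with a Gumbel copula and compute the
# "AND" return period T = mu / P(D > d, S > s).
import numpy as np

def gumbel_copula(u, v, theta):
    return np.exp(-(((-np.log(u))**theta + (-np.log(v))**theta)**(1/theta)))

def and_return_period(u, v, theta, mu_years):
    """u, v: marginal non-exceedance probabilities of the event of interest;
    mu_years: mean inter-arrival time of drought events (years)."""
    p_exceed_both = 1 - u - v + gumbel_copula(u, v, theta)
    return mu_years / p_exceed_both

# An event at the 90th percentile of both marginals, droughts arriving every
# 1.5 years on average, and theta = 2 (moderate dependence):
print(and_return_period(0.90, 0.90, theta=2.0, mu_years=1.5))   # ~24 years
```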
NASA Astrophysics Data System (ADS)
Alexandridis, Konstantinos T.
This dissertation adopts a holistic and detailed approach to modeling spatially explicit agent-based artificial intelligent systems, using the Multi Agent-based Behavioral Economic Landscape (MABEL) model. The research questions it addresses stem from the need to understand and analyze the real-world patterns and dynamics of land use change from a coupled human-environmental systems perspective. It describes the systemic, mathematical, statistical, socio-economic and spatial dynamics of the MABEL modeling framework, and provides a wide array of cross-disciplinary modeling applications within the research, decision-making and policy domains. It establishes the symbolic properties of the MABEL model as a Markov decision process, analyzes the decision-theoretic utility and optimization attributes of agents towards comprising statistically and spatially optimal policies and actions, and explores the probabilistic character of the agents' decision-making and inference mechanisms via the use of Bayesian belief and decision networks. It develops and describes a Monte Carlo methodology for experimental replications of agents' decisions regarding complex spatial parcel acquisition and learning. It recognizes the gap in spatially explicit accuracy assessment techniques for complex spatial models, and proposes an ensemble of statistical tools designed to address this problem. Advanced information assessment techniques such as the receiver operating characteristic curve, the impurity entropy and Gini functions, and Bayesian classification functions are proposed. The theoretical foundation for modular Bayesian inference in spatially explicit multi-agent artificial intelligent systems, and the ensembles of cognitive and scenario assessment modular tools built for the MABEL model, are provided. It emphasizes modularity and robustness as valuable qualitative modeling attributes, and examines the role of robust intelligent modeling as a tool for improving policy decisions related to land use change. Finally, the major contributions to the science are presented, along with valuable directions for future research.
A nonparametric analysis of plot basal area growth using tree based models
G. L. Gadbury; H. K. lyer; H. T. Schreuder; C. Y. Ueng
1997-01-01
Tree-based statistical models can be used to investigate data structure and predict future observations. We used nonparametric and nonlinear models to reexamine the data sets on tree growth used by Bechtold et al. (1991) and Ruark et al. (1991). The growth data were collected by Forest Inventory and Analysis (FIA) teams from 1962 to 1972 (4th cycle) and 1972 to 1982 (...
Review of Statistical Methods for Analysing Healthcare Resources and Costs
Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G
2011-01-01
We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344
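As an illustration of category (VI), a two-part model can be fitted with a logistic part for any cost and a gamma GLM with log link for positive costs; the simulated data below mimic excess zeros and a heavy right tail, and are not from any study in the review.

```python
# Sketch of a two-part cost model: part 1 models P(cost > 0) with logistic
# regression; part 2 models positive costs with a gamma GLM and log link.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
n = 2000
x = rng.normal(size=n)
any_cost = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + x))))
cost = np.where(any_cost == 1, rng.gamma(2.0, np.exp(5 + 0.4*x) / 2.0), 0.0)

X = sm.add_constant(x)
part1 = sm.Logit(any_cost, X).fit(disp=0)
pos = cost > 0
part2 = sm.GLM(cost[pos], X[pos],
               family=sm.families.Gamma(link=sm.families.links.Log())).fit()

expected_cost = part1.predict(X) * part2.predict(X)   # E[cost] = P(>0) * E[cost|>0]
print(expected_cost[:5])
```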
Scene-based nonuniformity correction and enhancement: pixel statistics and subpixel motion.
Zhao, Wenyi; Zhang, Chao
2008-07-01
We propose a framework for scene-based nonuniformity correction (NUC) and nonuniformity correction and enhancement (NUCE) that is required for focal-plane array-like sensors to obtain clean and enhanced-quality images. The core of the proposed framework is a novel registration-based nonuniformity correction super-resolution (NUCSR) method that is bootstrapped by statistical scene-based NUC methods. Based on a comprehensive imaging model and accurate parametric motion estimation, we are able to remove severe/structured nonuniformity and, in the presence of subpixel motion, to simultaneously improve image resolution. One important feature of our NUCSR method is the adoption of a parametric motion model that allows us to (1) handle many practical scenarios where parametric motions are present and (2) carry out perfect super-resolution in principle by exploiting available subpixel motions. Experiments with real data demonstrate the efficiency of the proposed NUCE framework and the effectiveness of the NUCSR method.
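A minimal sketch of the kind of statistical scene-based NUC used for bootstrapping, under the assumption that every pixel sees similar temporal scene statistics over many frames; the synthetic gain/offset pattern and the normalization are illustrative, not the proposed NUCSR algorithm.

```python
# Sketch of a simple statistical scene-based NUC: per-pixel gain/offset are
# normalized so each pixel's temporal statistics match the global statistics.
import numpy as np

rng = np.random.default_rng(15)
scene = rng.uniform(0, 1, size=(200, 64, 64))            # moving-scene stand-in
gain = rng.normal(1.0, 0.1, (64, 64))
offset = rng.normal(0.0, 0.2, (64, 64))
frames = gain * scene + offset                           # fixed-pattern corrupted

pix_mu, pix_sd = frames.mean(axis=0), frames.std(axis=0)
g_hat = pix_sd.mean() / pix_sd                           # per-pixel gain estimate
o_hat = pix_mu.mean() - g_hat * pix_mu                   # per-pixel offset estimate
corrected = g_hat * frames + o_hat

print("fixed-pattern noise before:", frames.mean(axis=0).std(),
      "after:", corrected.mean(axis=0).std())
```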
Evaluating bacterial gene-finding HMM structures as probabilistic logic programs.
Mørk, Søren; Holmes, Ian
2012-03-01
Probabilistic logic programming offers a powerful way to describe and evaluate structured statistical models. To investigate the practicality of probabilistic logic programming for structure learning in bioinformatics, we undertook a simplified bacterial gene-finding benchmark in PRISM, a probabilistic dialect of Prolog. We evaluate hidden Markov model structures for bacterial protein-coding gene potential, including a simple null model structure, three structures based on existing bacterial gene finders and two novel model structures. We test standard versions as well as ADPH length-modeling and three-state versions of the five model structures. The models are all represented as probabilistic logic programs and evaluated using the PRISM machine learning system in terms of statistical information criteria and gene-finding prediction accuracy, in two bacterial genomes. Neither of our implementations of the two currently most used model structures is best performing in terms of statistical information criteria or prediction performance, suggesting that better-fitting models might be achievable. The source code of all PRISM models, data and additional scripts are freely available for download at: http://github.com/somork/codonhmm. Supplementary data are available at Bioinformatics online.
Risk prediction model: Statistical and artificial neural network approach
NASA Astrophysics Data System (ADS)
Paiman, Nuur Azreen; Hariri, Azian; Masood, Ibrahim
2017-04-01
Prediction models are increasingly gaining popularity and have been used in numerous areas of study to complement clinical reasoning and decision making. The adoption of such models assists physicians' decision making and individuals' behavior, and consequently improves individual outcomes and the cost-effectiveness of care. The objective of this paper is to review articles related to risk prediction models in order to understand suitable approaches and the development and validation process of such models. A qualitative review was conducted of the aims, methods and main outcomes of nineteen published articles that developed risk prediction models in numerous fields. This paper also reviews how researchers develop and validate risk prediction models based on statistical and artificial neural network approaches. From the review, some methodological recommendations for developing and validating prediction models are highlighted. According to the studies reviewed, artificial neural network approaches to developing prediction models were more accurate than statistical approaches. However, only a limited published literature currently discusses which approach is more accurate for risk prediction model development.
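A hedged, minimal sketch of such a comparison (synthetic data, scikit-learn estimators chosen for illustration, not the methods of the reviewed articles): a logistic regression and a small artificial neural network fitted to the same data and compared by AUC on held-out samples.

```python
# Statistical model (logistic regression) versus small artificial neural
# network, both scored by AUC on held-out synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
ann = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(X_tr, y_tr)

print("logistic AUC:", roc_auc_score(y_te, logit.predict_proba(X_te)[:, 1]))
print("ANN AUC:     ", roc_auc_score(y_te, ann.predict_proba(X_te)[:, 1]))
```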
Petersen, Japke F; Stuiver, Martijn M; Timmermans, Adriana J; Chen, Amy; Zhang, Hongzhen; O'Neill, James P; Deady, Sandra; Vander Poorten, Vincent; Meulemans, Jeroen; Wennerberg, Johan; Skroder, Carl; Day, Andrew T; Koch, Wayne; van den Brekel, Michiel W M
2018-05-01
TNM classification inadequately estimates patient-specific overall survival (OS). We aimed to improve this by developing a risk prediction model for patients with advanced larynx cancer. Cohort study. We developed a risk prediction model to estimate the 5-year OS rate based on a cohort of 3,442 patients with T3T4N0N+M0 larynx cancer. The model was internally validated using bootstrapping samples and externally validated on patient data from five external centers (n = 770). The main outcome was performance of the model as tested by discrimination, calibration, and the ability to distinguish risk groups based on tertiles from the derivation dataset. The model performance was compared to a model based on T and N classification only. We included age, gender, T and N classification, and subsite as prognostic variables in the standard model. After external validation, the standard model had a significantly better fit than a model based on T and N classification alone (C statistic, 0.59 vs. 0.55, P < .001). The model was able to distinguish well among three risk groups based on tertiles of the risk score. Adding treatment modality to the model did not decrease the predictive power. As a post hoc analysis, we tested the added value of comorbidity as scored by the American Society of Anesthesiologists score in a subsample, which increased the C statistic to 0.68. A risk prediction model for patients with advanced larynx cancer, consisting of readily available clinical variables, gives more accurate estimates of the 5-year survival rate than a model based on T and N classification alone. Level of evidence: 2c. Laryngoscope, 128:1140-1145, 2018. © 2017 The American Laryngological, Rhinological and Otological Society, Inc.
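For a binary 5-year outcome, the C statistic reported here is equivalent to the ROC AUC of the risk score; the hedged sketch below computes it on synthetic data together with a bootstrap interval of the kind used for internal validation. Nothing in it comes from the study's cohort or model.

```python
# C statistic (= ROC AUC for a binary 5-year outcome) with a bootstrap interval.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500
risk_score = rng.normal(size=n)                                  # model's linear predictor
died_within_5y = rng.binomial(1, 1 / (1 + np.exp(-risk_score)))  # outcome linked to the score

c = roc_auc_score(died_within_5y, risk_score)
boot = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    boot.append(roc_auc_score(died_within_5y[idx], risk_score[idx]))
print(f"C statistic {c:.2f}, 95% bootstrap interval "
      f"({np.percentile(boot, 2.5):.2f}, {np.percentile(boot, 97.5):.2f})")
```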
Mizukami, Naoki; Clark, Martyn P.; Gutmann, Ethan D.; Mendoza, Pablo A.; Newman, Andrew J.; Nijssen, Bart; Livneh, Ben; Hay, Lauren E.; Arnold, Jeffrey R.; Brekke, Levi D.
2016-01-01
Continental-domain assessments of climate change impacts on water resources typically rely on statistically downscaled climate model outputs to force hydrologic models at a finer spatial resolution. This study examines the effects of four statistical downscaling methods [bias-corrected constructed analog (BCCA), bias-corrected spatial disaggregation applied at daily (BCSDd) and monthly scales (BCSDm), and asynchronous regression (AR)] on retrospective hydrologic simulations using three hydrologic models with their default parameters (the Community Land Model, version 4.0; the Variable Infiltration Capacity model, version 4.1.2; and the Precipitation–Runoff Modeling System, version 3.0.4) over the contiguous United States (CONUS). Biases of hydrologic simulations forced by statistically downscaled climate data relative to the simulation with observation-based gridded data are presented. Each statistical downscaling method produces different meteorological portrayals including precipitation amount, wet-day frequency, and the energy input (i.e., shortwave radiation), and their interplay affects estimations of precipitation partitioning between evapotranspiration and runoff, extreme runoff, and hydrologic states (i.e., snow and soil moisture). The analyses show that BCCA underestimates annual precipitation by as much as −250 mm, leading to unreasonable hydrologic portrayals over the CONUS for all models. Although the other three statistical downscaling methods produce a comparable precipitation bias ranging from −10 to 8 mm across the CONUS, BCSDd severely overestimates the wet-day fraction by up to 0.25, leading to different precipitation partitioning compared to the simulations with other downscaled data. Overall, the choice of downscaling method contributes to less spread in runoff estimates (by a factor of 1.5–3) than the choice of hydrologic model with use of the default parameters if BCCA is excluded.
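The BCSD-family methods named above rest on a quantile-mapping bias-correction step; the following hedged sketch shows that step in isolation for one grid cell, with synthetic precipitation series standing in for model output and observations (the actual BCCA/BCSD/AR implementations involve considerably more).

```python
# Empirical quantile mapping of daily model precipitation onto the observed
# distribution for a single grid cell. Arrays are synthetic stand-ins.
import numpy as np

def quantile_map(model_hist, obs_hist, model_values):
    """Map model values onto the observed distribution via matching quantiles."""
    quantiles = np.linspace(0.0, 1.0, 101)
    model_q = np.quantile(model_hist, quantiles)
    obs_q = np.quantile(obs_hist, quantiles)
    # Find each value's quantile in the model climatology, then read off the
    # observed value at that same quantile.
    return np.interp(model_values, model_q, obs_q)

rng = np.random.default_rng(2)
obs = rng.gamma(0.8, 6.0, 5000)       # observed daily precipitation (mm)
model = rng.gamma(1.2, 3.0, 5000)     # biased model precipitation
corrected = quantile_map(model, obs, model)
print(model.mean(), corrected.mean(), obs.mean())
```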
Use of microcomputers for planning and managing silviculture habitat relationships.
B.G. Marcot; R.S. McNay; R.E. Page
1988-01-01
Microcomputers aid in monitoring, modeling, and decision support for integrating objectives of silviculture and wildlife habitat management. Spreadsheets, data bases, statistics, and graphics programs are described for use in monitoring. Stand growth models, modeling languages, area and geobased information systems, and optimization models are discussed for use in...
A computational visual saliency model based on statistics and machine learning.
Lin, Ru-Je; Lin, Wei-Song
2014-08-01
Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.
Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André
2017-01-01
Purpose: Complex fractionated atrial electrogram (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs, before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results reveal that some LA regions are resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.
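A hedged toy sketch of why a multivariate model can separate groups that individual attributes cannot: twelve synthetic features with small class-dependent shifts, classified with LDA under cross-validation. Feature values, class structure and sample sizes are illustrative only.

```python
# Linear discriminant analysis on synthetic stand-ins for the 12 AEG attributes.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n, p = 207, 12
X = rng.normal(size=(n, p))
labels = rng.integers(0, 4, size=n)                      # four region types, as in the paper
X += 0.8 * np.eye(4)[labels] @ rng.normal(size=(4, p))   # class-dependent mean shifts

lda = LinearDiscriminantAnalysis()
print("cross-validated accuracy:", cross_val_score(lda, X, labels, cv=5).mean())
```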
Dai, Qi; Yang, Yanchun; Wang, Tianming
2008-10-15
Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all of these ideas for sequence comparison try to use the information in the k-word distributions, a Markov model, or both. Motivated by adding k-word distributions to a Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those based on alignment or alignment-free methods. We grouped our experiments into two sets. The first, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences in a database and to discriminate functionally related regulatory sequences from unrelated sequences. The second aims at assessing how well our statistical measures can be used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into a Markov model, are more efficient.
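As a hedged illustration of the general alignment-free idea (not the paper's wre.k.r or S2.k.r statistics), the sketch below builds pseudo-counted k-word frequency vectors for two DNA sequences and compares them with a symmetric relative-entropy distance.

```python
# k-word frequency vectors compared with a symmetric relative-entropy distance.
import itertools
import numpy as np

def kword_freqs(seq, k=3, alphabet="ACGT"):
    words = ["".join(w) for w in itertools.product(alphabet, repeat=k)]
    counts = np.array([sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == w)
                       for w in words], dtype=float)
    return (counts + 1.0) / (counts.sum() + len(words))   # pseudo-counted frequencies

def symmetric_kl(p, q):
    return 0.5 * np.sum(p * np.log(p / q)) + 0.5 * np.sum(q * np.log(q / p))

s1 = "ACGTACGTGGCATCGATCGTACGATCGATCG"
s2 = "ACGTTCGTGGCATCGATCGTACGTTCGATCG"
print(symmetric_kl(kword_freqs(s1), kword_freqs(s2)))
```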
Zhang, Yun; Baheti, Saurabh; Sun, Zhifu
2018-05-01
High-throughput bisulfite methylation sequencing such as reduced representation bisulfite sequencing (RRBS), Agilent SureSelect Human Methyl-Seq (Methyl-seq) or whole-genome bisulfite sequencing is commonly used for base-resolution methylome research. These data are represented either by the ratio of methylated cytosines versus total coverage at a CpG site or by the numbers of methylated and unmethylated cytosines. Multiple statistical methods can be used to detect differentially methylated CpGs (DMCs) between conditions, and these methods are often the basis for the next step of differentially methylated region identification. The ratio data have the flexibility of fitting many linear models, but the raw count data take coverage information into account. There is an array of options in each datatype for DMC detection; however, it is not clear which is the optimal statistical method. In this study, we systematically evaluated four statistical methods on methylation ratio data and four methods on count-based data and compared their performance with regard to type I error control, sensitivity and specificity of DMC detection, and computational resource demands, using real RRBS data along with simulation. Our results show that the ratio-based tests are generally more conservative (less sensitive) than the count-based tests. However, some count-based methods have high false-positive rates and should be avoided. The beta-binomial model gives a good balance between sensitivity and specificity and is the preferred method. Selection of methods in different settings, signal versus noise, and sample size estimation are also discussed.
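A hedged toy sketch of the two datatypes at a single CpG: a ratio-based t-test on methylation proportions versus a count-aware beta-binomial likelihood-ratio test. The parametrization, fitting routine and data are illustrative assumptions, not the methods benchmarked in the study.

```python
# Ratio-based versus count-based (beta-binomial) test at one CpG site.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(4)
cov_a, cov_b = rng.integers(5, 60, 8), rng.integers(5, 60, 8)   # per-sample coverage
meth_a = rng.binomial(cov_a, 0.30)                              # methylated read counts
meth_b = rng.binomial(cov_b, 0.55)

# Ratio-based: ignore coverage, test the proportions directly
t_p = stats.ttest_ind(meth_a / cov_a, meth_b / cov_b).pvalue

# Count-based: beta-binomial log-likelihood with mean mu and overdispersion rho
def negll(params, meth, cov):
    mu, rho = params
    a, b = mu * (1 / rho - 1), (1 - mu) * (1 / rho - 1)
    return -stats.betabinom.logpmf(meth, cov, a, b).sum()

def max_loglik(meth, cov):
    res = optimize.minimize(negll, [0.4, 0.1], args=(meth, cov),
                            bounds=[(1e-3, 1 - 1e-3), (1e-3, 0.99)])
    return -res.fun

ll_null = max_loglik(np.concatenate([meth_a, meth_b]), np.concatenate([cov_a, cov_b]))
ll_alt = max_loglik(meth_a, cov_a) + max_loglik(meth_b, cov_b)
lrt_p = stats.chi2.sf(2 * (ll_alt - ll_null), df=2)   # two extra parameters in the alternative
print(f"ratio-based p = {t_p:.3g}, beta-binomial LRT p = {lrt_p:.3g}")
```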
A stochastic fractional dynamics model of space-time variability of rain
NASA Astrophysics Data System (ADS)
Kundu, Prasun K.; Travis, James E.
2013-09-01
Rainfall varies in space and time in a highly irregular manner and is described naturally in terms of a stochastic process. A characteristic feature of rainfall statistics is that they depend strongly on the space-time scales over which rain data are averaged. A spectral model of precipitation has been developed based on a stochastic differential equation of fractional order for the point rain rate, which allows a concise description of the second-moment statistics of rain at any prescribed space-time averaging scale. The model is thus capable of providing a unified description of the statistics of both radar and rain gauge data. The underlying dynamical equation can be expressed in terms of space-time derivatives of fractional orders that are adjusted together with other model parameters to fit the data. The form of the resulting spectrum gives the model adequate flexibility to capture the subtle interplay between the spatial and temporal scales of variability of rain but strongly constrains the predicted statistical behavior as a function of the averaging length and time scales. We test the model with radar and gauge data collected contemporaneously at the NASA TRMM ground validation sites located near Melbourne, Florida and on the Kwajalein Atoll, Marshall Islands in the tropical Pacific. We estimate the parameters by tuning them to fit the second-moment statistics of radar data at the smaller spatiotemporal scales. The model predictions are then found to fit the second-moment statistics of the gauge data reasonably well at these scales without any further adjustment.
Modeling of Dissipation Element Statistics in Turbulent Non-Premixed Jet Flames
NASA Astrophysics Data System (ADS)
Denker, Dominik; Attili, Antonio; Boschung, Jonas; Hennig, Fabian; Pitsch, Heinz
2017-11-01
The dissipation element (DE) analysis is a method for analyzing and compartmentalizing turbulent scalar fields. DEs can be described by two parameters, namely the Euclidean distance l between their extremal points and the scalar difference Δϕ between those points. The joint probability density function (jPDF) of these two parameters, P(Δϕ, l), is expected to suffice for a statistical reconstruction of the scalar field. In addition, reacting scalars show a strong correlation with these DE parameters in both premixed and non-premixed flames. Normalized DE statistics show a remarkable invariance towards changes in Reynolds numbers. This feature of DE statistics was exploited in a Boltzmann-type evolution-equation-based model for the probability density function (PDF) of the distance between the extremal points, P(l), in isotropic turbulence. Later, this model was extended for the jPDF P(Δϕ, l) and then adapted for use in free shear flows. The effect of heat release on the scalar scales and DE statistics is investigated, and an extended model for non-premixed jet flames is introduced, which accounts for the presence of chemical reactions. This new model is validated against a series of DNS of temporally evolving jet flames. European Research Council Project "Milestone".
Statistical aspects of modeling the labor curve.
Zhang, Jun; Troendle, James; Grantz, Katherine L; Reddy, Uma M
2015-06-01
In a recent review by Cohen and Friedman, several statistical questions on modeling labor curves were raised. This article illustrates that asking data to fit a preconceived model or letting a sufficiently flexible model fit observed data is the main difference in principles of statistical modeling between the original Friedman curve and our average labor curve. An evidence-based approach to constructing a labor curve and establishing normal values should allow the statistical model to fit observed data. In addition, the presence of the deceleration phase in the active phase of an average labor curve was questioned. Forcing a deceleration phase to be part of the labor curve may have artificially raised the speed of progression in the active phase, with a particularly large impact on earlier labor between 4 and 6 cm. Finally, any labor curve is illustrative and may not be instructive in managing labor because of variations in individual labor patterns and large errors in measuring cervical dilation. With the tools commonly available, it may be more productive to establish a new partogram that takes the physiology of labor and the contemporary obstetric population into account. Copyright © 2015 Elsevier Inc. All rights reserved.
Referenceless perceptual fog density prediction model
NASA Astrophysics Data System (ADS)
Choi, Lark Kwon; You, Jaehee; Bovik, Alan C.
2014-02-01
We propose a perceptual fog density prediction model based on natural scene statistics (NSS) and "fog aware" statistical features, which can predict the visibility in a foggy scene from a single image without reference to a corresponding fogless image, without side geographical camera information, without training on human-rated judgments, and without dependency on salient objects such as lane markings or traffic signs. The proposed fog density predictor only makes use of measurable deviations from statistical regularities observed in natural foggy and fog-free images. A fog aware collection of statistical features is derived from a corpus of foggy and fog-free images by using a space domain NSS model and observed characteristics of foggy images such as low contrast, faint color, and shifted intensity. The proposed model not only predicts perceptual fog density for the entire image but also provides a local fog density index for each patch. The predicted fog density of the model correlates well with the measured visibility in a foggy scene as measured by judgments taken in a human subjective study on a large foggy image database. As one application, the proposed model accurately evaluates the performance of defog algorithms designed to enhance the visibility of foggy images.
Climate and dengue transmission: evidence and implications.
Morin, Cory W; Comrie, Andrew C; Ernst, Kacey
2013-01-01
Climate influences dengue ecology by affecting vector dynamics, agent development, and mosquito/human interactions. Although these relationships are known, the impact climate change will have on transmission is unclear. Climate-driven statistical and process-based models are being used to refine our knowledge of these relationships and predict the effects of projected climate change on dengue fever occurrence, but results have been inconsistent. We sought to identify major climatic influences on dengue virus ecology and to evaluate the ability of climate-based dengue models to describe associations between climate and dengue, simulate outbreaks, and project the impacts of climate change. We reviewed the evidence for direct and indirect relationships between climate and dengue generated from laboratory studies, field studies, and statistical analyses of associations between vectors, dengue fever incidence, and climate conditions. We assessed the potential contribution of climate-driven, process-based dengue models and provide suggestions to improve their performance. Relationships between climate variables and factors that influence dengue transmission are complex. A climate variable may increase dengue transmission potential through one aspect of the system while simultaneously decreasing transmission potential through another. This complexity may at least partly explain inconsistencies in statistical associations between dengue and climate. Process-based models can account for the complex dynamics but often omit important aspects of dengue ecology, notably virus development and host-species interactions. Synthesizing and applying current knowledge of climatic effects on all aspects of dengue virus ecology will help direct future research and enable better projections of climate change effects on dengue incidence.
A critique of Rasch residual fit statistics.
Karabatsos, G
2000-01-01
In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
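For concreteness, a hedged sketch of the residual fit statistics under critique: standardized residuals under the Rasch model and per-item unweighted ("outfit") and information-weighted ("infit") mean squares, computed on simulated responses. Abilities, difficulties and data are illustrative.

```python
# Rasch standardized residuals and the outfit/infit mean-square indices per item.
import numpy as np

rng = np.random.default_rng(5)
theta = rng.normal(size=200)           # person abilities
beta = np.linspace(-2, 2, 10)          # item difficulties
p = 1 / (1 + np.exp(-(theta[:, None] - beta[None, :])))   # Rasch success probabilities
x = rng.binomial(1, p)                 # observed 0/1 responses

var = p * (1 - p)
z = (x - p) / np.sqrt(var)             # standardized residuals
outfit = (z ** 2).mean(axis=0)                        # unweighted mean square per item
infit = (var * z ** 2).sum(axis=0) / var.sum(axis=0)   # information-weighted mean square
print(np.round(outfit, 2), np.round(infit, 2))
```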
NASA Astrophysics Data System (ADS)
Acharya, S.; Kaplan, D. A.; Casey, S.; Cohen, M. J.; Jawitz, J. W.
2015-05-01
Self-organized landscape patterning can arise in response to multiple processes. Discriminating among alternative patterning mechanisms, particularly where experimental manipulations are untenable, requires process-based models. Previous modeling studies have attributed patterning in the Everglades (Florida, USA) to sediment redistribution and anisotropic soil hydraulic properties. In this work, we tested an alternate theory, the self-organizing-canal (SOC) hypothesis, by developing a cellular automata model that simulates pattern evolution via local positive feedbacks (i.e., facilitation) coupled with a global negative feedback based on hydrology. The model is forced by global hydroperiod that drives stochastic transitions between two patch types: ridge (higher elevation) and slough (lower elevation). We evaluated model performance using multiple criteria based on six statistical and geostatistical properties observed in reference portions of the Everglades landscape: patch density, patch anisotropy, semivariogram ranges, power-law scaling of ridge areas, perimeter area fractal dimension, and characteristic pattern wavelength. Model results showed strong statistical agreement with reference landscapes, but only when anisotropically acting local facilitation was coupled with hydrologic global feedback, for which several plausible mechanisms exist. Critically, the model correctly generated fractal landscapes that had no characteristic pattern wavelength, supporting the invocation of global rather than scale-specific negative feedbacks.
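A hedged caricature of this model class (not the authors' exact transition rules): a cellular automaton in which a cell's probability of becoming ridge increases with anisotropically weighted ridge neighbors (local facilitation), while a global hydrologic feedback pulls total ridge cover toward a sustainable fraction.

```python
# Toy cellular automaton: local facilitation plus a global negative feedback.
import numpy as np

rng = np.random.default_rng(6)
H, W, steps = 100, 100, 200
grid = (rng.random((H, W)) < 0.3).astype(int)   # 1 = ridge, 0 = slough
target_cover = 0.4                               # ridge fraction the hydrology can sustain

for _ in range(steps):
    # anisotropic neighbor count: weight the along-flow (vertical) neighbors more
    nbrs = ((np.roll(grid, 1, 0) + np.roll(grid, -1, 0)) * 1.0
            + (np.roll(grid, 1, 1) + np.roll(grid, -1, 1)) * 0.3)
    global_feedback = target_cover - grid.mean()          # >0 favors ridge expansion
    p_ridge = 1 / (1 + np.exp(-(0.8 * nbrs - 1.2 + 4.0 * global_feedback)))
    grid = (rng.random((H, W)) < p_ridge).astype(int)

print("final ridge cover:", grid.mean())
```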
Helicity statistics in homogeneous and isotropic turbulence and turbulence models
NASA Astrophysics Data System (ADS)
Sahoo, Ganapati; De Pietro, Massimo; Biferale, Luca
2017-02-01
We study the statistical properties of helicity in direct numerical simulations of fully developed homogeneous and isotropic turbulence and in a class of turbulence shell models. We consider correlation functions based on combinations of vorticity and velocity increments that are not invariant under mirror symmetry. We also study the scaling properties of high-order structure functions based on the moments of the velocity increments projected on a subset of modes with either positive or negative helicity (chirality). We show that mirror symmetry is recovered at small scales, i.e., chiral terms are subleading and they are well captured by a dimensional argument plus anomalous corrections. These findings are also supported by a high Reynolds numbers study of helical shell models with the same chiral symmetry of Navier-Stokes equations.
Statistical Use of Argonaute Expression and RISC Assembly in microRNA Target Identification
Stanhope, Stephen A.; Sengupta, Srikumar; den Boon, Johan; Ahlquist, Paul; Newton, Michael A.
2009-01-01
MicroRNAs (miRNAs) posttranscriptionally regulate targeted messenger RNAs (mRNAs) by inducing cleavage or otherwise repressing their translation. We address the problem of detecting m/miRNA targeting relationships in Homo sapiens from microarray data by developing statistical models that are motivated by the biological mechanisms used by miRNAs. The focus of our modeling is the construction, activity, and mediation of RNA-induced silencing complexes (RISCs) competent for targeted mRNA cleavage. We demonstrate that regression models accommodating RISC abundance and controlling for other mediating factors fit the expression profiles of known target pairs substantially better than models based on m/miRNA expression alone, and lead to verifications of computational target pair predictions that are more sensitive than those based on marginal expression levels. Because our models are fully independent of exogenous results from sequence-based computational methods, they are appropriate for use as either a primary or secondary source of information regarding m/miRNA target pair relationships, especially in conjunction with high-throughput expression studies. PMID:19779550
A model for generating Surface EMG signal of m. Tibialis Anterior.
Siddiqi, Ariba; Kumar, Dinesh; Arjunan, Sridhar P
2014-01-01
A model that simulates the surface electromyogram (sEMG) signal of m. Tibialis Anterior has been developed and tested. It has a firing rate equation that is based on experimental findings, and a recruitment threshold that is based on an observed statistical distribution. Importantly, it considers both slow and fast types, distinguished by their conduction velocities. The model assumes that the deeper unipennate half of the muscle does not contribute significantly to the potential induced on the surface of the muscle, and approximates the muscle as having a parallel structure. The model was validated by comparing the simulated and the experimental sEMG signal recordings. Experiments were conducted on eight subjects who performed isometric dorsiflexion at 10, 20, 30, 50, 75, and 100% maximal voluntary contraction. The normalized root mean square and median frequency of the experimental and simulated EMG signals were computed, and the slopes of their linear relationships with force were statistically analyzed. The gradients were found to be similar (p>0.05) for both experimental and simulated sEMG signals, validating the proposed model.
Hybrid statistics-simulations based method for atom-counting from ADF STEM images.
De Wael, Annelies; De Backer, Annick; Jones, Lewys; Nellist, Peter D; Van Aert, Sandra
2017-06-01
A hybrid statistics-simulations based method for atom-counting from annular dark field scanning transmission electron microscopy (ADF STEM) images of monotype crystalline nanostructures is presented. Different atom-counting methods already exist for model-like systems. However, the increasing relevance of radiation damage in the study of nanostructures demands a method that allows atom-counting from low dose images with a low signal-to-noise ratio. Therefore, the hybrid method directly includes prior knowledge from image simulations into the existing statistics-based method for atom-counting, and accounts in this manner for possible discrepancies between actual and simulated experimental conditions. It is shown by means of simulations and experiments that this hybrid method outperforms the statistics-based method, especially for low electron doses and small nanoparticles. The analysis of a simulated low dose image of a small nanoparticle suggests that this method allows for far more reliable quantitative analysis of beam-sensitive materials. Copyright © 2017 Elsevier B.V. All rights reserved.
Nonlinear Curve-Fitting Program
NASA Technical Reports Server (NTRS)
Everhart, Joel L.; Badavi, Forooz F.
1989-01-01
Nonlinear optimization algorithm helps in finding best-fit curve. Nonlinear Curve Fitting Program, NLINEAR, is interactive curve-fitting routine based on description of quadratic expansion of the chi-square statistic. Utilizes nonlinear optimization algorithm that calculates best statistically weighted values of parameters of fitting function such that chi-square is minimized. Provides user with such statistical information as goodness of fit and estimated values of parameters producing highest degree of correlation between experimental data and mathematical model. Written in FORTRAN 77.
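A hedged present-day equivalent of the same task (weighted chi-square minimization with parameter-error estimates), sketched with SciPy rather than the FORTRAN 77 program itself; the model form, data and starting values are illustrative.

```python
# Weighted least-squares (chi-square) fit of a model function, with best-fit
# parameters, their estimated uncertainties and the reduced chi-square.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b, c):
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(7)
x = np.linspace(0, 5, 40)
sigma = 0.05 * np.ones_like(x)                       # measurement uncertainties
y = model(x, 2.0, 1.3, 0.5) + rng.normal(0, sigma)   # synthetic data

popt, pcov = curve_fit(model, x, y, p0=[1, 1, 0], sigma=sigma, absolute_sigma=True)
chi2 = np.sum(((y - model(x, *popt)) / sigma) ** 2)
print("parameters:", popt)
print("1-sigma errors:", np.sqrt(np.diag(pcov)))
print("reduced chi-square:", chi2 / (len(x) - len(popt)))
```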
Clinical study of the Erlanger silver catheter--data management and biometry.
Martus, P; Geis, C; Lugauer, S; Böswald, M; Guggenbichler, J P
1999-01-01
The clinical evaluation of venous catheters for catheter-induced infections must conform to a strict biometric methodology. The statistical planning of the study (target population, design, degree of blinding), data management (database design, definition of variables, coding), quality assurance (data inspection at several levels) and the biometric evaluation of the Erlanger silver catheter project are described. The three-step data flow included: 1) primary data from the hospital, 2) a relational database, 3) files accessible for statistical evaluation. Two different statistical models were compared: analyzing only the first catheter of each patient (independent data) and analyzing several catheters from the same patient (dependent data) by means of the generalized estimating equations (GEE) method. The main result of the study was based on the comparison of both statistical models.
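A hedged sketch of that comparison in Python with statsmodels: a plain logistic model on one catheter per patient versus a GEE logistic model on all catheters with an exchangeable within-patient correlation structure. The data-generating assumptions, variable names and effect sizes are invented for illustration.

```python
# First-catheter-only logistic model versus GEE on all catheters per patient.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
rows = []
for pid in range(150):
    frailty = rng.normal(0, 0.7)                       # patient-level susceptibility
    for _ in range(rng.integers(1, 4)):                # 1-3 catheters per patient
        silver = rng.integers(0, 2)                    # 1 = silver catheter
        p = 1 / (1 + np.exp(-(-1.0 - 0.6 * silver + frailty)))
        rows.append({"patient": pid, "silver": silver, "infection": rng.binomial(1, p)})
df = pd.DataFrame(rows)

first_only = df.groupby("patient").head(1)
m_indep = smf.logit("infection ~ silver", data=first_only).fit(disp=0)
m_gee = smf.gee("infection ~ silver", groups="patient", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable()).fit()
print("independent-data estimate:", m_indep.params["silver"])
print("GEE estimate:             ", m_gee.params["silver"])
```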
NASA Technical Reports Server (NTRS)
Murthy, Pappu L. N.; Phoenix, S. Leigh; Grimes-Ledesma, Lorie
2010-01-01
Stress rupture failure of Carbon Composite Overwrapped Pressure Vessels (COPVs) is of serious concern to the Science Mission and Constellation programs, since there are a number of COPVs on board space vehicles storing gases under high pressure for long durations. It has become customary to establish the reliability of these vessels using the so-called classic models. The classic models are based on Weibull statistics fitted to observed stress rupture data. These stochastic models cannot account for any additional damage due to the complex pressure-time histories characteristic of COPVs being supplied for NASA missions. In particular, it is suspected that the effects of proof testing could significantly reduce the stress rupture lifetime of COPVs. The focus of this paper is to present an analytical appraisal of a model that incorporates damage due to proof testing. The model examined in the current paper is based on physical mechanisms, such as micromechanics-based load sharing concepts coupled with creep rupture and Weibull statistics. The classic model, by contrast, cannot accommodate damage due to proof testing, which every flight vessel undergoes. The paper compares the current model to the classic model with a number of examples. In addition, several applications of the model to current ISS and Constellation program issues are also examined.
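As a hedged illustration of the classic Weibull approach referenced here (not the paper's proof-test-damage model), the sketch below computes stress-rupture reliability from a Weibull lifetime whose characteristic life follows a power law in the stress ratio; every parameter value is illustrative.

```python
# Classic-style Weibull stress-rupture reliability at a given operating time.
import numpy as np

def weibull_reliability(t_years, stress_ratio, beta=1.2, t_ref=10.0, rho=20.0):
    """Reliability after t_years at a given stress ratio (operating/ultimate).

    t_ref is the characteristic life at stress_ratio = 1; the power law with
    exponent rho shifts it to the operating stress level.
    """
    eta = t_ref * stress_ratio ** (-rho)          # characteristic life at this stress
    return np.exp(-(t_years / eta) ** beta)

for sr in (0.5, 0.6, 0.7):
    print(f"stress ratio {sr}: R(15 yr) = {weibull_reliability(15, sr):.4f}")
```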
Zhang, Yiming; Jin, Quan; Wang, Shuting; Ren, Ren
2011-05-01
The mobility behavior of 1481 peptides in ion mobility spectrometry (IMS), generated by protease digestion of the Drosophila melanogaster proteome, is modeled and predicted based on two different types of characterization methods, i.e., a sequence-based approach and a structure-based approach. In this procedure, the sequence-based approach considers both the amino acid composition of a peptide and the local environment profile of each amino acid in the peptide; the structure-based approach is performed with the CODESSA protocol, which regards a peptide as a common organic compound and generates more than 200 statistically significant variables to characterize the whole structure profile of a peptide molecule. Subsequently, nonlinear support vector machine (SVM) and Gaussian process (GP) regression as well as linear partial least squares (PLS) regression are employed to correlate the structural parameters of the characterizations with the IMS drift times of these peptides. The obtained quantitative structure-spectrum relationship (QSSR) models are evaluated rigorously and investigated systematically via both one-deep and two-deep cross-validation as well as rigorous Monte Carlo cross-validation (MCCV). We also give a comprehensive comparison of the resulting statistics arising from the different combinations of variable types and modeling methods, and find that the sequence-based approach gives QSSR models with better fitting ability and predictive power, but worse interpretability, than the structure-based approach. In addition, because the sequence-based approach does not require preparing energy-minimized peptide structures before modeling, it is considerably more efficient than the structure-based approach. Copyright © 2011 Elsevier Ltd. All rights reserved.
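A hedged, schematic version of the model comparison (synthetic descriptors and drift times, with scikit-learn estimators standing in for the original SVM/GP/PLS implementations), scored by cross-validated R².

```python
# SVR, Gaussian process and PLS regression on synthetic descriptor data.
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
X = rng.normal(size=(300, 50))                        # peptide descriptors (synthetic)
drift_time = X[:, :5] @ rng.normal(size=5) + 0.3 * rng.normal(size=300)

models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0)),
    "GP":  make_pipeline(StandardScaler(), GaussianProcessRegressor(alpha=0.1)),
    "PLS": make_pipeline(StandardScaler(), PLSRegression(n_components=5)),
}
for name, m in models.items():
    print(name, cross_val_score(m, X, drift_time, cv=5, scoring="r2").mean().round(3))
```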
An R package for analyzing and modeling ranking data.
Lee, Paul H; Yu, Philip L H
2013-05-14
In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package for analyzing and modeling ranking data with a bundle of tools. The pmr package enables descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty's and Koczkodaj's inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis. Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 was the most preferred item and item 3 the least preferred, and a significant difference was found between physicians' preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first can be interpreted as the overall preference for the seven items (labeled "internal/external"), and the second as the overall variance of the preferences (labeled "push/pull factors"). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman's footrule distance. In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situations.
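A hedged sketch, in Python rather than R (so not the pmr API itself), of the descriptive statistics described above (mean ranks, pairwise preference frequencies) and of the Spearman footrule distance used by the best-fitting distance-based models; the small ranking matrix is invented.

```python
# Mean ranks, pairwise-preference matrix, and Spearman footrule distance.
import numpy as np

# rows = judges, columns = items, entries = rank given (1 = most preferred)
rankings = np.array([[1, 3, 2, 4],
                     [2, 3, 1, 4],
                     [1, 4, 2, 3],
                     [2, 3, 1, 4]])

mean_rank = rankings.mean(axis=0)
# P(item i ranked above item j), averaged over judges
pairwise = (rankings[:, :, None] < rankings[:, None, :]).mean(axis=0)

def footrule(r1, r2):
    return np.abs(np.asarray(r1) - np.asarray(r2)).sum()

print("mean ranks:", mean_rank)
print("P(i beats j):\n", pairwise.round(2))
print("footrule distance between judges 0 and 1:", footrule(rankings[0], rankings[1]))
```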