Sample records for combining statistical modelling

  1. Comparing and combining process-based crop models and statistical models with some implications for climate change

    NASA Astrophysics Data System (ADS)

    Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram

    2017-09-01

    We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.
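
    A minimal sketch of one way such a model combination could be set up (not the authors' actual calibration procedure): regress observed yields on the two models' predictions and use the fitted weights out of sample. All data below are synthetic placeholders.

    # Minimal sketch (not the authors' procedure): combine a process-model
    # prediction and a statistical-model prediction by regressing observed
    # yields on both, then apply the fitted weights to new fields.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    truth = rng.normal(10.0, 2.0, n)                 # synthetic "actual" yields
    pred_process = truth + rng.normal(0.0, 1.0, n)   # process-model predictions
    pred_stat = truth + rng.normal(0.0, 1.2, n)      # statistical-model predictions

    # Least-squares calibration: yield ~ b0 + b1*process + b2*statistical
    X = np.column_stack([np.ones(n), pred_process, pred_stat])
    coef, *_ = np.linalg.lstsq(X, truth, rcond=None)
    combined = X @ coef

    for name, pred in [("process", pred_process), ("statistical", pred_stat),
                       ("combined", combined)]:
        print(name, "RMSE:", round(float(np.sqrt(np.mean((pred - truth) ** 2))), 2))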

  2. Texture and haptic cues in slant discrimination: reliability-based cue weighting without statistically optimal cue combination

    NASA Astrophysics Data System (ADS)

    Rosas, Pedro; Wagemans, Johan; Ernst, Marc O.; Wichmann, Felix A.

    2005-05-01

    A number of models of depth-cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum-variance unbiased estimator that can be constructed from the available information. Here we test such models by using visual and haptic depth information. Different texture types produce differences in slant-discrimination performance, thus providing a means for testing a reliability-sensitive cue-combination model with texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability but fell short of statistically optimal combination - we find reliability-based reweighting but not statistically optimal cue combination.
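
    For reference, the reliability-weighted averaging rule the abstract tests can be written as a weighted mean with inverse-variance weights. The short sketch below, with made-up slant estimates and variances, shows the minimum-variance combination against which the observed weights can be compared.

    # Standard reliability-weighted cue-combination rule: each cue's weight
    # is its normalised inverse variance, which gives the minimum-variance
    # unbiased combined estimate for independent cues.
    import numpy as np

    def combine_cues(estimates, variances):
        """Weighted average of independent cue estimates (e.g. texture, haptic)."""
        estimates = np.asarray(estimates, dtype=float)
        variances = np.asarray(variances, dtype=float)
        weights = 1.0 / variances
        weights /= weights.sum()
        combined = float(np.sum(weights * estimates))
        combined_var = 1.0 / np.sum(1.0 / variances)
        return combined, combined_var, weights

    # Hypothetical texture and haptic slant estimates (degrees) and variances
    slant, var, w = combine_cues([32.0, 38.0], [4.0, 9.0])
    print(slant, var, w)   # the more reliable cue (smaller variance) gets more weight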

  3. RooStatsCms: A tool for analysis modelling, combination and statistical studies

    NASA Astrophysics Data System (ADS)

    Piparo, D.; Schott, G.; Quast, G.

    2010-04-01

    RooStatsCms is an object oriented statistical framework based on the RooFit technology. Its scope is to allow the modelling, statistical analysis and combination of multiple search channels for new phenomena in High Energy Physics. It provides a variety of methods described in literature implemented as classes, whose design is oriented to the execution of multiple CPU intensive jobs on batch systems or on the Grid.

  4. Future missions studies: Combining Schatten's solar activity prediction model with a chaotic prediction model

    NASA Technical Reports Server (NTRS)

    Ashrafi, S.

    1991-01-01

    K. Schatten (1991) recently developed a method for combining his prediction model with our chaotic model. The philosophy behind this combined model and his method of combination is explained. Because the Schatten solar prediction model (KS) uses a dynamo to mimic solar dynamics, accurate prediction is limited to long-term solar behavior (10 to 20 years). The Chaotic prediction model (SA) uses the recently developed techniques of nonlinear dynamics to predict solar activity. It can be used to predict activity only up to the horizon. In theory, the chaotic prediction should be several orders of magnitude better than statistical predictions up to that horizon; beyond the horizon, chaotic predictions would theoretically be just as good as statistical predictions. Therefore, chaos theory puts a fundamental limit on predictability.

  5. Robust Combining of Disparate Classifiers Through Order Statistics

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep

    2001-01-01

    Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
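
    A brief illustration of order-statistic combiners of the kind described here, using hypothetical posterior outputs from three classifiers; the "trim" rule is a simplified stand-in for the trimmed-mean combiner analyzed in the article.

    # Order-statistic combiners: for each class, sort the posteriors from the
    # individual classifiers and keep the median, the maximum, or a trimmed
    # mean of the middle values.
    import numpy as np

    def combine_order_stat(posteriors, rule="median", trim=1):
        """posteriors: array of shape (n_classifiers, n_classes)."""
        ordered = np.sort(posteriors, axis=0)
        if rule == "median":
            fused = np.median(posteriors, axis=0)
        elif rule == "max":
            fused = ordered[-1]
        elif rule == "trim":                      # trimmed mean of ordered outputs
            fused = ordered[trim:len(ordered) - trim].mean(axis=0)
        else:
            raise ValueError(rule)
        return int(np.argmax(fused)), fused

    # Three hypothetical classifiers, one of them badly miscalibrated
    p = np.array([[0.7, 0.3],
                  [0.6, 0.4],
                  [0.1, 0.9]])
    print(combine_order_stat(p, "median"))   # robust to the outlying classifier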

  6. Combination of statistical and physically based methods to assess shallow slide susceptibility at the basin scale

    NASA Astrophysics Data System (ADS)

    Oliveira, Sérgio C.; Zêzere, José L.; Lajas, Sara; Melo, Raquel

    2017-07-01

    Approaches used to assess shallow slide susceptibility at the basin scale are conceptually different depending on the use of statistical or physically based methods. The former are based on the assumption that the same causes are more likely to produce the same effects, whereas the latter are based on the comparison between forces which tend to promote movement along the slope and the counteracting forces that are resistant to motion. Within this general framework, this work tests two hypotheses: (i) although conceptually and methodologically distinct, the statistical and deterministic methods generate similar shallow slide susceptibility results regarding the model's predictive capacity and spatial agreement; and (ii) the combination of shallow slide susceptibility maps obtained with statistical and physically based methods, for the same study area, generate a more reliable susceptibility model for shallow slide occurrence. These hypotheses were tested at a small test site (13.9 km2) located north of Lisbon (Portugal), using a statistical method (the information value method, IV) and a physically based method (the infinite slope method, IS). The landslide susceptibility maps produced with the statistical and deterministic methods were combined into a new landslide susceptibility map. The latter was based on a set of integration rules defined by the cross tabulation of the susceptibility classes of both maps and analysis of the corresponding contingency tables. The results demonstrate a higher predictive capacity of the new shallow slide susceptibility map, which combines the independent results obtained with statistical and physically based models. Moreover, the combination of the two models allowed the identification of areas where the results of the information value and the infinite slope methods are contradictory. Thus, these areas were classified as uncertain and deserve additional investigation at a more detailed scale.

  7. A General Model for Estimating and Correcting the Effects of Nonindependence in Meta-Analysis.

    ERIC Educational Resources Information Center

    Strube, Michael J.

    A general model is described which can be used to represent the four common types of meta-analysis: (1) estimation of effect size by combining study outcomes; (2) estimation of effect size by contrasting study outcomes; (3) estimation of statistical significance by combining study outcomes; and (4) estimation of statistical significance by…

  8. Modeling Ka-band low elevation angle propagation statistics

    NASA Technical Reports Server (NTRS)

    Russell, Thomas A.; Weinfield, John; Pearson, Chris; Ippolito, Louis J.

    1995-01-01

    The statistical variability of the secondary atmospheric propagation effects on satellite communications cannot be ignored at frequencies of 20 GHz or higher, particularly if the propagation margin allocation is such that link availability falls below 99 percent. The secondary effects considered in this paper are gaseous absorption, cloud absorption, and tropospheric scintillation; rain attenuation is the primary effect. Techniques and example results are presented for estimation of the overall combined impact of the atmosphere on satellite communications reliability. Statistical methods are employed throughout and the most widely accepted models for the individual effects are used wherever possible. The degree of correlation between the effects is addressed and some bounds on the expected variability in the combined effects statistics are derived from the expected variability in correlation. Example estimates of combined effects statistics are presented for the Washington D.C. area at 20 GHz and a 5 deg elevation angle. The statistics of water vapor are shown to be sufficient for estimation of the statistics of gaseous absorption at 20 GHz. A computer model based on monthly surface weather is described and tested. Significant improvement in prediction of absorption extremes is demonstrated with the use of path weather data instead of surface data.

  9. An Update on Statistical Boosting in Biomedicine.

    PubMed

    Mayr, Andreas; Hofner, Benjamin; Waldmann, Elisabeth; Hepp, Tobias; Meyer, Sebastian; Gefeller, Olaf

    2017-01-01

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
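
    As a toy illustration of the componentwise L2 boosting idea underlying statistical boosting (fit each simple base-learner to the current residuals, add only the best one, shrunken by a small learning rate), the sketch below uses synthetic data; it is not tied to any particular package implementation.

    # Componentwise L2 boosting: selecting one base-learner per iteration
    # yields automated variable selection and shrinkage of effect estimates.
    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 10
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)

    coef = np.zeros(p)
    nu, n_iter = 0.1, 200                       # learning rate, boosting iterations
    resid = y.copy()
    for _ in range(n_iter):
        # univariate least-squares fit of every candidate base-learner
        betas = X.T @ resid / (X ** 2).sum(axis=0)
        losses = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
        j = int(np.argmin(losses))              # best base-learner this iteration
        coef[j] += nu * betas[j]
        resid -= nu * betas[j] * X[:, j]

    print(np.round(coef, 2))  # only informative variables get non-zero coefficients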

  10. On temporal stochastic modeling of precipitation, nesting models across scales

    NASA Astrophysics Data System (ADS)

    Paschalis, Athanasios; Molnar, Peter; Fatichi, Simone; Burlando, Paolo

    2014-01-01

    We analyze the performance of composite stochastic models of temporal precipitation which can satisfactorily reproduce precipitation properties across a wide range of temporal scales. The rationale is that a combination of stochastic precipitation models which are most appropriate for specific limited temporal scales leads to better overall performance across a wider range of scales than single models alone. We investigate different model combinations. For the coarse (daily) scale these are models based on alternating renewal processes, Markov chains, and Poisson cluster models, which are then combined with a microcanonical Multiplicative Random Cascade model to disaggregate precipitation to finer (minute) scales. The composite models were tested on data at four sites in different climates. The results show that model combinations improve the performance in key statistics such as probability distributions of precipitation depth, autocorrelation structure, intermittency, and reproduction of extremes, compared to single models. At the same time they remain reasonably parsimonious. No model combination was found to outperform the others at all sites and for all statistics; however, we provide insight on the capabilities of specific model combinations. The results for the four different climates are similar, which suggests a degree of generality and wider applicability of the approach.
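
    A compact sketch of the microcanonical multiplicative random cascade step used for disaggregation: each coarse amount is split into two sub-intervals with random weights that sum to one, so daily totals are conserved exactly. The beta splitting weights and dry-probability below are illustrative choices, not the calibrated generators from the study.

    # Microcanonical multiplicative random cascade (mass-conserving splits).
    import numpy as np

    def cascade_disaggregate(coarse, levels, p_dry=0.3, rng=None):
        rng = rng or np.random.default_rng()
        series = np.asarray(coarse, dtype=float)
        for _ in range(levels):
            w = rng.beta(2.0, 2.0, size=series.size)     # splitting weights in (0, 1)
            dry = rng.random(series.size) < p_dry        # intermittency: one side dry
            w = np.where(dry, np.round(w), w)            # push weight to 0 or 1
            series = np.column_stack([series * w, series * (1.0 - w)]).ravel()
        return series

    daily = np.array([12.0, 0.0, 3.5, 20.0])             # hypothetical daily totals (mm)
    fine = cascade_disaggregate(daily, levels=5)         # 2**5 = 32 values per day
    print(fine.reshape(len(daily), -1).sum(axis=1))      # mass preserved per day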

  11. Statistical models of lunar rocks and regolith

    NASA Technical Reports Server (NTRS)

    Marcus, A. H.

    1973-01-01

    The mathematical, statistical, and computational approaches used in the investigation of the interrelationship of lunar fragmental material, regolith, lunar rocks, and lunar craters are described. The first two phases of the work explored the sensitivity of the production model of fragmental material to mathematical assumptions, and then completed earlier studies on the survival of lunar surface rocks with respect to competing processes. The third phase combined earlier work into a detailed statistical analysis and probabilistic model of regolith formation by lithologically distinct layers, interpreted as modified crater ejecta blankets. The fourth phase of the work dealt with problems encountered in combining the results of the entire project into a comprehensive, multipurpose computer simulation model for the craters and regolith. Highlights of each phase of research are given.

  12. Computational Analysis for Rocket-Based Combined-Cycle Systems During Rocket-Only Operation

    NASA Technical Reports Server (NTRS)

    Steffen, C. J., Jr.; Smith, T. D.; Yungster, S.; Keller, D. J.

    2000-01-01

    A series of Reynolds-averaged Navier-Stokes calculations was employed to study the performance of rocket-based combined-cycle systems operating in an all-rocket mode. This parametric series of calculations was executed within a statistical framework, commonly known as design of experiments. The parametric design space included four geometric and two flowfield variables set at three levels each, for a total of 729 possible combinations. A D-optimal design strategy was selected. It required that only 36 separate computational fluid dynamics (CFD) solutions be performed to develop a full response surface model, which quantified the linear, bilinear, and curvilinear effects of the six experimental variables. The axisymmetric, Reynolds-averaged Navier-Stokes simulations were executed with the NPARC v3.0 code. The response used in the statistical analysis was created from Isp efficiency data integrated from the 36 CFD simulations. The influence of turbulence modeling was analyzed by using both one- and two-equation models. Careful attention was also given to quantify the influence of mesh dependence, iterative convergence, and artificial viscosity upon the resulting statistical model. Thirteen statistically significant effects were observed to have an influence on rocket-based combined-cycle nozzle performance. It was apparent that the free-expansion process, directly downstream of the rocket nozzle, can influence the Isp efficiency. Numerical schlieren images and particle traces have been used to further understand the physical phenomena behind several of the statistically significant results.

  13. Variational stereo imaging of oceanic waves with statistical constraints.

    PubMed

    Gallego, Guillermo; Yezzi, Anthony; Fedele, Francesco; Benetazzo, Alvise

    2013-11-01

    An image processing observational technique for the stereoscopic reconstruction of the waveform of oceanic sea states is developed. The technique incorporates the enforcement of any given statistical wave law modeling the quasi-Gaussianity of oceanic waves observed in nature. The problem is posed in a variational optimization framework, where the desired waveform is obtained as the minimizer of a cost functional that combines image observations, smoothness priors and a weak statistical constraint. The minimizer is obtained by combining gradient descent and multigrid methods on the necessary optimality equations of the cost functional. Robust photometric error criteria and a spatial intensity compensation model are also developed to improve the performance of the presented image matching strategy. The weak statistical constraint is thoroughly evaluated in combination with other elements presented to reconstruct and enforce constraints on experimental stereo data, demonstrating the improvement in the estimation of the observed ocean surface.

  14. Image statistics underlying natural texture selectivity of neurons in macaque V4

    PubMed Central

    Okazawa, Gouki; Tajima, Satohiro; Komatsu, Hidehiko

    2015-01-01

    Our daily visual experiences are inevitably linked to recognizing the rich variety of textures. However, how the brain encodes and differentiates a plethora of natural textures remains poorly understood. Here, we show that many neurons in macaque V4 selectively encode sparse combinations of higher-order image statistics to represent natural textures. We systematically explored neural selectivity in a high-dimensional texture space by combining texture synthesis and efficient-sampling techniques. This yielded parameterized models for individual texture-selective neurons. The models provided parsimonious but powerful predictors for each neuron’s preferred textures using a sparse combination of image statistics. As a whole population, the neuronal tuning was distributed in a way suitable for categorizing textures and quantitatively predicts human ability to discriminate textures. Together, we suggest that the collective representation of visual image statistics in V4 plays a key role in organizing the natural texture perception. PMID:25535362

  15. The Development of the Children's Services Statistical Neighbour Benchmarking Model. Final Report

    ERIC Educational Resources Information Center

    Benton, Tom; Chamberlain, Tamsin; Wilson, Rebekah; Teeman, David

    2007-01-01

    In April 2006, the Department for Education and Skills (DfES) commissioned the National Foundation for Educational Research (NFER) to conduct an independent external review in order to develop a single "statistical neighbour" model. This single model aimed to combine the key elements of the different models currently available and be…

  16. Combining Statistics and Physics to Improve Climate Downscaling

    NASA Astrophysics Data System (ADS)

    Gutmann, E. D.; Eidhammer, T.; Arnold, J.; Nowak, K.; Clark, M. P.

    2017-12-01

    Getting useful information from climate models is an ongoing problem that has plagued climate science and hydrologic prediction for decades. While it is possible to develop statistical corrections for climate models that mimic current climate almost perfectly, this does not necessarily guarantee that future changes are portrayed correctly. In contrast, convection permitting regional climate models (RCMs) have begun to provide an excellent representation of the regional climate system purely from first principles, providing greater confidence in their change signal. However, the computational cost of such RCMs prohibits the generation of ensembles of simulations or long time periods, thus limiting their applicability for hydrologic applications. Here we discuss a new approach combining statistical corrections with physical relationships for a modest computational cost. We have developed the Intermediate Complexity Atmospheric Research model (ICAR) to provide a climate and weather downscaling option that is based primarily on physics for a fraction of the computational requirements of a traditional regional climate model. ICAR also enables the incorporation of statistical adjustments directly within the model. We demonstrate that applying even simple corrections to precipitation while the model is running can improve the simulation of land atmosphere feedbacks in ICAR. For example, by incorporating statistical corrections earlier in the modeling chain, we permit the model physics to better represent the effect of mountain snowpack on air temperature changes.

  17. A High Precision Prediction Model Using Hybrid Grey Dynamic Model

    ERIC Educational Resources Information Center

    Li, Guo-Dong; Yamaguchi, Daisuke; Nagai, Masatake; Masuda, Shiro

    2008-01-01

    In this paper, we propose a new prediction analysis model which combines the first order one variable Grey differential equation Model (abbreviated as GM(1,1) model) from grey system theory and time series Autoregressive Integrated Moving Average (ARIMA) model from statistics theory. We abbreviate the combined GM(1,1) ARIMA model as ARGM(1,1)…
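
    A minimal sketch of the GM(1,1) component of such a hybrid model (the residuals of this grey-model fit would then be modelled with ARIMA). The series below is made up, and the implementation follows the standard textbook form of GM(1,1), not necessarily the authors' exact formulation.

    # GM(1,1) grey model: accumulate the series, fit the grey differential
    # equation by least squares, and invert the accumulation to forecast.
    import numpy as np

    def gm11_fit_predict(x, n_ahead=3):
        x = np.asarray(x, dtype=float)
        x1 = np.cumsum(x)                                  # accumulated generating operation
        z = 0.5 * (x1[1:] + x1[:-1])                       # background values
        B = np.column_stack([-z, np.ones(len(z))])
        a, b = np.linalg.lstsq(B, x[1:], rcond=None)[0]    # development and grey input coefficients
        k = np.arange(1, len(x) + n_ahead)
        x1_hat = (x[0] - b / a) * np.exp(-a * k) + b / a   # time response function
        x_hat = np.diff(np.concatenate([[x[0]], x1_hat]))  # inverse accumulation
        return np.concatenate([[x[0]], x_hat])

    series = [112.0, 118.5, 124.2, 131.0, 138.4]           # hypothetical data
    print(gm11_fit_predict(series, n_ahead=3))             # fitted values plus 3-step forecast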

  18. A statistical model of operational impacts on the framework of the bridge crane

    NASA Astrophysics Data System (ADS)

    Antsev, V. Yu; Tolokonnikov, A. S.; Gorynin, A. D.; Reutov, A. A.

    2017-02-01

    The technical regulations of the Customs Union demand implementation of a risk analysis of bridge crane operation at the design stage. A statistical model has been developed for performing random calculations of risks, allowing possible operational influences on the bridge crane metal structure to be modelled in their various combinations. The statistical model is implemented in a software product for the automated calculation of the risk of failure occurrence of bridge cranes.

  19. Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors

    DTIC Science & Technology

    2015-07-15

    Long-term effects on cancer survivors' quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...

  20. Survey of statistical techniques used in validation studies of air pollution prediction models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bornstein, R D; Anderson, S F

    1979-03-01

    Statistical techniques used by meteorologists to validate predictions made by air pollution models are surveyed. Techniques are divided into the following three groups: graphical, tabular, and summary statistics. Some of the practical problems associated with verification are also discussed. Characteristics desired in any validation program are listed and a suggested combination of techniques that possesses many of these characteristics is presented.
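
    Typical summary statistics used in such validation studies can be computed in a few lines. The sketch below, with made-up observed and predicted concentrations, shows bias, RMSE, correlation, and the fraction of predictions within a factor of two, as one plausible combination of measures rather than the specific set recommended in the survey.

    # Simple validation summary statistics for model-versus-observation pairs.
    import numpy as np

    def validation_summary(obs, pred):
        obs, pred = np.asarray(obs, float), np.asarray(pred, float)
        bias = np.mean(pred - obs)
        rmse = np.sqrt(np.mean((pred - obs) ** 2))
        corr = np.corrcoef(obs, pred)[0, 1]
        fac2 = np.mean((pred >= 0.5 * obs) & (pred <= 2.0 * obs))   # within a factor of two
        return {"bias": bias, "rmse": rmse, "r": corr, "FAC2": fac2}

    obs = [40.0, 55.0, 62.0, 30.0, 75.0]        # hypothetical concentrations
    pred = [45.0, 50.0, 70.0, 28.0, 66.0]
    print(validation_summary(obs, pred))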

  1. The Use of a Context-Based Information Retrieval Technique

    DTIC Science & Technology

    2009-07-01

    Latent Semantic Analysis (LSA) is a statistical technique for inferring contextual and structural information, and previous studies ... LSA, which is also known as latent semantic indexing (LSI), uses a statistical and ... In contrast, natural language models apply algorithms that combine statistical information with semantic information.

  2. Damages detection in cylindrical metallic specimens by means of statistical baseline models and updated daily temperature profiles

    NASA Astrophysics Data System (ADS)

    Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo

    2017-05-01

    In previous work, damage detection in metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, with the temperature effect taken into account by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were carried out on a carbon-steel pipe of 1 m length and 1.5 inches diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21° and the T2 and Q statistical indices were then obtained for a 24 h temperature profile. Mass added at different points of the pipe between sensor and actuator was used to represent damage. By using the combined index, the temperature contribution can be separated and a better differentiation of damaged from undamaged cases can be obtained graphically.
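
    For orientation, the sketch below shows how a PCA baseline model yields the T2 and Q statistics referred to above; the combined temperature-compensated index itself is not reproduced, and the feature data are random placeholders for the piezodiagnostic measurements.

    # PCA baseline model with Hotelling's T-squared and Q (squared prediction
    # error) damage indices computed for a new measurement vector.
    import numpy as np

    def pca_baseline(X, n_comp):
        mu, sd = X.mean(0), X.std(0)
        Xs = (X - mu) / sd
        U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
        P = Vt[:n_comp].T                        # retained loadings
        lam = (S[:n_comp] ** 2) / (len(X) - 1)   # retained eigenvalues
        return mu, sd, P, lam

    def damage_indices(x, mu, sd, P, lam):
        xs = (x - mu) / sd
        t = P.T @ xs                             # scores in the retained subspace
        T2 = float(np.sum(t ** 2 / lam))         # Hotelling's T-squared
        Q = float(np.sum((xs - P @ t) ** 2))     # residual (Q) statistic
        return T2, Q

    rng = np.random.default_rng(0)
    baseline = rng.normal(size=(100, 8))         # placeholder undamaged piezo features
    mu, sd, P, lam = pca_baseline(baseline, n_comp=3)
    print(damage_indices(baseline[0] + 0.5, mu, sd, P, lam))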

  3. Development of the AFRL Aircrew Performance and Protection Data Bank

    DTIC Science & Technology

    2007-12-01

    Prediction based on the combined Bubble Growth model and statistical model of hypobaric chamber simulations was integrated into the Data Bank. It offers a quick and readily accessible online DCS risk assessment tool for ... ADRAC is based on more than 20 years of hypobaric chamber studies using human ... are used for the DCS prediction instead of the original model.

  4. Bureau of Labor Statistics Employment Projections: Detailed Analysis of Selected Occupations and Industries. Report to the Honorable Berkley Bedell, United States House of Representatives.

    ERIC Educational Resources Information Center

    General Accounting Office, Washington, DC.

    To compile its projections of future employment levels, the Bureau of Labor Statistics (BLS) combines the following five interlinked models in a six-step process: a labor force model, an econometric model of the U.S. economy, an industry activity model, an industry labor demand model, and an occupational labor demand model. The BLS was asked to…

  5. The l_z(p)* Person-Fit Statistic in an Unfolding Model Context.

    PubMed

    Tendeiro, Jorge N

    2017-01-01

    Although person-fit analysis has a long-standing tradition within item response theory, it has been applied in combination with dominance response models almost exclusively. In this article, a popular log likelihood-based parametric person-fit statistic under the framework of the generalized graded unfolding model is used. Results from a simulation study indicate that the person-fit statistic performed relatively well in detecting midpoint response style patterns and not so well in detecting extreme response style patterns.

  6. [Applications of the hospital statistics management system].

    PubMed

    Zhai, Hong; Ren, Yong; Liu, Jing; Li, You-Zhang; Ma, Xiao-Long; Jiao, Tao-Tao

    2008-01-01

    The Hospital Statistics Management System is built on the office automation platform of the Shandong provincial hospital system. Its workflow, role, and permission technologies are used to standardize and optimize statistics management procedures within the total quality control of hospital statistics. The system's applications have combined the office automation platform with statistics management in a hospital, providing a practical example of a modern hospital statistics management model.

  7. Whole vertebral bone segmentation method with a statistical intensity-shape model based approach

    NASA Astrophysics Data System (ADS)

    Hanaoka, Shouhei; Fritscher, Karl; Schuler, Benedikt; Masutani, Yoshitaka; Hayashi, Naoto; Ohtomo, Kuni; Schubert, Rainer

    2011-03-01

    An automatic segmentation algorithm for the vertebrae in human body CT images is presented. In particular, we focused on constructing and utilizing four different statistical intensity-shape combined models for the cervical, upper / lower thoracic and lumbar vertebrae, respectively. For this purpose, two previously reported methods were combined: a deformable model-based initial segmentation method and a statistical shape-intensity model-based precise segmentation method. The former is used as a pre-processing step to detect the position and orientation of each vertebra, which determines the initial condition for the latter precise segmentation method. The precise segmentation method needs prior knowledge of both the intensities and the shapes of the objects. After PCA analysis of such shape-intensity expressions obtained from training image sets, vertebrae were parametrically modeled as a linear combination of the principal component vectors. The segmentation of each target vertebra was performed by fitting this parametric model to the target image via maximum a posteriori estimation, combined with the geodesic active contour method. In an experiment using 10 cases, the initial segmentation was successful in 6 cases and only partially failed in 4 cases (2 in the cervical area and 2 in the lumbo-sacral area). In the precise segmentation, the mean error distances were 2.078, 1.416, 0.777, and 0.939 mm for the cervical, upper thoracic, lower thoracic, and lumbar spine, respectively. In conclusion, our automatic segmentation algorithm for the vertebrae in human body CT images showed fair performance for the cervical, thoracic and lumbar vertebrae.

  8. Model Performance Evaluation and Scenario Analysis (MPESA) Tutorial

    EPA Pesticide Factsheets

    The model performance evaluation consists of metrics and model diagnostics. These metrics provide modelers with statistical goodness-of-fit measures that capture magnitude-only, sequence-only, and combined magnitude and sequence errors.

  9. Nonelastic nuclear reactions and accompanying gamma radiation

    NASA Technical Reports Server (NTRS)

    Snow, R.; Rosner, H. R.; George, M. C.; Hayes, J. D.

    1971-01-01

    Several aspects of nonelastic nuclear reactions which proceed through the formation of a compound nucleus are dealt with. The full statistical model and the partial statistical model are described and computer programs based on these models are presented along with operating instructions and input and output for sample problems. A theoretical development of the expression for the reaction cross section for the hybrid case, which involves a combination of the continuum aspects of the full statistical model with the discrete level aspects of the partial statistical model, is presented. Cross sections for level excitation and gamma production by neutron inelastic scattering from the nuclei Al-27, Fe-56, Si-28, and Pb-208 are calculated and compared with available experimental data.

  10. A cloud and radiation model-based algorithm for rainfall retrieval from SSM/I multispectral microwave measurements

    NASA Technical Reports Server (NTRS)

    Xiang, Xuwu; Smith, Eric A.; Tripoli, Gregory J.

    1992-01-01

    A hybrid statistical-physical retrieval scheme is explored which combines a statistical approach with an approach based on the development of cloud-radiation models designed to simulate precipitating atmospheres. The algorithm employs the detailed microphysical information from a cloud model as input to a radiative transfer model which generates a cloud-radiation model database. Statistical procedures are then invoked to objectively generate an initial guess composite profile data set from the database. The retrieval algorithm has been tested for a tropical typhoon case using Special Sensor Microwave/Imager (SSM/I) data and has shown satisfactory results.

  11. Improving Face Verification in Photo Albums by Combining Facial Recognition and Metadata With Cross-Matching

    DTIC Science & Technology

    2017-12-01

    We do not use statistical models, and we do not create patterns that require supervised learning. Our methodology is intended for use in personal digital image ...

  12. Modeling Complex Phenomena Using Multiscale Time Sequences

    DTIC Science & Technology

    2009-08-24

    ... different scales and how these scales relate to each other. This can be done by combining a set of statistical fractal measures based on Hurst and Holder exponents, auto-regressive methods, and Fourier and wavelet decomposition methods. The applications for this technology ...

  13. Statistical assessment on a combined analysis of GRYN-ROMN-UCBN upland vegetation vital signs

    USGS Publications Warehouse

    Irvine, Kathryn M.; Rodhouse, Thomas J.

    2014-01-01

    As of 2013, Rocky Mountain and Upper Columbia Basin Inventory and Monitoring Networks have multiple years of vegetation data and Greater Yellowstone Network has three years of vegetation data and monitoring is ongoing in all three networks. Our primary objective is to assess whether a combined analysis of these data aimed at exploring correlations with climate and weather data is feasible. We summarize the core survey design elements across protocols and point out the major statistical challenges for a combined analysis at present. The dissimilarity in response designs between ROMN and UCBN-GRYN network protocols presents a statistical challenge that has not been resolved yet. However, the UCBN and GRYN data are compatible as they implement a similar response design; therefore, a combined analysis is feasible and will be pursued in future. When data collected by different networks are combined, the survey design describing the merged dataset is (likely) a complex survey design. A complex survey design is the result of combining datasets from different sampling designs. A complex survey design is characterized by unequal probability sampling, varying stratification, and clustering (see Lohr 2010 Chapter 7 for general overview). Statistical analysis of complex survey data requires modifications to standard methods, one of which is to include survey design weights within a statistical model. We focus on this issue for a combined analysis of upland vegetation from these networks, leaving other topics for future research. We conduct a simulation study on the possible effects of equal versus unequal probability selection of points on parameter estimates of temporal trend using available packages within the R statistical computing environment. We find that, as written, using lmer or lm for trend detection in a continuous response and clm and clmm for visually estimated cover classes with “raw” GRTS design weights specified for the weight argument leads to substantially different results and/or computational instability. However, when only fixed effects are of interest, the survey package (svyglm and svyolr) may be suitable for a model-assisted analysis for trend. We provide possible directions for future research into combined analysis for ordinal and continuous vital sign indicators.

  14. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations.

    PubMed

    Li, Qizhai; Hu, Jiyuan; Ding, Juan; Zheng, Gang

    2014-04-01

    A classical approach to combining independent test statistics is Fisher's combination of p-values, which follows the χ² distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions is true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy for using the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotropic associations are described, where multiple traits are tested for association with a single marker.
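
    A small sketch of the mechanics discussed here: Fisher's statistic T = -2 Σ log p follows a chi-square distribution with 2k degrees of freedom under independence, while a moment-matched gamma reference can be used when the statistics are dependent. The null sample below is a placeholder standing in for permutation or simulation under the true correlation structure.

    # Fisher's combination test with a chi-square reference (independent case)
    # and a moment-matched gamma reference (dependent case).
    import numpy as np
    from scipy import stats

    pvals = np.array([0.04, 0.20, 0.01, 0.15])      # hypothetical dependent p-values
    T = -2.0 * np.sum(np.log(pvals))
    k = len(pvals)

    # Independent case: chi-square with 2k degrees of freedom
    p_indep = stats.chi2.sf(T, df=2 * k)

    # Dependent case: gamma with shape/scale matched to a null sample of T
    # (placeholder null sample; in practice obtained by permutation/simulation)
    null_T = stats.chi2.rvs(df=2 * k, size=10000, random_state=0) * 1.3
    shape = null_T.mean() ** 2 / null_T.var()
    scale = null_T.var() / null_T.mean()
    p_gamma = stats.gamma.sf(T, a=shape, scale=scale)
    print(p_indep, p_gamma)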

  15. The Impact of a Flipped Classroom Model of Learning on a Large Undergraduate Statistics Class

    ERIC Educational Resources Information Center

    Nielson, Perpetua Lynne; Bean, Nathan William Bean; Larsen, Ross Allen Andrew

    2018-01-01

    We examine the impact of a flipped classroom model of learning on student performance and satisfaction in a large undergraduate introductory statistics class. Two professors each taught a lecture-section and a flipped-class section. Using MANCOVA, a linear combination of final exam scores, average quiz scores, and course ratings was compared for…

  16. Remote sensing estimation of the total phosphorus concentration in a large lake using band combinations and regional multivariate statistical modeling techniques.

    PubMed

    Gao, Yongnian; Gao, Junfeng; Yin, Hongbin; Liu, Chuansheng; Xia, Ting; Wang, Jing; Huang, Qi

    2015-03-15

    Remote sensing has been widely used for water quality monitoring, but most of these monitoring studies have only focused on a few water quality variables, such as chlorophyll-a, turbidity, and total suspended solids, which have typically been considered optically active variables. Remote sensing presents a challenge in estimating the phosphorus concentration in water. The total phosphorus (TP) in lakes has been estimated from remotely sensed observations, primarily using the simple individual band ratio or their natural logarithm and the statistical regression method based on the field TP data and the spectral reflectance. In this study, we investigated the possibility of establishing a spatial modeling scheme to estimate the TP concentration of a large lake from multi-spectral satellite imagery using band combinations and regional multivariate statistical modeling techniques, and we tested the applicability of the spatial modeling scheme. The results showed that HJ-1A CCD multi-spectral satellite imagery can be used to estimate the TP concentration in a lake. The correlation and regression analysis showed a highly significant positive relationship between the TP concentration and certain remotely sensed combination variables. The proposed modeling scheme had a higher accuracy for the TP concentration estimation in the large lake compared with the traditional individual band ratio method and the whole-lake scale regression-modeling scheme. The TP concentration values showed a clear spatial variability and were high in western Lake Chaohu and relatively low in eastern Lake Chaohu. The northernmost portion, the northeastern coastal zone and the southeastern portion of western Lake Chaohu had the highest TP concentrations, and the other regions had the lowest TP concentration values, except for the coastal zone of eastern Lake Chaohu. These results strongly suggested that the proposed modeling scheme, i.e., the band combinations and the regional multivariate statistical modeling techniques, demonstrated advantages for estimating the TP concentration in a large lake and had a strong potential for universal application for the TP concentration estimation in large lake waters worldwide. Copyright © 2014 Elsevier Ltd. All rights reserved.
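
    A schematic of the band-combination plus multivariate regression idea; the band names, combinations, and coefficients below are illustrative placeholders, not the HJ-1A CCD combinations actually selected in the study.

    # Build candidate predictors from band ratios/differences and fit a
    # multivariate linear model of TP concentration on them.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 60
    b1, b2, b3, b4 = rng.uniform(0.02, 0.2, size=(4, n))   # synthetic band reflectances
    tp = 0.05 + 0.8 * (b3 / b2) + 0.3 * (b4 - b1) + rng.normal(0, 0.02, n)

    # Candidate band combinations as predictors
    X = np.column_stack([np.ones(n), b3 / b2, b4 - b1, np.log(b3 + b4)])
    coef, *_ = np.linalg.lstsq(X, tp, rcond=None)
    tp_hat = X @ coef
    print("R^2:", 1 - np.sum((tp - tp_hat) ** 2) / np.sum((tp - tp.mean()) ** 2))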

  17. Statistical Methodologies to Integrate Experimental and Computational Research

    NASA Technical Reports Server (NTRS)

    Parker, P. A.; Johnson, R. T.; Montgomery, D. C.

    2008-01-01

    Development of advanced algorithms for simulating engine flow paths requires the integration of fundamental experiments with the validation of enhanced mathematical models. In this paper, we provide an overview of statistical methods to strategically and efficiently conduct experiments and computational model refinement. Moreover, the integration of experimental and computational research efforts is emphasized. With a statistical engineering perspective, scientific and engineering expertise is combined with statistical sciences to gain deeper insights into experimental phenomenon and code development performance; supporting the overall research objectives. The particular statistical methods discussed are design of experiments, response surface methodology, and uncertainty analysis and planning. Their application is illustrated with a coaxial free jet experiment and a turbulence model refinement investigation. Our goal is to provide an overview, focusing on concepts rather than practice, to demonstrate the benefits of using statistical methods in research and development, thereby encouraging their broader and more systematic application.

  18. The potential for increased power from combining P-values testing the same hypothesis.

    PubMed

    Ganju, Jitendra; Julie Ma, Guoguang

    2017-02-01

    The conventional approach to hypothesis testing for formal inference is to prespecify a single test statistic thought to be optimal. However, we usually have more than one test statistic in mind for testing the null hypothesis of no treatment effect but we do not know which one is the most powerful. Rather than relying on a single p-value, combining p-values from prespecified multiple test statistics can be used for inference. Combining functions include Fisher's combination test and the minimum p-value. Using randomization-based tests, the increase in power can be remarkable when compared with a single test and Simes's method. The versatility of the method is that it also applies when the number of covariates exceeds the number of observations. The increase in power is large enough to prefer combined p-values over a single p-value. The limitation is that the method does not provide an unbiased estimator of the treatment effect and does not apply to situations when the model includes treatment by covariate interaction.

  19. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleate within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. The Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.
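
    A toy sketch of the Monte Carlo combination step: sample microstructural properties, evaluate nucleation, small-crack, and long-crack (Paris-law) stage lives, and sum them to obtain a total-life distribution. The stage-life expressions and property values below are illustrative placeholders, not the micromechanical models named in the abstract.

    # Monte Carlo combination of fatigue stage lives into a total-life distribution.
    import numpy as np

    rng = np.random.default_rng(0)
    n_sim = 10000

    # Placeholder random microstructural inputs
    grain_size = rng.lognormal(mean=np.log(20e-6), sigma=0.2, size=n_sim)   # m
    strength = rng.normal(1.0, 0.1, size=n_sim)                             # normalised

    # Placeholder stage-life expressions (cycles)
    n_nucleation = 1e5 * strength / (grain_size / 20e-6)
    n_small_crack = 5e4 * strength
    n_long_crack = 2e4 * np.ones(n_sim)          # long-crack life, here held fixed

    n_total = n_nucleation + n_small_crack + n_long_crack
    print("median life:", np.median(n_total))
    print("1st percentile:", np.percentile(n_total, 1))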

  20. Hunting Solomonoff's Swans: Exploring the Boundary Between Physics and Statistics in Hydrological Modeling

    NASA Astrophysics Data System (ADS)

    Nearing, G. S.

    2014-12-01

    Statistical models consistently out-perform conceptual models in the short term, however to account for a nonstationary future (or an unobserved past) scientists prefer to base predictions on unchanging and commutable properties of the universe - i.e., physics. The problem with physically-based hydrology models is, of course, that they aren't really based on physics - they are based on statistical approximations of physical interactions, and we almost uniformly lack an understanding of the entropy associated with these approximations. Thermodynamics is successful precisely because entropy statistics are computable for homogeneous (well-mixed) systems, and ergodic arguments explain the success of Newton's laws to describe systems that are fundamentally quantum in nature. Unfortunately, similar arguments do not hold for systems like watersheds that are heterogeneous at a wide range of scales. Ray Solomonoff formalized the situation in 1968 by showing that given infinite evidence, simultaneously minimizing model complexity and entropy in predictions always leads to the best possible model. The open question in hydrology is about what happens when we don't have infinite evidence - for example, when the future will not look like the past, or when one watershed does not behave like another. How do we isolate stationary and commutable components of watershed behavior? I propose that one possible answer to this dilemma lies in a formal combination of physics and statistics. In this talk I outline my recent analogue (Solomonoff's theorem was digital) of Solomonoff's idea that allows us to quantify the complexity/entropy tradeoff in a way that is intuitive to physical scientists. I show how to formally combine "physical" and statistical methods for model development in a way that allows us to derive the theoretically best possible model given any given physics approximation(s) and available observations. Finally, I apply an analogue of Solomonoff's theorem to evaluate the tradeoff between model complexity and prediction power.

  1. Cancer Related-Knowledge - Small Area Estimates

    Cancer.gov

    These model-based estimates are produced using statistical models that combine data from the Health Information National Trends Survey, and auxiliary variables obtained from relevant sources and borrow strength from other areas with similar characteristics.

  2. Statistics of the geomagnetic secular variation for the past 5Ma

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1986-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  3. Statistics of the geomagnetic secular variation for the past 5 m.y

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1988-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  4. Non-linear scaling of a musculoskeletal model of the lower limb using statistical shape models.

    PubMed

    Nolte, Daniel; Tsang, Chui Kit; Zhang, Kai Yu; Ding, Ziyun; Kedgley, Angela E; Bull, Anthony M J

    2016-10-03

    Accurate muscle geometry for musculoskeletal models is important to enable accurate subject-specific simulations. Commonly, linear scaling is used to obtain individualised muscle geometry. More advanced methods include non-linear scaling using segmented bone surfaces and manual or semi-automatic digitisation of muscle paths from medical images. In this study, a new scaling method combining non-linear scaling with reconstructions of bone surfaces using statistical shape modelling is presented. Statistical Shape Models (SSMs) of femur and tibia/fibula were used to reconstruct bone surfaces of nine subjects. Reference models were created by morphing manually digitised muscle paths to mean shapes of the SSMs using non-linear transformations and inter-subject variability was calculated. Subject-specific models of muscle attachment and via points were created from three reference models. The accuracy was evaluated by calculating the differences between the scaled and manually digitised models. The points defining the muscle paths showed large inter-subject variability at the thigh and shank - up to 26 mm; this was found to limit the accuracy of all studied scaling methods. Errors for the subject-specific muscle point reconstructions of the thigh could be decreased by 9% to 20% by using the non-linear scaling compared to a typical linear scaling method. We conclude that the proposed non-linear scaling method is more accurate than linear scaling methods. Thus, when combined with the ability to reconstruct bone surfaces from incomplete or scattered geometry data using statistical shape models our proposed method is an alternative to linear scaling methods. Copyright © 2016 The Author. Published by Elsevier Ltd. All rights reserved.

  5. The value of model averaging and dynamical climate model predictions for improving statistical seasonal streamflow forecasts over Australia

    NASA Astrophysics Data System (ADS)

    Pokhrel, Prafulla; Wang, Q. J.; Robertson, David E.

    2013-10-01

    Seasonal streamflow forecasts are valuable for planning and allocation of water resources. In Australia, the Bureau of Meteorology employs a statistical method to forecast seasonal streamflows. The method uses predictors that are related to catchment wetness at the start of a forecast period and to climate during the forecast period. For the latter, a predictor is selected among a number of lagged climate indices as candidates to give the "best" model in terms of model performance in cross validation. This study investigates two strategies for further improvement in seasonal streamflow forecasts. The first is to combine, through Bayesian model averaging, multiple candidate models with different lagged climate indices as predictors, to take advantage of different predictive strengths of the multiple models. The second strategy is to introduce additional candidate models, using rainfall and sea surface temperature predictions from a global climate model as predictors. This is to take advantage of the direct simulations of various dynamic processes. The results show that combining forecasts from multiple statistical models generally yields more skillful forecasts than using only the best model and appears to moderate the worst forecast errors. The use of rainfall predictions from the dynamical climate model marginally improves the streamflow forecasts when viewed over all the study catchments and seasons, but the use of sea surface temperature predictions provide little additional benefit.
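
    A simplified stand-in for the forecast-combination step (not the full Bayesian model averaging used in the study): weight each candidate model by its historical cross-validation skill and average the forecasts.

    # Skill-weighted combination of candidate forecasts.
    import numpy as np

    def combine_forecasts(forecasts, cv_errors):
        """forecasts: one value per candidate model; cv_errors: historical RMSE per model."""
        w = 1.0 / np.asarray(cv_errors, dtype=float) ** 2
        w /= w.sum()
        return float(np.dot(w, forecasts)), w

    # Three hypothetical candidate models using different lagged climate indices
    forecast, weights = combine_forecasts([120.0, 150.0, 135.0], [30.0, 45.0, 35.0])
    print(forecast, weights)   # the historically more skilful models dominate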

  6. Influence of Texture and Colour in Breast TMA Classification

    PubMed Central

    Fernández-Carrobles, M. Milagro; Bueno, Gloria; Déniz, Oscar; Salido, Jesús; García-Rojo, Marcial; González-López, Lucía

    2015-01-01

    Breast cancer diagnosis is still done by observation of biopsies under the microscope. The development of automated methods for breast TMA classification would reduce diagnostic time. This paper is a step towards the solution for this problem and shows a complete study of breast TMA classification based on colour models and texture descriptors. The TMA images were divided into four classes: i) benign stromal tissue with cellularity, ii) adipose tissue, iii) benign and benign anomalous structures, and iv) ductal and lobular carcinomas. A relevant set of features was obtained on eight different colour models from first and second order Haralick statistical descriptors obtained from the intensity image, Fourier, Wavelets, Multiresolution Gabor, M-LBP and textons descriptors. Furthermore, four types of classification experiments were performed using six different classifiers: (1) classification per colour model individually, (2) classification by combination of colour models, (3) classification by combination of colour models and descriptors, and (4) classification by combination of colour models and descriptors with a previous feature set reduction. The best result shows an average of 99.05% accuracy and 98.34% positive predictive value. These results have been obtained by means of a bagging tree classifier with combination of six colour models and the use of 1719 non-correlated (correlation threshold of 97%) textural features based on Statistical, M-LBP, Gabor and Spatial textons descriptors. PMID:26513238

  7. Assessing the effect of land use change on catchment runoff by combined use of statistical tests and hydrological modelling: Case studies from Zimbabwe

    NASA Astrophysics Data System (ADS)

    Lørup, Jens Kristian; Refsgaard, Jens Christian; Mazvimavi, Dominic

    1998-03-01

    The purpose of this study was to identify and assess long-term impacts of land use change on catchment runoff in semi-arid Zimbabwe, based on analyses of long hydrological time series (25-50 years) from six medium-sized (200-1000 km²) non-experimental rural catchments. A methodology combining common statistical methods with hydrological modelling was adopted in order to distinguish between the effects of climate variability and the effects of land use change. The hydrological model (NAM) was in general able to simulate the observed hydrographs very well during the reference period, thus providing a means to account for the effects of climate variability and hence strengthening the power of the subsequent statistical tests. In the test period the validated model was used to provide the runoff record which would have occurred in the absence of land use change. The analyses indicated a decrease in the annual runoff for most of the six catchments, with the largest changes occurring for catchments located within communal land, where large increases in population and agricultural intensity have taken place. However, the decrease was only statistically significant at the 5% level for one of the catchments.

  8. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction ahead of months is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, statistical component refers to climate signals weighting by support vector regression (SVR), dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climatic models, and the hybrid part denotes a combination of statistical and dynamic components by assigning weights based on their historical performances. The results indicate that the statistical and hybrid models show better rainfall predictions than NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model well predicted the spatial extent and severity of drought nationwide, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomaly (NPA) in some areas. Model ensembles such as multiple statistical approaches, multiple dynamic models or multiple hybrid models for drought predictions were highlighted. These conclusions may be helpful for drought prediction and early drought warnings in China.

  9. Multi-fidelity stochastic collocation method for computation of statistical moments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu

    We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems in the presence of models with different fidelities. The method extends a previously developed multi-fidelity approximation method. By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound for the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.
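
    The record gives no algorithmic detail, so the sketch below only conveys the general flavour of combining a cheap and an expensive model for a moment estimate, using a control-variate correction; it is not the paper's stochastic collocation algorithm, and `f_hi` and `f_lo` are toy stand-ins for high- and low-fidelity models.

```python
import numpy as np

def two_fidelity_mean(f_hi, f_lo, x_hi, x_lo):
    """Estimate E[f_hi] from a few high-fidelity runs plus many low-fidelity
    runs via a control-variate correction (generic illustration only)."""
    y_hi = np.array([f_hi(x) for x in x_hi])        # few expensive samples
    y_lo_on_hi = np.array([f_lo(x) for x in x_hi])  # cheap model at the same points
    y_lo = np.array([f_lo(x) for x in x_lo])        # many cheap samples
    cov = np.cov(y_hi, y_lo_on_hi)
    alpha = cov[0, 1] / cov[1, 1]                   # control-variate coefficient
    return y_hi.mean() + alpha * (y_lo.mean() - y_lo_on_hi.mean())

rng = np.random.default_rng(1)
f_hi = lambda x: np.sin(x) + 0.1 * x ** 2
f_lo = lambda x: np.sin(x)                          # cheaper, correlated surrogate
print(two_fidelity_mean(f_hi, f_lo, rng.uniform(0, 1, 20), rng.uniform(0, 1, 5000)))
```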

  10. Local sensitivity analysis for inverse problems solved by singular value decomposition

    USGS Publications Warehouse

    Hill, M.C.; Nolan, B.T.

    2010-01-01

    Local sensitivity analysis provides computationally frugal ways to evaluate models commonly used for resource management, risk assessment, and so on. This includes diagnosing inverse model convergence problems caused by parameter insensitivity and(or) parameter interdependence (correlation), understanding what aspects of the model and data contribute to measures of uncertainty, and identifying new data likely to reduce model uncertainty. Here, we consider sensitivity statistics relevant to models in which the process model parameters are transformed using singular value decomposition (SVD) to create SVD parameters for model calibration. The statistics considered include the PEST identifiability statistic, and combined use of the process-model parameter statistics composite scaled sensitivities and parameter correlation coefficients (CSS and PCC). The statistics are complementary in that the identifiability statistic integrates the effects of parameter sensitivity and interdependence, while CSS and PCC provide individual measures of sensitivity and interdependence. PCC quantifies correlations between pairs or larger sets of parameters; when a set of parameters is intercorrelated, the absolute value of PCC is close to 1.00 for all pairs in the set. The number of singular vectors to include in the calculation of the identifiability statistic is somewhat subjective and influences the statistic. To demonstrate the statistics, we use the USDA’s Root Zone Water Quality Model to simulate nitrogen fate and transport in the unsaturated zone of the Merced River Basin, CA. There are 16 log-transformed process-model parameters, including water content at field capacity (WFC) and bulk density (BD) for each of five soil layers. Calibration data consisted of 1,670 observations comprising soil moisture, soil water tension, aqueous nitrate and bromide concentrations, soil nitrate concentration, and organic matter content. All 16 of the SVD parameters could be estimated by regression based on the range of singular values. Identifiability statistic results varied based on the number of SVD parameters included. Identifiability statistics calculated for four SVD parameters indicate the same three most important process-model parameters as CSS/PCC (WFC1, WFC2, and BD2), but the order differed. Additionally, the identifiability statistic showed that BD1 was almost as dominant as WFC1. The CSS/PCC analysis showed that this results from its high correlation with WFC1 (-0.94), and not its individual sensitivity. Such distinctions, combined with analysis of how high correlations and(or) sensitivities result from the constructed model, can produce important insights into, for example, the use of sensitivity analysis to design monitoring networks. In conclusion, the statistics considered identified similar important parameters. They differ because (1) CSS/PCC can be more awkward because sensitivity and interdependence are considered separately and (2) identifiability requires consideration of how many SVD parameters to include. A continuing challenge is to understand how these computationally efficient methods compare with computationally demanding global methods like Markov-Chain Monte Carlo given common nonlinear processes and the often even more nonlinear models.
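
    Composite scaled sensitivities and parameter correlation coefficients can be computed directly from a weighted Jacobian of simulated values with respect to the parameters. The sketch below uses their commonly used definitions and is generic, not tied to RZWQM or PEST output formats; all names are illustrative.

```python
import numpy as np

def css_and_pcc(J, params, weights):
    """Composite scaled sensitivities (one per parameter) and the parameter
    correlation matrix, from Jacobian J (n_obs x n_par), parameter values,
    and observation weights. Scaling conventions vary between codes."""
    n_obs = J.shape[0]
    dss = np.sqrt(weights)[:, None] * J * params[None, :]   # dimensionless scaled sensitivities
    css = np.sqrt((dss ** 2).sum(axis=0) / n_obs)
    JtWJ = (J * weights[:, None]).T @ J
    cov = np.linalg.inv(JtWJ)                                # parameter variance-covariance (up to a scalar)
    d = np.sqrt(np.diag(cov))
    pcc = cov / np.outer(d, d)                               # off-diagonal entries near +/-1 flag interdependence
    return css, pcc
```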

  11. Predicting Statistical Response and Extreme Events in Uncertainty Quantification through Reduced-Order Models

    NASA Astrophysics Data System (ADS)

    Qi, D.; Majda, A.

    2017-12-01

    A low-dimensional reduced-order statistical closure model is developed for quantifying the uncertainty in statistical sensitivity and intermittency in principal model directions with largest variability in high-dimensional turbulent system and turbulent transport models. Imperfect model sensitivity is improved through a recent mathematical strategy for calibrating model errors in a training phase, where information theory and linear statistical response theory are combined in a systematic fashion to achieve the optimal model performance. The idea in the reduced-order method is from a self-consistent mathematical framework for general systems with quadratic nonlinearity, where crucial high-order statistics are approximated by a systematic model calibration procedure. Model efficiency is improved through additional damping and noise corrections to replace the expensive energy-conserving nonlinear interactions. Model errors due to the imperfect nonlinear approximation are corrected by tuning the model parameters using linear response theory with an information metric in a training phase before prediction. A statistical energy principle is adopted to introduce a global scaling factor in characterizing the higher-order moments in a consistent way to improve model sensitivity. Stringent models of barotropic and baroclinic turbulence are used to display the feasibility of the reduced-order methods. Principal statistical responses in mean and variance can be captured by the reduced-order models with accuracy and efficiency. Besides, the reduced-order models are also used to capture crucial passive tracer field that is advected by the baroclinic turbulent flow. It is demonstrated that crucial principal statistical quantities like the tracer spectrum and fat-tails in the tracer probability density functions in the most important large scales can be captured efficiently with accuracy using the reduced-order tracer model in various dynamical regimes of the flow field with distinct statistical structures.

  12. Dose response explorer: an integrated open-source tool for exploring and modelling radiotherapy dose volume outcome relationships

    NASA Astrophysics Data System (ADS)

    El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.

    2006-11-01

    Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.

  13. Prediction of In Vivo Knee Joint Kinematics Using a Combined Dual Fluoroscopy Imaging and Statistical Shape Modeling Technique

    PubMed Central

    Li, Jing-Sheng; Tsai, Tsung-Yuan; Wang, Shaobai; Li, Pingyue; Kwon, Young-Min; Freiberg, Andrew; Rubash, Harry E.; Li, Guoan

    2014-01-01

    Using computed tomography (CT) or magnetic resonance (MR) images to construct 3D knee models has been widely used in biomedical engineering research. The statistical shape modeling (SSM) method is an alternative way to provide a fast, cost-efficient, and subject-specific knee modeling technique. This study aimed to evaluate the feasibility of using a combined dual-fluoroscopic imaging system (DFIS) and SSM method to investigate in vivo knee kinematics. Three subjects were studied during treadmill walking. The data were compared with the kinematics obtained using a CT-based modeling technique. Geometric root-mean-square (RMS) errors between the knee models constructed using the SSM and CT-based modeling techniques were 1.16 mm and 1.40 mm for the femur and tibia, respectively. For the kinematics of the knee during the treadmill gait, the SSM model can predict the knee kinematics with RMS errors within 3.3 deg for rotation and within 2.4 mm for translation throughout the stance phase of the gait cycle compared with those obtained using the CT-based knee models. The data indicated that the combined DFIS and SSM technique could be used for quick evaluation of knee joint kinematics. PMID:25320846

  14. Airborne Wireless Communication Modeling and Analysis with MATLAB

    DTIC Science & Technology

    2014-03-27

    research develops a physical layer model that combines antenna modeling using computational electromagnetics and the two-ray propagation model to ... predict the received signal strength. The antenna is modeled with triangular patches and analyzed by extending the antenna modeling algorithm by Sergey ... (excerpted report headings: 2.7 Propagation Modeling: Statistical Models; 2.8 Antenna Modeling)

  15. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfactory "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.
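
    Once point correspondences have been established, a point distribution model reduces to a principal component analysis of the aligned shape vectors. The sketch below shows only that final step (the group-wise MDL correspondence optimisation is omitted) and assumes the input array already holds corresponding, aligned surface points; the function names are illustrative.

```python
import numpy as np

def build_pdm(shapes, n_modes=5):
    """Point distribution model: mean shape plus principal modes of variation.
    `shapes` is (n_subjects, n_points*3), assumed aligned and in correspondence."""
    mean = shapes.mean(axis=0)
    X = shapes - mean
    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # SVD gives the modes directly
    variances = (s ** 2) / (len(shapes) - 1)
    return mean, Vt[:n_modes], variances[:n_modes]

def synthesize(mean, modes, variances, b):
    """Generate a new shape from mode coefficients b (in standard deviations)."""
    return mean + (np.asarray(b) * np.sqrt(variances)) @ modes
```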

  16. Applying Statistical Models and Parametric Distance Measures for Music Similarity Search

    NASA Astrophysics Data System (ADS)

    Lukashevich, Hanna; Dittmar, Christian; Bastuck, Christoph

    Automatically deriving similarity relations between music pieces is an inherent field of music information retrieval research. Due to the nearly unrestricted amount of musical data, real-world similarity search algorithms have to be highly efficient and scalable. One possible solution is to represent each music excerpt with a statistical model (e.g., a Gaussian mixture model) and thus to reduce the computational cost by applying parametric distance measures between the models. In this paper we discuss combinations of different parametric modelling techniques and distance measures and weigh the benefits of each against the others.
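
    When each excerpt is summarised by a single Gaussian, a common parametric distance is the (symmetrised) Kullback-Leibler divergence, which has a closed form; the sketch below shows that special case. Full Gaussian mixture models generally require approximations, which are not shown here.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL divergence KL(N0 || N1) between two multivariate Gaussians (closed form)."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def symmetric_kl(mu0, S0, mu1, S1):
    """Symmetrised KL, usable as a parametric distance between excerpt models."""
    return kl_gauss(mu0, S0, mu1, S1) + kl_gauss(mu1, S1, mu0, S0)
```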

  17. Bootstrap study of genome-enabled prediction reliabilities using haplotype blocks across Nordic Red cattle breeds.

    PubMed

    Cuyabano, B C D; Su, G; Rosa, G J M; Lund, M S; Gianola, D

    2015-10-01

    This study compared the accuracy of genome-enabled prediction models using individual single nucleotide polymorphisms (SNP) or haplotype blocks as covariates when using either a single breed or a combined population of Nordic Red cattle. The main objective was to compare predictions of breeding values of complex traits using a combined training population with haplotype blocks, with predictions using a single breed as training population and individual SNP as predictors. To compare the prediction reliabilities, bootstrap samples were taken from the test data set. With the bootstrapped samples of prediction reliabilities, we built and graphed confidence ellipses to allow comparisons. Finally, measures of statistical distances were used to calculate the gain in predictive ability. Our analyses are innovative in the context of assessment of predictive models, allowing a better understanding of prediction reliabilities and providing a statistical basis to effectively calibrate whether one prediction scenario is indeed more accurate than another. An ANOVA indicated that use of haplotype blocks produced significant gains mainly when Bayesian mixture models were used but not when Bayesian BLUP was fitted to the data. Furthermore, when haplotype blocks were used to train prediction models in a combined Nordic Red cattle population, we obtained up to a statistically significant 5.5% average gain in prediction accuracy, over predictions using individual SNP and training the model with a single breed. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  18. Hybrid perturbation methods based on statistical time series models

    NASA Astrophysics Data System (ADS)

    San-Juan, Juan Félix; San-Martín, Montserrat; Pérez, Iván; López, Rosario

    2016-04-01

    In this work we present a new methodology for orbit propagation, the hybrid perturbation theory, based on the combination of an integration method and a prediction technique. The former, which can be a numerical, analytical or semianalytical theory, generates an initial approximation that contains some inaccuracies derived from the fact that, in order to simplify the expressions and subsequent computations, not all the involved forces are taken into account and only low-order terms are considered, not to mention the fact that mathematical models of perturbations do not always reproduce physical phenomena with absolute precision. The prediction technique, which can be based on either statistical time series models or computational intelligence methods, is aimed at modelling and reproducing missing dynamics in the previously integrated approximation. This combination improves the precision of conventional numerical, analytical and semianalytical theories for determining the position and velocity of any artificial satellite or space debris object. In order to validate this methodology, we present a family of three hybrid orbit propagators formed by the combination of three different orders of approximation of an analytical theory and a statistical time series model, and analyse their capability to process the effect produced by the flattening of the Earth. The three considered analytical components are the integration of the Kepler problem, a first-order and a second-order analytical theory, whereas the prediction technique is the same in the three cases, namely an additive Holt-Winters method.
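
    As a rough illustration of the prediction component, the snippet below fits an additive Holt-Winters model to a synthetic residual series (a stand-in for the difference between a reference orbit and the analytical approximation) and forecasts the missing dynamics to be added back at prediction time. The data and the seasonal period are assumptions, not values from the paper.

```python
import numpy as np
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# synthetic residuals with a trend and a periodic component (placeholder data)
t = np.arange(400)
eps = (0.001 * t + 0.05 * np.sin(2 * np.pi * t / 50)
       + 0.01 * np.random.default_rng(0).normal(size=400))

# additive trend and additive seasonality; the period is assumed to match the
# dominant periodicity of the residual signal
model = ExponentialSmoothing(eps, trend="add", seasonal="add",
                             seasonal_periods=50).fit()
eps_forecast = model.forecast(100)   # predicted "missing dynamics"
print(eps_forecast[:5])
```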

  19. Tornado damage risk assessment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reinhold, T.A.; Ellingwood, B.

    1982-09-01

    Several proposed models were evaluated for predicting tornado wind speed probabilities at nuclear plant sites as part of a program to develop statistical data on tornadoes needed for probability-based load combination analysis. A unified model was developed which synthesized the desired aspects of tornado occurrence and damage potential. The sensitivity of wind speed probability estimates to various tornado modeling assumptions is examined, and the probability distributions of tornado wind speed that are needed for load combination studies are presented.

  20. Model-based Small Area Estimates of Cancer Risk Factors and Screening Behaviors - Small Area Estimates

    Cancer.gov

    These model-based estimates use two surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health Interview Survey (NHIS). The two surveys are combined using novel statistical methodology.

  1. Cosmological Constraints from Fourier Phase Statistics

    NASA Astrophysics Data System (ADS)

    Ali, Kamran; Obreschkow, Danail; Howlett, Cullan; Bonvin, Camille; Llinares, Claudio; Oliveira Franco, Felipe; Power, Chris

    2018-06-01

    Most statistical inference from cosmic large-scale structure relies on two-point statistics, i.e. on the galaxy-galaxy correlation function (2PCF) or the power spectrum. These statistics capture the full information encoded in the Fourier amplitudes of the galaxy density field but do not describe the Fourier phases of the field. Here, we quantify the information contained in the line correlation function (LCF), a three-point Fourier phase correlation function. Using cosmological simulations, we estimate the Fisher information (at redshift z = 0) of the 2PCF, LCF and their combination, regarding the cosmological parameters of the standard ΛCDM model, as well as a Warm Dark Matter (WDM) model and the f(R) and Symmetron modified gravity models. The galaxy bias is accounted for at the level of a linear bias. The relative information of the 2PCF and the LCF depends on the survey volume, sampling density (shot noise) and the bias uncertainty. For a volume of 1 h^{-3} Gpc^3, sampled with points of mean density \bar{n} = 2 × 10^{-3} h^3 Mpc^{-3} and a bias uncertainty of 13%, the LCF improves the parameter constraints by about 20% in the ΛCDM cosmology and potentially even more in alternative models. Finally, since a linear bias only affects the Fourier amplitudes (2PCF), but not the phases (LCF), the combination of the 2PCF and the LCF can be used to break the degeneracy between the linear bias and σ8, present in 2-point statistics.

  2. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., there is heterogeneity. Thus, meta-analysis combining data from multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.

  3. Prediction of the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors.

    PubMed

    Takahara, Mitsuyoshi; Katakami, Naoto; Kaneto, Hideaki; Noguchi, Midori; Shimomura, Iichiro

    2014-01-01

    The aim of the current study was to develop a predictive model of insulin resistance using general health checkup data in Japanese employees with one or more metabolic risk factors. We used a database of 846 Japanese employees with one or more metabolic risk factors who underwent general health checkup and a 75-g oral glucose tolerance test (OGTT). Logistic regression models were developed to predict existing insulin resistance evaluated using the Matsuda index. The predictive performance of these models was assessed using the C statistic. The C statistics of body mass index (BMI), waist circumference and their combined use were 0.743, 0.732 and 0.749, with no significant differences. The multivariate backward selection model, in which BMI, the levels of plasma glucose, high-density lipoprotein (HDL) cholesterol, log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment remained, had a C statistic of 0.816, with a significant difference compared to the combined use of BMI and waist circumference (p<0.01). The C statistic was not significantly reduced when the levels of log-transformed triglycerides and log-transformed alanine aminotransferase and hypertension under treatment were simultaneously excluded from the multivariate model (p=0.14). On the other hand, further exclusion of any of the remaining three variables significantly reduced the C statistic (all p<0.01). When predicting the presence of insulin resistance using general health checkup data in Japanese employees with metabolic risk factors, it is important to take into consideration the BMI and fasting plasma glucose and HDL cholesterol levels.
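
    The C statistic quoted above is the area under the ROC curve of the fitted logistic model. A minimal sketch follows; the predictors and labels are random placeholders rather than the study's checkup data, and the in-sample evaluation is for brevity only (a proper assessment would use held-out data or cross-validation).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# placeholder predictors (e.g. BMI, fasting glucose, HDL cholesterol) and a
# placeholder binary label for "insulin resistant"
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = (X @ np.array([0.9, 0.7, -0.6]) + rng.normal(size=300) > 0).astype(int)

model = LogisticRegression().fit(X, y)
c_statistic = roc_auc_score(y, model.predict_proba(X)[:, 1])  # C statistic = ROC AUC
print(round(c_statistic, 3))
```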

  4. A meta-analysis and statistical modelling of nitrates in groundwater at the African scale

    NASA Astrophysics Data System (ADS)

    Ouedraogo, Issoufou; Vanclooster, Marnik

    2016-06-01

    Contamination of groundwater with nitrate poses a major health risk to millions of people around Africa. Assessing the space-time distribution of this contamination, as well as understanding the factors that explain this contamination, is important for managing sustainable drinking water at the regional scale. This study aims to assess the variables that contribute to nitrate pollution in groundwater at the African scale by statistical modelling. We compiled a literature database of nitrate concentration in groundwater (around 250 studies) and combined it with digital maps of physical attributes such as soil, geology, climate, hydrogeology, and anthropogenic data for statistical model development. The maximum, medium, and minimum observed nitrate concentrations were analysed. In total, 13 explanatory variables were screened to explain observed nitrate pollution in groundwater. For the mean nitrate concentration, four variables are retained in the statistical explanatory model: (1) depth to groundwater (shallow groundwater, typically < 50 m); (2) recharge rate; (3) aquifer type; and (4) population density. The first three variables represent intrinsic vulnerability of groundwater systems to pollution, while the latter variable is a proxy for anthropogenic pollution pressure. The model explains 65 % of the variation of mean nitrate contamination in groundwater at the African scale. Using the same proxy information, we could develop a statistical model for the maximum nitrate concentrations that explains 42 % of the nitrate variation. For the maximum concentrations, other environmental attributes such as soil type, slope, rainfall, climate class, and region type improve the prediction of maximum nitrate concentrations at the African scale. For the minimum nitrate concentrations, the data did not satisfy normal distribution assumptions, so no statistical model was developed for these data. The data-based statistical model presented here represents an important step towards developing tools that will allow us to accurately predict nitrate distribution at the African scale and thus may support groundwater monitoring and water management that aims to protect groundwater systems. Yet the models should be further refined and validated when more detailed and harmonized data become available and/or combined with more conceptual descriptions of the fate of nutrients in the hydrosystem.

  5. Cognitive Complaints After Breast Cancer Treatments: Examining the Relationship With Neuropsychological Test Performance

    PubMed Central

    2013-01-01

    Background Cognitive complaints are reported frequently after breast cancer treatments. Their association with neuropsychological (NP) test performance is not well-established. Methods Early-stage, posttreatment breast cancer patients were enrolled in a prospective, longitudinal, cohort study prior to starting endocrine therapy. Evaluation included an NP test battery and self-report questionnaires assessing symptoms, including cognitive complaints. Multivariable regression models assessed associations among cognitive complaints, mood, treatment exposures, and NP test performance. Results One hundred eighty-nine breast cancer patients, aged 21–65 years, completed the evaluation; 23.3% endorsed higher memory complaints and 19.0% reported higher executive function complaints (>1 SD above the mean for healthy control sample). Regression modeling demonstrated a statistically significant association of higher memory complaints with combined chemotherapy and radiation treatments (P = .01), poorer NP verbal memory performance (P = .02), and higher depressive symptoms (P < .001), controlling for age and IQ. For executive functioning complaints, multivariable modeling controlling for age, IQ, and other confounds demonstrated statistically significant associations with better NP visual memory performance (P = .03) and higher depressive symptoms (P < .001), whereas combined chemotherapy and radiation treatment (P = .05) approached statistical significance. Conclusions About one in five post–adjuvant treatment breast cancer patients had elevated memory and/or executive function complaints that were statistically significantly associated with domain-specific NP test performances and depressive symptoms; combined chemotherapy and radiation treatment was also statistically significantly associated with memory complaints. These results and other emerging studies suggest that subjective cognitive complaints in part reflect objective NP performance, although their etiology and biology appear to be multifactorial, motivating further transdisciplinary research. PMID:23606729

  6. AGR-1 Thermocouple Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jeff Einerson

    2012-05-01

    This report documents an effort to analyze measured and simulated data obtained in the Advanced Gas Reactor (AGR) fuel irradiation test program conducted in the INL's Advanced Test Reactor (ATR) to support the Next Generation Nuclear Plant (NGNP) R&D program. The work follows up on a previous study (Pham and Einerson, 2010), in which statistical analysis methods were applied for AGR-1 thermocouple data qualification. The present work exercises the idea that, while recognizing uncertainties inherent in physics and thermal simulations of the AGR-1 test, results of the numerical simulations can be used in combination with the statistical analysis methods to further improve qualification of measured data. Additionally, the combined analysis of measured and simulation data can generate insights about simulation model uncertainty that can be useful for model improvement. This report also describes an experimental control procedure to maintain fuel target temperature in the future AGR tests using regression relationships that include simulation results. The report is organized into four chapters. Chapter 1 introduces the AGR Fuel Development and Qualification program, the AGR-1 test configuration and test procedure, an overview of AGR-1 measured data, and an overview of the physics and thermal simulation, including modeling assumptions and uncertainties. A brief summary of the statistical analysis methods developed in (Pham and Einerson 2010) for AGR-1 measured data qualification within the NGNP Data Management and Analysis System (NDMAS) is also included for completeness. Chapters 2-3 describe and discuss cases in which the combined use of experimental and simulation data is realized. A set of issues associated with measurement and modeling uncertainties resulting from the combined analysis is identified. This includes demonstration that such a combined analysis led to important insights for reducing uncertainty in the presentation of AGR-1 measured data (Chapter 2) and the interpretation of simulation results (Chapter 3). The statistics-based, simulation-aided experimental control procedure described for the future AGR tests is developed and demonstrated in Chapter 4. The procedure for controlling the target fuel temperature (capsule peak or average) is based on regression functions of thermocouple readings and other relevant parameters, accounting for possible changes in both physical and thermal conditions and in instrument performance.

  7. Forecasting of Radiation Belts: Results From the PROGRESS Project.

    NASA Astrophysics Data System (ADS)

    Balikhin, M. A.; Arber, T. D.; Ganushkina, N. Y.; Walker, S. N.

    2017-12-01

    The overall goal of the PROGRESS project, funded in the frame of the EU Horizon 2020 programme, is to combine first-principles-based models with systems science methodologies to achieve reliable forecasts of the geo-space particle radiation environment. The PROGRESS project incorporates three themes: the propagation of the solar wind to L1, the forecast of geomagnetic indices, and the forecast of fluxes of energetic electrons within the magnetosphere. One of the important aspects of the PROGRESS project is the development of statistical wave models for magnetospheric waves that affect the dynamics of energetic electrons, such as lower band chorus, hiss and equatorial noise. The error reduction ratio (ERR) concept has been used to optimise the set of solar wind and geomagnetic parameters for organisation of statistical wave models for these emissions. The resulting sets of parameters and statistical wave models will be presented and discussed. However, the ERR analysis also indicates that the combination of solar wind and geomagnetic parameters accounts for only part of the variance of the emissions under investigation (lower band chorus, hiss and equatorial noise). In addition, advances achieved by the PROGRESS project in the forecast of fluxes of energetic electrons, exploiting empirical models and the first-principles IMPTAM model, are presented.

  8. An Investigation of Item Fit Statistics for Mixed IRT Models

    ERIC Educational Resources Information Center

    Chon, Kyong Hee

    2009-01-01

    The purpose of this study was to investigate procedures for assessing model fit of IRT models for mixed format data. In this study, various IRT model combinations were fitted to data containing both dichotomous and polytomous item responses, and the suitability of the chosen model mixtures was evaluated based on a number of model fit procedures.…

  9. Blinded Validation of Breath Biomarkers of Lung Cancer, a Potential Ancillary to Chest CT Screening

    PubMed Central

    Phillips, Michael; Bauer, Thomas L.; Cataneo, Renee N.; Lebauer, Cassie; Mundada, Mayur; Pass, Harvey I.; Ramakrishna, Naren; Rom, William N.; Vallières, Eric

    2015-01-01

    Background Breath volatile organic compounds (VOCs) have been reported as biomarkers of lung cancer, but it is not known if biomarkers identified in one group can identify disease in a separate independent cohort. Also, it is not known if combining breath biomarkers with chest CT has the potential to improve the sensitivity and specificity of lung cancer screening. Methods Model-building phase (unblinded): Breath VOCs were analyzed with gas chromatography mass spectrometry in 82 asymptomatic smokers having screening chest CT, 84 symptomatic high-risk subjects with a tissue diagnosis, 100 without a tissue diagnosis, and 35 healthy subjects. Multiple Monte Carlo simulations identified breath VOC mass ions with greater than random diagnostic accuracy for lung cancer, and these were combined in a multivariate predictive algorithm. Model-testing phase (blinded validation): We analyzed breath VOCs in an independent cohort of similar subjects (n = 70, 51, 75 and 19 respectively). The algorithm predicted discriminant function (DF) values in blinded replicate breath VOC samples analyzed independently at two laboratories (A and B). Outcome modeling: We modeled the expected effects of combining breath biomarkers with chest CT on the sensitivity and specificity of lung cancer screening. Results Unblinded model-building phase. The algorithm identified lung cancer with sensitivity 74.0%, specificity 70.7% and C-statistic 0.78. Blinded model-testing phase: The algorithm identified lung cancer at Laboratory A with sensitivity 68.0%, specificity 68.4%, C-statistic 0.71; and at Laboratory B with sensitivity 70.1%, specificity 68.0%, C-statistic 0.70, with linear correlation between replicates (r = 0.88). In a projected outcome model, breath biomarkers increased the sensitivity, specificity, and positive and negative predictive values of chest CT for lung cancer when the tests were combined in series or parallel. Conclusions Breath VOC mass ion biomarkers identified lung cancer in a separate independent cohort, in a blinded replicated study. Combining breath biomarkers with chest CT could potentially improve the sensitivity and specificity of lung cancer screening. Trial Registration ClinicalTrials.gov NCT00639067 PMID:26698306
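
    Combining a breath test with chest CT "in series" or "in parallel" changes sensitivity and specificity in a predictable way when the two tests are conditionally independent given disease status. The sketch below uses that standard textbook rule; the CT performance figures are illustrative assumptions, not values from the paper's outcome model.

```python
def combine_tests(se1, sp1, se2, sp2):
    """Sensitivity/specificity of two tests combined in series (positive only if
    both are positive) and in parallel (positive if either is positive),
    assuming conditional independence given disease status."""
    series = (se1 * se2, 1 - (1 - sp1) * (1 - sp2))
    parallel = (1 - (1 - se1) * (1 - se2), sp1 * sp2)
    return {"series (Se, Sp)": series, "parallel (Se, Sp)": parallel}

# breath test roughly (Se 0.70, Sp 0.68) combined with a hypothetical CT
# performance of (Se 0.95, Sp 0.75) -- illustrative numbers only
print(combine_tests(0.70, 0.68, 0.95, 0.75))
```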

  10. How Statisticians Speak Risk

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Redus, K.S.

    2007-07-01

    The foundation of statistics deals with (a) how to measure and collect data and (b) how to identify models using estimates of statistical parameters derived from the data. Risk is a term used by the statistical community and those that employ statistics to express the results of a statistically based study. Statistical risk is represented as a probability that, for example, a statistical model is sufficient to describe a data set; but, risk is also interpreted as a measure of worth of one alternative when compared to another. The common thread of any risk-based problem is the combination of (a) the chance an event will occur, with (b) the value of the event. This paper presents an introduction to, and some examples of, statistical risk-based decision making from a quantitative, visual, and linguistic sense. This should help in understanding areas of radioactive waste management that can be suitably expressed using statistical risk and vice-versa. (authors)

  11. Temporal asymmetry in precipitation time series and its influence on flow simulations in combined sewer systems

    NASA Astrophysics Data System (ADS)

    Müller, Thomas; Schütze, Manfred; Bárdossy, András

    2017-09-01

    A property of natural processes is temporal irreversibility. However, this property cannot be reflected by most statistics used to describe precipitation time series and, consequently, is not considered in most precipitation models. In this paper, a new statistic, the asymmetry measure, is introduced and applied to precipitation, enabling irreversibility to be detected and quantified. It is used to analyze two different data sets from Singapore and Germany. The data of both locations show a significant asymmetry at high temporal resolutions. The asymmetry is more pronounced for Singapore, where the climate is dominated by convective precipitation events. The impact of irreversibility on applications is analyzed using two different hydrological sewer system models. The results show that the effect of the irreversibility can lead to biases in combined sewer overflow statistics. This bias is of the same order as the effect that can be achieved by real time control of sewer systems. Consequently, wrong conclusions can be drawn when synthetic time series are used for sewer systems if asymmetry is present but not considered in precipitation modeling.
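
    The paper's specific asymmetry measure is not reproduced here; the snippet below shows a generic irreversibility statistic, the skewness of lagged increments, simply to illustrate how temporal asymmetry can be quantified from a series. The example series is synthetic.

```python
import numpy as np

def increment_asymmetry(x, lag=1):
    """Skewness of lagged increments: approximately zero for a time-reversible
    series, clearly non-zero when rises and falls behave differently
    (a generic irreversibility statistic, not the paper's exact measure)."""
    d = x[lag:] - x[:-lag]
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

# a sawtooth-like series (slow rise, fast fall) is strongly irreversible
t = np.arange(0, 50, 0.1)
print(round(increment_asymmetry(t % 5), 2))
```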

  12. Effect of Internet-Based Cognitive Apprenticeship Model (i-CAM) on Statistics Learning among Postgraduate Students.

    PubMed

    Saadati, Farzaneh; Ahmad Tarmizi, Rohani; Mohd Ayub, Ahmad Fauzi; Abu Bakar, Kamariah

    2015-01-01

    Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding within an e-learning system the pedagogical characteristics of learning is 'value added' because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, the i-CAM could significantly promote students' problem-solving performance at the end of each phase. In addition, the combination of the differences in students' test scores was considered to be statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirmed the considerable value of i-CAM in the improvement of statistics learning for non-specialized postgraduate students.

  13. Predicting lettuce canopy photosynthesis with statistical and neural network models

    NASA Technical Reports Server (NTRS)

    Frick, J.; Precetti, C.; Mitchell, C. A.

    1998-01-01

    An artificial neural network (NN) and a statistical regression model were developed to predict canopy photosynthetic rates (Pn) for 'Waldman's Green' leaf lettuce (Lactuca sativa L.). All data used to develop and test the models were collected for crop stands grown hydroponically and under controlled-environment conditions. In the NN and regression models, canopy Pn was predicted as a function of three independent variables: shoot-zone CO2 concentration (600 to 1500 µmol mol-1), photosynthetic photon flux (PPF) (600 to 1100 µmol m-2 s-1), and canopy age (10 to 20 days after planting). The models were used to determine the combinations of CO2 and PPF setpoints required each day to maintain maximum canopy Pn. The statistical model (a third-order polynomial) predicted Pn more accurately than the simple NN (a three-layer, fully connected net). Over an 11-day validation period, the average percent difference between predicted and actual Pn was 12.3% and 24.6% for the statistical and NN models, respectively. Both models lost considerable accuracy when used to determine relatively long-range Pn predictions (≥ 6 days into the future).
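
    A third-order polynomial response surface of this kind can be fitted and then searched for the CO2/PPF setpoints that maximise predicted Pn on a given day. The sketch below uses synthetic placeholder data and is not the study's fitted model.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# columns: CO2, PPF, canopy age; target: canopy Pn (all synthetic placeholders)
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(600, 1500, 200),
                     rng.uniform(600, 1100, 200),
                     rng.uniform(10, 20, 200)])
y = 1e-5 * X[:, 0] * X[:, 1] / X[:, 2] + rng.normal(scale=0.5, size=200)

model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)

# grid search over CO2 and PPF setpoints that maximise predicted Pn on day 15
co2, ppf = np.meshgrid(np.linspace(600, 1500, 50), np.linspace(600, 1100, 50))
grid = np.column_stack([co2.ravel(), ppf.ravel(), np.full(co2.size, 15)])
best = grid[np.argmax(model.predict(grid))]
print("CO2, PPF setpoints for day 15:", best[:2])
```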

  14. Combining forecast weights: Why and how?

    NASA Astrophysics Data System (ADS)

    Yin, Yip Chee; Kok-Haur, Ng; Hock-Eam, Lim

    2012-09-01

    This paper proposes a procedure called forecast weight averaging, a specific combination of forecast weights obtained from different methods of constructing forecast weights, for the purpose of improving the accuracy of pseudo out-of-sample forecasting. It is found that under certain specified conditions, forecast weight averaging can lower the mean squared forecast error obtained from model averaging. In addition, we show that in a linear and homoskedastic environment, this superior predictive ability of forecast weight averaging holds true irrespective of whether the coefficients are tested by the t statistic or the z statistic, provided the significance level is within the 10% range. By theoretical proofs and a simulation study, we show that model averaging methods such as variance model averaging, simple model averaging and standard error model averaging each produce a mean squared forecast error larger than that of forecast weight averaging. Finally, this result also holds true, marginally, when applied to business and economic empirical data sets: the Gross Domestic Product (GDP) growth rate, Consumer Price Index (CPI) and Average Lending Rate (ALR) of Malaysia.
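
    A stripped-down illustration of the idea: average the weight vectors produced by different weighting schemes, form the combined forecast, and compare its pseudo out-of-sample mean squared forecast error (MSFE) with plain equal-weight model averaging. The weight vectors and forecasts below are placeholders, not the paper's estimators.

```python
import numpy as np

def msfe(pred, actual):
    return np.mean((pred - actual) ** 2)

rng = np.random.default_rng(0)
y = rng.normal(size=50)                                        # actual values
forecasts = y + rng.normal(scale=[[0.5], [1.0], [1.5]], size=(3, 50))  # 3 models

# weight vectors from different weighting schemes (placeholder values; the paper
# derives them from variance, simple and standard-error criteria)
weight_sets = np.array([[0.5, 0.3, 0.2],
                        [0.6, 0.25, 0.15],
                        [0.4, 0.4, 0.2]])

simple_avg = forecasts.mean(axis=0)            # equal-weight model averaging
fwa = weight_sets.mean(axis=0) @ forecasts     # forecast weight averaging

print("MSFE, simple model averaging:", round(msfe(simple_avg, y), 3))
print("MSFE, forecast weight averaging:", round(msfe(fwa, y), 3))
```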

  15. INTRODUCTION TO A COMBINED MULTIPLE LINEAR REGRESSION AND ARMA MODELING APPROACH FOR BEACH BACTERIA PREDICTION

    EPA Science Inventory

    Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...

  16. A Numerical Simulation and Statistical Modeling of High Intensity Radiated Fields Experiment Data

    NASA Technical Reports Server (NTRS)

    Smith, Laura J.

    2004-01-01

    Tests are conducted on a quad-redundant fault tolerant flight control computer to establish upset characteristics of an avionics system in an electromagnetic field. A numerical simulation and statistical model are described in this work to analyze the open loop experiment data collected in the reverberation chamber at NASA LaRC as a part of an effort to examine the effects of electromagnetic interference on fly-by-wire aircraft control systems. By comparing thousands of simulation and model outputs, the models that best describe the data are first identified and then a systematic statistical analysis is performed on the data. These efforts culminate in an extrapolation of values that are in turn used to support previous efforts to evaluate the data.

  17. 10 CFR 436.31 - Definitions.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... systems, building load simulation models, statistical regression analysis, or some combination of these..., excluding any cogeneration process for other than a federally owned building or buildings or other federally...

  18. Not Just a Sum? Identifying Different Types of Interplay between Constituents in Combined Interventions

    PubMed Central

    Van Deun, Katrijn; Thorrez, Lieven; van den Berg, Robert A.; Smilde, Age K.; Van Mechelen, Iven

    2015-01-01

    Motivation Experiments in which the effect of combined manipulations is compared with the effects of their pure constituents have received a great deal of attention. Examples include the study of combination therapies and the comparison of double and single knockout model organisms. Often the effect of the combined manipulation is not a mere addition of the effects of its constituents, with quite different forms of interplay between the constituents being possible. Yet, a well-formalized taxonomy of possible forms of interplay is lacking, let alone a statistical methodology to test for their presence in empirical data. Results Starting from a taxonomy of a broad range of forms of interplay between constituents of a combined manipulation, we propose a sound statistical hypothesis testing framework to test for the presence of each particular form of interplay. We illustrate the framework with analyses of public gene expression data on the combined treatment of dendritic cells with curdlan and GM-CSF and show that these lead to valuable insights into the mode of action of the constituent treatments and their combination. Availability and Implementation R code implementing the statistical testing procedure for microarray gene expression data is available as supplementary material. The data are available from the Gene Expression Omnibus with accession number GSE32986. PMID:25965065

  19. Not Just a Sum? Identifying Different Types of Interplay between Constituents in Combined Interventions.

    PubMed

    Van Deun, Katrijn; Thorrez, Lieven; van den Berg, Robert A; Smilde, Age K; Van Mechelen, Iven

    2015-01-01

    Experiments in which the effect of combined manipulations is compared with the effects of their pure constituents have received a great deal of attention. Examples include the study of combination therapies and the comparison of double and single knockout model organisms. Often the effect of the combined manipulation is not a mere addition of the effects of its constituents, with quite different forms of interplay between the constituents being possible. Yet, a well-formalized taxonomy of possible forms of interplay is lacking, let alone a statistical methodology to test for their presence in empirical data. Starting from a taxonomy of a broad range of forms of interplay between constituents of a combined manipulation, we propose a sound statistical hypothesis testing framework to test for the presence of each particular form of interplay. We illustrate the framework with analyses of public gene expression data on the combined treatment of dendritic cells with curdlan and GM-CSF and show that these lead to valuable insights into the mode of action of the constituent treatments and their combination. R code implementing the statistical testing procedure for microarray gene expression data is available as supplementary material. The data are available from the Gene Expression Omnibus with accession number GSE32986.

  20. Probing the exchange statistics of one-dimensional anyon models

    NASA Astrophysics Data System (ADS)

    Greschner, Sebastian; Cardarelli, Lorenzo; Santos, Luis

    2018-05-01

    We propose feasible scenarios for revealing the modified exchange statistics in one-dimensional anyon models in optical lattices based on an extension of the multicolor lattice-depth modulation scheme introduced in [Phys. Rev. A 94, 023615 (2016), 10.1103/PhysRevA.94.023615]. We show that the fast modulation of a two-component fermionic lattice gas in the presence of a magnetic field gradient, in combination with additional resonant microwave fields, allows for the quantum simulation of hardcore anyon models with periodic boundary conditions. Such a semisynthetic ring setup allows for realizing an interferometric arrangement sensitive to the anyonic statistics. Moreover, we show that simple expansion experiments may reveal the formation of anomalously bound pairs resulting from the anyonic exchange.

  1. External validation of ADO, DOSE, COTE and CODEX at predicting death in primary care patients with COPD using standard and machine learning approaches.

    PubMed

    Morales, Daniel R; Flynn, Rob; Zhang, Jianguo; Trucco, Emmanuel; Quint, Jennifer K; Zutis, Kris

    2018-05-01

    Several models for predicting the risk of death in people with chronic obstructive pulmonary disease (COPD) exist but have not undergone large scale validation in primary care. The objective of this study was to externally validate these models using statistical and machine learning approaches. We used a primary care COPD cohort identified using data from the UK Clinical Practice Research Datalink. Age-standardised mortality rates were calculated for the population by gender and discrimination of ADO (age, dyspnoea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnoea, airflow obstruction, smoking, exacerbations) and CODEX (comorbidity, dyspnoea, airflow obstruction, exacerbations) at predicting death over 1-3 years measured using logistic regression and a support vector machine learning (SVM) method of analysis. The age-standardised mortality rate was 32.8 (95%CI 32.5-33.1) and 25.2 (95%CI 25.4-25.7) per 1000 person years for men and women respectively. Complete data were available for 54879 patients to predict 1-year mortality. ADO performed the best (c-statistic of 0.730) compared with DOSE (c-statistic 0.645), COTE (c-statistic 0.655) and CODEX (c-statistic 0.649) at predicting 1-year mortality. Discrimination of ADO and DOSE improved at predicting 1-year mortality when combined with COTE comorbidities (c-statistic 0.780 ADO + COTE; c-statistic 0.727 DOSE + COTE). Discrimination did not change significantly over 1-3 years. Comparable results were observed using SVM. In primary care, ADO appears superior at predicting death in COPD. Performance of ADO and DOSE improved when combined with COTE comorbidities suggesting better models may be generated with additional data facilitated using novel approaches. Copyright © 2018. Published by Elsevier Ltd.

  2. Spatial Statistical Data Fusion (SSDF)

    NASA Technical Reports Server (NTRS)

    Braverman, Amy J.; Nguyen, Hai M.; Cressie, Noel

    2013-01-01

    As remote sensing for scientific purposes has transitioned from an experimental technology to an operational one, the selection of instruments has become more coordinated, so that the scientific community can exploit complementary measurements. However, technological and scientific heterogeneity across devices means that the statistical characteristics of the data they collect are different. The challenge addressed here is how to combine heterogeneous remote sensing data sets in a way that yields optimal statistical estimates of the underlying geophysical field, and provides rigorous uncertainty measures for those estimates. Different remote sensing data sets may have different spatial resolutions, different measurement error biases and variances, and other disparate characteristics. A state-of-the-art spatial statistical model was used to relate the true, but not directly observed, geophysical field to noisy, spatial aggregates observed by remote sensing instruments. The spatial covariances of the true field and the covariances of the true field with the observations were modeled. The observations are spatial averages of the true field values, over pixels, with different measurement noise superimposed. A kriging framework is used to infer optimal (minimum mean squared error and unbiased) estimates of the true field at point locations from pixel-level, noisy observations. A key feature of the spatial statistical model is the spatial mixed effects model that underlies it. The approach models the spatial covariance function of the underlying field using linear combinations of basis functions of fixed size. Approaches based on kriging require the inversion of very large spatial covariance matrices, and this is usually done by making simplifying assumptions about spatial covariance structure that simply do not hold for geophysical variables. In contrast, this method does not require these assumptions, and is also computationally much faster. This method is fundamentally different than other approaches to data fusion for remote sensing data because it is inferential rather than merely descriptive. All approaches combine data in a way that minimizes some specified loss function. Most of these are more or less ad hoc criteria based on what looks good to the eye, or some criteria that relate only to the data at hand.
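
    The spatial mixed effects idea can be sketched compactly: the unobserved field is a linear combination of r fixed basis functions with random coefficients, so the data covariance is S K Sᵀ plus noise and the kriging predictor involves only modest matrix algebra (in practice the Woodbury identity reduces the solve to r×r; the naive solve is used below for brevity). This is a minimal simple-kriging sketch with a known mean and invented names, not the operational fusion code.

```python
import numpy as np

def fused_field_estimate(S_obs, S_new, K, sigma2_eps, z, mu=0.0):
    """Kriging-style prediction under a spatial mixed effects model:
    field Y(s) = mu + S(s) @ eta with eta ~ N(0, K); data z are noisy
    observations at the rows of S_obs; predictions at the rows of S_new."""
    Sigma = S_obs @ K @ S_obs.T + sigma2_eps * np.eye(len(z))  # data covariance
    cross = S_new @ K @ S_obs.T                                # cov(field at new pts, data)
    return mu + cross @ np.linalg.solve(Sigma, z - mu)
```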

  3. Nonparametric predictive inference for combining diagnostic tests with parametric copula

    NASA Astrophysics Data System (ADS)

    Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

    2017-09-01

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we are interested in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results while accounting for the dependence structure using a parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. Copulas, in turn, are a well-known statistical concept for modelling dependence between random variables. A copula is a joint distribution function whose marginals are all uniformly distributed, and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method, namely the maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different families of copulas. Finally, we briefly outline related challenges and opportunities for future research.
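
    For the parametric copula step, one concrete choice is a bivariate Gaussian copula fitted by maximum likelihood on rank-based pseudo-observations, sketched below. This is only one family and one treatment of the marginals; the paper considers different copula families and its handling of the marginals may differ.

```python
import numpy as np
from scipy.stats import norm, rankdata
from scipy.optimize import minimize_scalar

def gaussian_copula_loglik(rho, u, v):
    """Log-likelihood of a bivariate Gaussian copula with correlation rho,
    evaluated on pseudo-observations u, v in (0, 1)."""
    x, y = norm.ppf(u), norm.ppf(v)
    r2 = rho ** 2
    return np.sum(-0.5 * np.log(1 - r2)
                  - (r2 * (x ** 2 + y ** 2) - 2 * rho * x * y) / (2 * (1 - r2)))

def fit_gaussian_copula(a, b):
    """MLE of the copula parameter; marginals handled via rank-based
    pseudo-observations (one common choice, not the only one)."""
    n = len(a)
    u, v = rankdata(a) / (n + 1), rankdata(b) / (n + 1)
    res = minimize_scalar(lambda r: -gaussian_copula_loglik(r, u, v),
                          bounds=(-0.99, 0.99), method="bounded")
    return res.x

# toy usage with correlated data
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = 0.7 * a + rng.normal(scale=0.7, size=500)
print(round(fit_gaussian_copula(a, b), 2))
```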

  4. A Flexible Approach for the Statistical Visualization of Ensemble Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potter, K.; Wilson, A.; Bremer, P.

    2009-09-29

    Scientists are increasingly moving towards ensemble data sets to explore relationships present in dynamic systems. Ensemble data sets combine spatio-temporal simulation results generated using multiple numerical models, sampled input conditions and perturbed parameters. While ensemble data sets are a powerful tool for mitigating uncertainty, they pose significant visualization and analysis challenges due to their complexity. We present a collection of overview and statistical displays linked through a high level of interactivity to provide a framework for gaining key scientific insight into the distribution of the simulation results as well as the uncertainty associated with the data. In contrast to methods that present large amounts of diverse information in a single display, we argue that combining multiple linked statistical displays yields a clearer presentation of the data and facilitates a greater level of visual data analysis. We demonstrate this approach using driving problems from climate modeling and meteorology and discuss generalizations to other fields.

  5. Kansas environmental and resource study: A Great Plains model. Extraction of agricultural statistics from ERTS-1 data of Kansas. [wheat inventory and agriculture land use

    NASA Technical Reports Server (NTRS)

    Morain, S. A. (Principal Investigator); Williams, D. L.

    1974-01-01

    The author has identified the following significant results. Wheat area, yield, and production statistics as derived from satellite image analysis, combined with a weather model, are presented for a ten county area in southwest Kansas. The data (representing the 1972-73 crop year) are compared for accuracy against both the USDA August estimate and its final (official) tabulation. The area estimates from imagery for both dryland and irrigated winter wheat were within 5% of the official figures for the same area, and predated them by almost one year. Yield on dryland wheat was estimated by the Thompson weather model to within 0.1% of the observed yield. A combined irrigated and dryland wheat production estimate for the ten county area was completed in July, 1973 and was within 1% of the production reported by USDA in February, 1974.

  6. Predicting Drug Combination Index and Simulating the Network-Regulation Dynamics by Mathematical Modeling of Drug-Targeted EGFR-ERK Signaling Pathway

    NASA Astrophysics Data System (ADS)

    Huang, Lu; Jiang, Yuyang; Chen, Yuzong

    2017-01-01

    Synergistic drug combinations enable enhanced therapeutics. Their discovery typically involves the measurement and assessment of the drug combination index (CI), which can be facilitated by the development and application of in-silico CI prediction tools. In this work, we developed and tested the ability of a mathematical model of the drug-targeted EGFR-ERK pathway to predict CIs and to analyze multiple synergistic drug combinations against observations. Our mathematical model was validated against the literature-reported signaling, drug response dynamics, and EGFR-MEK drug combination effect. The predicted CIs and combination therapeutic effects of the EGFR-BRaf, BRaf-MEK, FTI-MEK, and FTI-BRaf inhibitor combinations showed consistent synergism. Our results suggest that existing pathway models may potentially be extended into drug-targeted pathway models for predicting drug combination CI values, isobolograms, and drug-response surfaces, as well as for analyzing the dynamics of individual drugs and drug combinations. With our model, the efficacy of potential drug combinations can be predicted. Our method complements existing in-silico methods (e.g. chemogenomic profiles and statistically inferred network models) by predicting drug combination effects from the perspective of pathway dynamics using experimental or validated molecular kinetic constants, thereby facilitating the collective prediction of drug combination effects in diverse ranges of disease systems.
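
    For orientation, the combination index is conventionally computed in the Chou-Talalay form CI = d1/Dx1 + d2/Dx2, where d1 and d2 are the doses used in combination and Dx1, Dx2 are the single-drug doses producing the same effect. The sketch below inverts assumed Hill dose-response curves to obtain Dx; it is a generic illustration under invented parameters, not the pathway-model-based CI prediction of this paper.

        import numpy as np

        def dose_for_effect(effect, ec50, hill):
            """Invert a Hill dose-response curve E = d^h / (d^h + EC50^h)."""
            return ec50 * (effect / (1.0 - effect)) ** (1.0 / hill)

        def combination_index(d1, d2, effect, ec50_1, hill_1, ec50_2, hill_2):
            """Chou-Talalay CI: <1 synergy, =1 additivity, >1 antagonism."""
            Dx1 = dose_for_effect(effect, ec50_1, hill_1)
            Dx2 = dose_for_effect(effect, ec50_2, hill_2)
            return d1 / Dx1 + d2 / Dx2

        # Hypothetical single-drug parameters and a combination achieving 50% inhibition.
        print(combination_index(d1=0.2, d2=0.3, effect=0.5,
                                ec50_1=1.0, hill_1=1.2, ec50_2=0.8, hill_2=2.0))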

  7. Analysis and Report on SD2000: A Workshop to Determine Structural Dynamics Research for the Millenium

    DTIC Science & Technology

    2000-04-10

    ...interest. These include Statistical Energy Analysis (SEA), fuzzy structure theory, and approaches combining modal analysis and SEA. Non-determinism ... arising with increasing frequency. This has led to Statistical Energy Analysis, in which a system is modelled as a collection of coupled subsystems ... IUTAM Symposium on Statistical Energy Analysis, 1999, ed. F.J. Fahy and W.G. Price, Kluwer Academic Publishing; R.S. Langley and P...

  8. α-induced reactions on 115In: Cross section measurements and statistical model analysis

    NASA Astrophysics Data System (ADS)

    Kiss, G. G.; Szücs, T.; Mohr, P.; Török, Zs.; Huszánk, R.; Gyürky, Gy.; Fülöp, Zs.

    2018-05-01

    Background: α-nucleus optical potentials are basic ingredients of statistical model calculations used in nucleosynthesis simulations. While the nucleon+nucleus optical potential is fairly well known, for the α+nucleus optical potential several different parameter sets exist and large deviations, reaching sometimes even an order of magnitude, are found between the cross section predictions calculated using different parameter sets. Purpose: A measurement of the radiative α-capture and the α-induced reaction cross sections on the nucleus 115In at low energies allows a stringent test of statistical model predictions. Since experimental data are scarce in this mass region, this measurement can be an important input to test the global applicability of α+nucleus optical model potentials and further ingredients of the statistical model. Methods: The reaction cross sections were measured by means of the activation method. The produced activities were determined by off-line detection of the γ rays and characteristic x rays emitted during the electron capture decay of the produced Sb isotopes. The 115In(α,γ)119Sb and 115In(α,n)118mSb reaction cross sections were measured between E_c.m. = 8.83 and 15.58 MeV, and the 115In(α,n)118gSb reaction was studied between E_c.m. = 11.10 and 15.58 MeV. The theoretical analysis was performed within the statistical model. Results: The simultaneous measurement of the (α,γ) and (α,n) cross sections allowed us to determine a best-fit combination of all parameters for the statistical model. The α+nucleus optical potential is identified as the most important input for the statistical model. The best fit is obtained for the new Atomki-V1 potential, and good reproduction of the experimental data is also achieved for the first version of the Demetriou potentials and the simple McFadden-Satchler potential. The nucleon optical potential, the γ-ray strength function, and the level density parametrization are also constrained by the data, although there is no unique best-fit combination. Conclusions: The best-fit calculations allow us to extrapolate the low-energy (α,γ) cross section of 115In to the astrophysical Gamow window with reasonable uncertainties. However, still further improvements of the α-nucleus potential are required for a global description of elastic (α,α) scattering and α-induced reactions in a wide range of masses and energies.

  9. Sound texture perception via statistics of the auditory periphery: Evidence from sound synthesis

    PubMed Central

    McDermott, Josh H.; Simoncelli, Eero P.

    2014-01-01

    Rainstorms, insect swarms, and galloping horses produce “sound textures” – the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics. To test this hypothesis, we processed real-world textures with an auditory model containing filters tuned for sound frequencies and their modulations, and measured statistics of the resulting decomposition. We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures. However, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation. PMID:21903084

  10. A comparison of linear and nonlinear statistical techniques in performance attribution.

    PubMed

    Chan, N H; Genovese, C R

    2001-01-01

    Performance attribution is usually conducted under the linear framework of multifactor models. Although commonly used by practitioners in finance, linear multifactor models are known to be less than satisfactory in many situations. After a brief survey of nonlinear methods, nonlinear statistical techniques are applied to performance attribution of a portfolio constructed from a fixed universe of stocks, using factors derived from some commonly used cross-sectional linear multifactor models. By rebalancing this portfolio monthly, the cumulative returns for procedures based on the standard linear multifactor model and on three nonlinear techniques (model selection, additive models, and neural networks) are calculated and compared. It is found that the first two nonlinear techniques, especially in combination, outperform the standard linear model. The results in the neural-network case are inconclusive because of the great variety of possible models. Although these methods are more complicated and may require some tuning, toolboxes are developed and suggestions on calibration are proposed. This paper demonstrates the usefulness of modern nonlinear statistical techniques in performance attribution.

  11. Modeling and Assessment of GPS/BDS Combined Precise Point Positioning.

    PubMed

    Chen, Junping; Wang, Jungang; Zhang, Yize; Yang, Sainan; Chen, Qian; Gong, Xiuqiang

    2016-07-22

    The Precise Point Positioning (PPP) technique enables stand-alone receivers to obtain cm-level positioning accuracy. Observations from multiple GNSS systems can provide users with improved positioning accuracy, reliability and availability. In this paper, we present and evaluate GPS/BDS combined PPP models, including the traditional model and a simplified model, in which the inter-system bias (ISB) is treated in different ways. To evaluate the performance of combined GPS/BDS PPP, kinematic and static PPP positions are compared to the IGS daily estimates, using one month of GPS/BDS data from 11 IGS Multi-GNSS Experiment (MGEX) stations. The results indicate an apparent improvement of GPS/BDS combined PPP solutions in both static and kinematic cases, with much smaller standard deviations in the magnitude distribution of the coordinate RMS statistics. Comparisons between the traditional and simplified combined PPP models show no difference in the coordinate estimates, and the inter-system biases between the GPS and BDS systems are assimilated into the receiver clock, ambiguities and pseudo-range residuals accordingly.

  12. Effect of Internet-Based Cognitive Apprenticeship Model (i-CAM) on Statistics Learning among Postgraduate Students

    PubMed Central

    Saadati, Farzaneh; Ahmad Tarmizi, Rohani

    2015-01-01

    Because students' ability to use statistics, which is mathematical in nature, is one of the concerns of educators, embedding the pedagogical characteristics of learning within an e-learning system adds value because it facilitates the conventional method of learning mathematics. Many researchers emphasize the effectiveness of cognitive apprenticeship in learning and problem solving in the workplace. In a cognitive apprenticeship learning model, skills are learned within a community of practitioners through observation of modelling and then practice plus coaching. This study utilized an internet-based Cognitive Apprenticeship Model (i-CAM) in three phases and evaluated its effectiveness for improving statistics problem-solving performance among postgraduate students. The results showed that, when compared to the conventional mathematics learning model, i-CAM significantly promoted students' problem-solving performance at the end of each phase. In addition, the differences in students' test scores remained statistically significant after controlling for the pre-test scores. The findings conveyed in this paper confirm the considerable value of i-CAM in improving statistics learning for non-specialized postgraduate students. PMID:26132553

  13. A hybrid model for predicting carbon monoxide from vehicular exhausts in urban environments

    NASA Astrophysics Data System (ADS)

    Gokhale, Sharad; Khare, Mukesh

    Several deterministic air quality models evaluate and predict the frequently occurring pollutant concentrations well but, in general, are incapable of predicting the 'extreme' concentrations. In contrast, statistical distribution models overcome this limitation of the deterministic models and predict the 'extreme' concentrations. However, environmental damage is caused both by the extremes and by the sustained average concentrations of pollutants. Hence, a model should predict not only the 'extreme' ranges but also the 'middle' ranges of pollutant concentrations, i.e. the entire range. Hybrid modelling is one technique that estimates/predicts the 'entire range' of the distribution of pollutant concentrations by combining deterministic models with suitable statistical distribution models (Jakeman et al., 1988). In the present paper, a hybrid model has been developed to predict carbon monoxide (CO) concentration distributions at one of the traffic intersections in Delhi, the Income Tax Office (ITO) intersection, where the traffic is heterogeneous in nature, consisting of light vehicles, heavy vehicles, three-wheelers (auto rickshaws) and two-wheelers (scooters, motorcycles, etc.), and the meteorology is 'tropical'. The model combines the general finite line source model (GFLSM) as its deterministic component and the log-logistic distribution (LLD) model as its statistical component. The hybrid (GFLSM-LLD) model is then applied at the ITO intersection. The results show that the hybrid model predictions match the observed CO concentration data within the 5-99 percentile range. The model is further validated at a different street location, i.e. the Sirifort roadway. The validation results show that the model predicts CO concentrations fairly well (d = 0.91) in the 10-95 percentile range. A regulatory compliance analysis is also developed to estimate the probability that hourly CO concentrations exceed the National Ambient Air Quality Standards (NAAQS) of India.
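
    The statistical half of such a hybrid model reduces, in its simplest form, to fitting a log-logistic distribution to concentration data and reading off exceedance probabilities. The sketch below does this with scipy's Fisk (log-logistic) distribution on synthetic hourly CO values; the distribution parameters and the threshold are illustrative assumptions, not the paper's calibrated GFLSM-LLD model.

        import numpy as np
        from scipy import stats

        # Hypothetical hourly CO concentrations (mg/m^3); in the hybrid approach these
        # would come from the deterministic GFLSM plus the observed variability.
        co_hourly = stats.fisk.rvs(c=3.0, scale=2.5, size=2000, random_state=42)

        # Fit a log-logistic (Fisk) distribution, fixing the location at zero.
        c_hat, loc_hat, scale_hat = stats.fisk.fit(co_hourly, floc=0)

        # Probability of exceeding an hourly standard (illustrative threshold).
        threshold = 4.0
        print("P(CO > threshold) =",
              stats.fisk.sf(threshold, c_hat, loc=loc_hat, scale=scale_hat))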

  14. Non-targeted 1H NMR fingerprinting and multivariate statistical analyses for the characterisation of the geographical origin of Italian sweet cherries.

    PubMed

    Longobardi, F; Ventrella, A; Bianco, A; Catucci, L; Cafagna, I; Gallo, V; Mastrorilli, P; Agostiano, A

    2013-12-01

    In this study, non-targeted (1)H NMR fingerprinting was used in combination with multivariate statistical techniques for the classification of Italian sweet cherries based on their different geographical origins (Emilia Romagna and Puglia). As classification techniques, Soft Independent Modelling of Class Analogy (SIMCA), Partial Least Squares Discriminant Analysis (PLS-DA), and Linear Discriminant Analysis (LDA) were carried out and the results were compared. For LDA, before performing a refined selection of the number and combination of variables, two different strategies for a preliminary reduction of the number of variables were tested. The best average recognition and CV prediction abilities (both 100.0%) were obtained for all the LDA models, although PLS-DA also showed remarkable performance (94.6%). All the statistical models were validated by observing the prediction abilities with respect to an external set of cherry samples. The best result (94.9%) was obtained with LDA by performing a best subset selection procedure on a set of 30 principal components previously selected by a stepwise decorrelation. The metabolites that contributed most to the classification performance of this LDA model were found to be malate, glucose, fructose, glutamine and succinate. Copyright © 2013 Elsevier Ltd. All rights reserved.
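
    As a rough, hedged stand-in for the stepwise-decorrelation plus best-subset LDA procedure described above, the sketch below chains standardization, PCA to 30 components and LDA, and reports cross-validated prediction ability on synthetic "fingerprint" data; the data dimensions and the injected class signal are assumptions.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.model_selection import cross_val_score

        # Hypothetical NMR fingerprints: 80 samples x 200 binned spectral variables,
        # two geographical origins (labels 0/1).
        rng = np.random.default_rng(3)
        X = rng.normal(size=(80, 200))
        y = rng.integers(0, 2, size=80)
        X[y == 1, :10] += 0.8          # weak class signal in a few variables

        # Variable reduction by PCA followed by LDA, as one simple proxy for the
        # variable-selection strategies described in the abstract.
        model = make_pipeline(StandardScaler(), PCA(n_components=30),
                              LinearDiscriminantAnalysis())
        print("CV prediction ability:", cross_val_score(model, X, y, cv=5).mean())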

  15. The study of combining Latin Hypercube Sampling method and LU decomposition method (LULHS method) for constructing spatial random field

    NASA Astrophysics Data System (ADS)

    WANG, P. T.

    2015-12-01

    Groundwater modeling requires assigning hydrogeological properties to every numerical grid cell. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. A hydrogeological property is assumed to follow a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid cell, a random field for modeling can be completed. Therefore, statistical sampling plays an important role in the efficiency of the modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the stratified random procedure of LHS with simulation by LU decomposition to form LULHS. Both conditional and unconditional simulations with LULHS were developed. The simulation efficiency and spatial correlation of LULHS are compared to three other simulation methods. The results show that, for both conditional and unconditional simulation, the LULHS method is more efficient in terms of computational effort: fewer realizations are required to achieve the required statistical accuracy and spatial correlation.
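
    A compact sketch of the LULHS idea for unconditional simulation is given below: Latin Hypercube Sampling supplies stratified standard-normal deviates, and a factor of the spatial covariance matrix (here a Cholesky factor, standing in for the LU decomposition) imposes the spatial correlation. The grid size, covariance model and correlation length are illustrative assumptions, not values from the study.

        import numpy as np
        from scipy import stats
        from scipy.spatial.distance import cdist

        rng = np.random.default_rng(4)

        # 20 x 20 grid and an assumed exponential covariance model.
        gx, gy = np.meshgrid(np.arange(20.0), np.arange(20.0))
        coords = np.column_stack([gx.ravel(), gy.ravel()])
        cov = np.exp(-cdist(coords, coords) / 5.0)          # correlation length: 5 cells
        L = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))

        def lhs_standard_normal(n_vars, n_realizations, rng):
            """Latin Hypercube Sample of independent standard normals."""
            strata = np.tile(np.arange(n_realizations), (n_vars, 1))
            u = (rng.permuted(strata, axis=1)
                 + rng.uniform(size=(n_vars, n_realizations))) / n_realizations
            return stats.norm.ppf(u)

        # Each column is one unconditional realization of the correlated random field.
        z = lhs_standard_normal(len(coords), n_realizations=50, rng=rng)
        fields = L @ z
        print(fields.shape)  # (400, 50)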

  16. Statistical modeling of landslide hazard using GIS

    Treesearch

    Peter V. Gorsevski; Randy B. Foltz; Paul E. Gessler; Terrance W. Cundy

    2001-01-01

    A model for spatial prediction of landslide hazard was applied to a watershed affected by landslide events that occurred during the winter of 1995-96, following heavy rains, and snowmelt. Digital elevation data with 22.86 m x 22.86 m resolution was used for deriving topographic attributes used for modeling. The model is based on the combination of logistic regression...

  17. Integrating satellite imagery with simulation modeling to improve burn severity mapping

    Treesearch

    Eva C. Karau; Pamela G. Sikkink; Robert E. Keane; Gregory K. Dillon

    2014-01-01

    Both satellite imagery and spatial fire effects models are valuable tools for generating burn severity maps that are useful to fire scientists and resource managers. The purpose of this study was to test a new mapping approach that integrates imagery and modeling to create more accurate burn severity maps. We developed and assessed a statistical model that combines the...

  18. Decompression models: review, relevance and validation capabilities.

    PubMed

    Hugon, J

    2014-01-01

    For more than a century, several types of mathematical models have been proposed to describe tissue desaturation mechanisms in order to limit decompression sickness. These models are statistically assessed by DCS cases, and, over time, have gradually included bubble formation biophysics. This paper proposes to review this evolution and discuss its limitations. This review is organized around the comparison of decompression model biophysical criteria and theoretical foundations. Then, the DCS-predictive capability was analyzed to assess whether it could be improved by combining different approaches. Most of the operational decompression models have a neo-Haldanian form. Nevertheless, bubble modeling has been gaining popularity, and the circulating bubble amount has become a major output. By merging both views, it seems possible to build a relevant global decompression model that intends to simulate bubble production while predicting DCS risks for all types of exposures and decompression profiles. A statistical approach combining both DCS and bubble detection databases has to be developed to calibrate a global decompression model. Doppler ultrasound and DCS data are essential: i. to make correlation and validation phases reliable; ii. to adjust biophysical criteria to fit at best the observed bubble kinetics; and iii. to build a relevant risk function.

  19. Valid Statistical Analysis for Logistic Regression with Multiple Sources

    NASA Astrophysics Data System (ADS)

    Fienberg, Stephen E.; Nardi, Yuval; Slavković, Aleksandra B.

    Considerable effort has gone into understanding issues of privacy protection of individual information in single databases, and various solutions have been proposed depending on the nature of the data, the ways in which the database will be used and the precise nature of the privacy protection being offered. Once data are merged across sources, however, the nature of the problem becomes far more complex and a number of privacy issues arise for the linked individual files that go well beyond those that are considered with regard to the data within individual sources. In the paper, we propose an approach that gives full statistical analysis on the combined database without actually combining it. We focus mainly on logistic regression, but the method and tools described may be applied essentially to other statistical models as well.
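
    The core statistical point, that a pooled-likelihood logistic regression can be reproduced from per-source summary quantities without moving the raw records, can be sketched as below; this is a plain distributed Newton-Raphson illustration on invented data, not the privacy-preserving protocol proposed in the paper.

        import numpy as np

        def local_pieces(X, y, beta):
            """Score vector and information matrix computed inside one source."""
            p = 1.0 / (1.0 + np.exp(-X @ beta))
            w = p * (1.0 - p)
            return X.T @ (y - p), (X * w[:, None]).T @ X

        rng = np.random.default_rng(5)
        beta_true = np.array([0.5, -1.0, 2.0])
        sources = []
        for _ in range(3):                          # three hypothetical data holders
            X = np.column_stack([np.ones(300), rng.normal(size=(300, 2))])
            y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))
            sources.append((X, y))

        beta = np.zeros(3)
        for _ in range(25):                         # Newton-Raphson on pooled summaries
            scores, infos = zip(*(local_pieces(X, y, beta) for X, y in sources))
            beta = beta + np.linalg.solve(sum(infos), sum(scores))
        print(beta)                                 # close to beta_true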

  20. Society of Thoracic Surgeons 2008 cardiac risk models predict in-hospital mortality of heart valve surgery in a Chinese population: a multicenter study.

    PubMed

    Wang, Lv; Lu, Fang-Lin; Wang, Chong; Tan, Meng-Wei; Xu, Zhi-yun

    2014-12-01

    The Society of Thoracic Surgeons 2008 cardiac surgery risk models have been developed for heart valve surgery with and without coronary artery bypass grafting. The aim of our study was to evaluate the performance of the Society of Thoracic Surgeons 2008 cardiac risk models in Chinese patients undergoing single valve surgery, and the mortality rates for multiple valve surgery predicted from the Society of Thoracic Surgeons 2008 risk models. A total of 12,170 patients underwent heart valve surgery from January 2008 to December 2011. Combined congenital heart surgery and aortic surgery cases were excluded. A relatively small number of valve surgery combinations were excluded. The final research population included the following isolated heart valve surgery types: aortic valve replacement, mitral valve replacement, and mitral valve repair. The following combined valve surgery types were included: mitral valve replacement plus tricuspid valve repair, mitral valve replacement plus aortic valve replacement, and mitral valve replacement plus aortic valve replacement and tricuspid valve repair. Evaluation was performed by using the Hosmer-Lemeshow test and C-statistics. Data from 9846 patients were analyzed. The Society of Thoracic Surgeons 2008 cardiac risk models showed reasonable discrimination and poor calibration (C-statistic, 0.712; P = .00006 in Hosmer-Lemeshow test). Society of Thoracic Surgeons 2008 models had better discrimination (C-statistic, 0.734) and calibration (P = .5805) in patients undergoing isolated valve surgery than in patients undergoing multiple valve surgery (C-statistic, 0.694; P = .00002 in Hosmer-Lemeshow test). Estimates derived from the Society of Thoracic Surgeons 2008 models exceeded the mortality rates of multiple valve surgery (observed/expected ratios of 1.44 for multiple valve surgery and 1.17 for single valve surgery). The Society of Thoracic Surgeons 2008 cardiac surgery risk models performed well when predicting the mortality for Chinese patients undergoing valve surgery. The Society of Thoracic Surgeons 2008 models were suitable for single valve surgery in a Chinese population; estimates of mortality for multiple valve surgery derived from the Society of Thoracic Surgeons 2008 models were less accurate. Copyright © 2014 The American Association for Thoracic Surgery. Published by Elsevier Inc. All rights reserved.
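
    The evaluation metrics named in this record (the C-statistic, the Hosmer-Lemeshow test and the observed/expected ratio) can be computed as sketched below on synthetic risk predictions; the simulated risks and the miscalibration factor are assumptions made only for illustration.

        import numpy as np
        from scipy import stats

        def c_statistic(pred, outcome):
            """AUC / C-statistic via the Mann-Whitney form."""
            p1 = pred[outcome == 1][:, None]
            p0 = pred[outcome == 0][None, :]
            return np.mean(p1 > p0) + 0.5 * np.mean(p1 == p0)

        def hosmer_lemeshow(pred, outcome, groups=10):
            """Hosmer-Lemeshow chi-square over deciles of predicted risk."""
            edges = np.quantile(pred, np.linspace(0, 1, groups + 1))
            idx = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, groups - 1)
            chi2 = 0.0
            for g in range(groups):
                m = idx == g
                observed, expected, n = outcome[m].sum(), pred[m].sum(), m.sum()
                chi2 += (observed - expected) ** 2 / (expected * (1 - expected / n))
            return chi2, stats.chi2.sf(chi2, groups - 2)

        rng = np.random.default_rng(6)
        risk = np.clip(rng.beta(1.2, 15, size=5000), 1e-4, 1 - 1e-4)   # hypothetical predicted risks
        died = rng.binomial(1, np.minimum(risk * 1.2, 1.0))            # observed mortality about 20% above predicted

        print("C-statistic:", c_statistic(risk, died))
        print("Hosmer-Lemeshow chi2, p:", hosmer_lemeshow(risk, died))
        print("O/E ratio:", died.mean() / risk.mean())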

  1. Strategies for Reduced-Order Models in Uncertainty Quantification of Complex Turbulent Dynamical Systems

    NASA Astrophysics Data System (ADS)

    Qi, Di

    Turbulent dynamical systems are ubiquitous in science and engineering. Uncertainty quantification (UQ) in turbulent dynamical systems is a grand challenge where the goal is to obtain statistical estimates for key physical quantities. In the development of a proper UQ scheme for systems characterized by both a high-dimensional phase space and a large number of instabilities, significant model errors compared with the true natural signal are always unavoidable due to both the imperfect understanding of the underlying physical processes and the limited computational resources available. One central issue in contemporary research is the development of a systematic methodology for reduced order models that can recover the crucial features both with model fidelity in statistical equilibrium and with model sensitivity in response to perturbations. In the first part, we discuss a general mathematical framework to construct statistically accurate reduced-order models that have skill in capturing the statistical variability in the principal directions of a general class of complex systems with quadratic nonlinearity. A systematic hierarchy of simple statistical closure schemes, which are built through new global statistical energy conservation principles combined with statistical equilibrium fidelity, are designed and tested for UQ of these problems. Second, the capacity of imperfect low-order stochastic approximations to model extreme events in a passive scalar field advected by turbulent flows is investigated. The effects in complicated flow systems are considered including strong nonlinear and non-Gaussian interactions, and much simpler and cheaper imperfect models with model error are constructed to capture the crucial statistical features in the stationary tracer field. Several mathematical ideas are introduced to improve the prediction skill of the imperfect reduced-order models. Most importantly, empirical information theory and statistical linear response theory are applied in the training phase for calibrating model errors to achieve optimal imperfect model parameters; and total statistical energy dynamics are introduced to improve the model sensitivity in the prediction phase especially when strong external perturbations are exerted. The validity of reduced-order models for predicting statistical responses and intermittency is demonstrated on a series of instructive models with increasing complexity, including the stochastic triad model, the Lorenz '96 model, and models for barotropic and baroclinic turbulence. The skillful low-order modeling methods developed here should also be useful for other applications such as efficient algorithms for data assimilation.

  2. A statistical model describing combined irreversible electroporation and electroporation-induced blood-brain barrier disruption.

    PubMed

    Sharabi, Shirley; Kos, Bor; Last, David; Guez, David; Daniels, Dianne; Harnof, Sagi; Mardor, Yael; Miklavcic, Damijan

    2016-03-01

    Electroporation-based therapies such as electrochemotherapy (ECT) and irreversible electroporation (IRE) are emerging as promising tools for the treatment of tumors. When applied to the brain, electroporation can also induce transient blood-brain barrier (BBB) disruption in volumes extending beyond the IRE region, thus enabling efficient drug penetration. The main objective of this study was to develop a statistical model predicting cell death and BBB disruption induced by electroporation; such a model can be used for individual treatment planning. Cell death and BBB disruption models were developed based on the Peleg-Fermi model in combination with numerical models of the electric field. The model calculates the electric field thresholds for cell kill and for BBB disruption and describes their dependence on the number of treatment pulses. The model was validated using in vivo experimental data consisting of MRI scans of rat brains following electroporation treatments. Linear regression analysis confirmed that the model described the IRE and BBB disruption volumes as a function of the number of treatment pulses (r(2) = 0.79; p < 0.008, r(2) = 0.91; p < 0.001). The results showed a strong plateau effect as the pulse number increased. The ratio between the complete cell death and no cell death thresholds was relatively narrow (between 0.88 and 0.91) even for small numbers of pulses and depended weakly on the number of pulses. For BBB disruption, the ratio increased with the number of pulses. BBB disruption radii were on average 67% ± 11% larger than IRE volumes. The statistical model can be used to describe the dependence of treatment effects on the number of pulses independent of the experimental setup.
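
    For reference, the Peleg-Fermi model referred to above expresses the surviving cell fraction as a sigmoid in the local electric field, with a critical field and a slope parameter that both decay with the number of pulses. The sketch below uses assumed parameter values; in the study itself the field map comes from numerical simulation and the parameters are fitted to MRI data.

        import numpy as np

        def peleg_fermi_survival(E, n_pulses, Ec0, A0, k1, k2):
            """Peleg-Fermi survival fraction as a function of field strength and pulse number.

            E     : electric field magnitude (V/cm)
            Ec(n) : critical field, decaying exponentially with pulse number
            A(n)  : slope parameter, decaying exponentially with pulse number
            """
            Ec = Ec0 * np.exp(-k1 * n_pulses)
            A = A0 * np.exp(-k2 * n_pulses)
            return 1.0 / (1.0 + np.exp((E - Ec) / A))

        # Hypothetical parameters; kill probability = 1 - survival fraction.
        E = np.linspace(100, 1500, 8)
        for n in (10, 50, 90):
            print(n, np.round(1 - peleg_fermi_survival(E, n, Ec0=800, A0=150, k1=0.01, k2=0.01), 2))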

  3. Data mining: sophisticated forms of managed care modeling through artificial intelligence.

    PubMed

    Borok, L S

    1997-01-01

    Data mining is a recent development in computer science that combines artificial intelligence algorithms and relational databases to discover patterns automatically, without the use of traditional statistical methods. Work with data mining tools in health care is in a developmental stage that holds great promise, given the combination of demographic and diagnostic information.

  4. Modeling the Development of Audiovisual Cue Integration in Speech Perception

    PubMed Central

    Getz, Laura M.; Nordeen, Elke R.; Vrabic, Sarah C.; Toscano, Joseph C.

    2017-01-01

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues. PMID:28335558

  5. Modeling the Development of Audiovisual Cue Integration in Speech Perception.

    PubMed

    Getz, Laura M; Nordeen, Elke R; Vrabic, Sarah C; Toscano, Joseph C

    2017-03-21

    Adult speech perception is generally enhanced when information is provided from multiple modalities. In contrast, infants do not appear to benefit from combining auditory and visual speech information early in development. This is true despite the fact that both modalities are important to speech comprehension even at early stages of language acquisition. How then do listeners learn how to process auditory and visual information as part of a unified signal? In the auditory domain, statistical learning processes provide an excellent mechanism for acquiring phonological categories. Is this also true for the more complex problem of acquiring audiovisual correspondences, which require the learner to integrate information from multiple modalities? In this paper, we present simulations using Gaussian mixture models (GMMs) that learn cue weights and combine cues on the basis of their distributional statistics. First, we simulate the developmental process of acquiring phonological categories from auditory and visual cues, asking whether simple statistical learning approaches are sufficient for learning multi-modal representations. Second, we use this time course information to explain audiovisual speech perception in adult perceivers, including cases where auditory and visual input are mismatched. Overall, we find that domain-general statistical learning techniques allow us to model the developmental trajectory of audiovisual cue integration in speech, and in turn, allow us to better understand the mechanisms that give rise to unified percepts based on multiple cues.
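
    A minimal sketch of the kind of GMM-based distributional learning described in these two records: a two-component Gaussian mixture is fit, without labels, to joint auditory-visual cue values, and its posteriors are then queried for matched and mismatched tokens. The cue values and covariances are invented for illustration and do not reproduce the simulations in the paper.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        # Hypothetical audiovisual cues for two phonological categories: each token has
        # an auditory cue (e.g. VOT in ms) and a visual cue (e.g. degree of lip closure).
        rng = np.random.default_rng(7)
        cat_a = rng.multivariate_normal([10, 0.2], [[9, 0.05], [0.05, 0.01]], size=400)
        cat_b = rng.multivariate_normal([40, 0.7], [[16, 0.10], [0.10, 0.02]], size=400)
        tokens = np.vstack([cat_a, cat_b])

        # Unsupervised statistical learning of the two categories from the joint
        # distribution of both cues (no labels supplied).
        gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(tokens)

        # Posterior category probabilities for a matched token and for a mismatched one
        # (auditory cue from one category, visual cue from the other).
        print(gmm.predict_proba([[12, 0.25], [12, 0.70]]))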

  6. Studies in the use of cloud type statistics in mission simulation

    NASA Technical Reports Server (NTRS)

    Fowler, M. G.; Willand, J. H.; Chang, D. T.; Cogan, J. L.

    1974-01-01

    A study to further improve NASA's global cloud statistics for mission simulation is reported. Regional homogeneity in cloud types was examined; most of the original region boundaries defined for cloud cover amount in previous studies were supported by the statistics on cloud types and the number of cloud layers. Conditionality in cloud statistics was also examined with special emphasis on temporal and spatial dependencies, and cloud type interdependence. Temporal conditionality was found up to 12 hours, and spatial conditionality up to 200 miles; the diurnal cycle in convective cloudiness was clearly evident. As expected, the joint occurrence of different cloud types reflected the dynamic processes which form the clouds. Other phases of the study improved the cloud type statistics for several regions and proposed a mission simulation scheme combining the 4-dimensional atmospheric model, sponsored by MSFC, with the global cloud model.

  7. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.
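
    The geometric-distribution test mentioned above can be illustrated with a simple chi-square goodness-of-fit calculation on hypothetical generator counts; the counts below are invented, and the tail is lumped into the last bin. This is a generic sketch, not the statistical test used in the paper.

        import numpy as np
        from scipy import stats

        # Hypothetical counts of RSN generator values k = 0, 1, 2, ... in one basin.
        counts = np.array([412, 205, 98, 51, 27, 12, 7])
        k = np.arange(len(counts))
        n = counts.sum()

        # Maximum likelihood estimate of the geometric parameter (support 0, 1, 2, ...).
        p_hat = n / (n + np.sum(k * counts))

        # Expected counts under the fitted geometric distribution, lumping the tail.
        expected = n * p_hat * (1 - p_hat) ** k
        expected[-1] = n - expected[:-1].sum()
        chi2 = np.sum((counts - expected) ** 2 / expected)
        dof = len(counts) - 1 - 1                 # categories - 1 - one fitted parameter
        print("chi2 =", chi2, "p =", stats.chi2.sf(chi2, dof))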

  8. Identification of reliable gridded reference data for statistical downscaling methods in Alberta

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Gupta, A.

    2017-12-01

    Climate models provide essential information to assess impacts of climate change at regional and global scales. However, statistical downscaling methods have to be applied to prepare climate model data for applications such as hydrologic and ecologic modelling at a watershed scale. As the reliability and (spatial and temporal) resolution of statistically downscaled climate data depend mainly on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for key climate variables, which are the main input data to regional modelling systems. However, inconsistencies in these climate products, for example, different combinations of climate variables, varying data domains and data lengths, and data accuracy varying with the physiographic characteristics of the landscape, have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing various observation-based daily gridded climate products available in the public domain, i.e. thin plate spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (North American Regional Reanalysis) and an optimum interpolation technique (Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparing them with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation and minimum and maximum temperature over the province of Alberta. Based on the performance of the climate products at AHCCD stations, we ranked the reliability of these publicly available climate products for station elevations discretized into several classes. According to the rank of climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system was developed to allow users to easily select the most reliable reference climate data at each target point based on the elevation of the grid cell. By constructing the best combination of reference data for the study domain, the accuracy and reliability of statistically downscaled climate projections could be significantly improved.

  9. Progress of statistical analysis in biomedical research through the historical review of the development of the Framingham score.

    PubMed

    Ignjatović, Aleksandra; Stojanović, Miodrag; Milošević, Zoran; Anđelković Apostolović, Marija

    2017-12-02

    Developing risk models in medicine is not only appealing but also associated with many obstacles in different aspects of predictive model development. Initially, the association of one or more biomarkers with a specific outcome was established by statistical significance, but novel and demanding questions required the development of new and more complex statistical techniques. The progress of statistical analysis in biomedical research is best observed through the history of the Framingham study and the development of the Framingham score. Evaluation of predictive models rests on a combination of several metrics. When logistic regression or Cox proportional hazards regression is used, calibration testing and ROC curve analysis should be mandatory and eliminatory, with a central place given to newer statistical techniques. In order to obtain complete information about a new marker in the model, it is now recommended to use reclassification tables, calculating the net reclassification index and the integrated discrimination improvement. Decision curve analysis is a novel method for evaluating the clinical usefulness of a predictive model. It may be noted that customizing and fine-tuning of the Framingham risk score initiated the development of statistical analysis. A clinically applicable predictive model should be a trade-off between all of the abovementioned statistical metrics, a trade-off between calibration and discrimination, accuracy and decision-making, costs and benefits, and quality and quantity of the patient's life.
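
    As one concrete example of the newer metrics mentioned above, a categorical net reclassification index can be computed as sketched below; the risk categories, the simulated risks and the effect of the "new marker" are all illustrative assumptions rather than Framingham results.

        import numpy as np

        def net_reclassification_index(risk_old, risk_new, outcome, cutoffs=(0.1, 0.2)):
            """Categorical NRI for adding a new marker: net upward moves among events
            plus net downward moves among non-events, using assumed risk categories."""
            cat_old = np.digitize(risk_old, cutoffs)
            cat_new = np.digitize(risk_new, cutoffs)
            events, nonevents = outcome == 1, outcome == 0
            nri_events = (np.mean(cat_new[events] > cat_old[events])
                          - np.mean(cat_new[events] < cat_old[events]))
            nri_nonevents = (np.mean(cat_new[nonevents] < cat_old[nonevents])
                             - np.mean(cat_new[nonevents] > cat_old[nonevents]))
            return nri_events + nri_nonevents

        rng = np.random.default_rng(10)
        risk_old = rng.beta(2, 10, 1000)                                   # baseline model risks
        outcome = rng.binomial(1, risk_old)
        risk_new = np.clip(risk_old + 0.05 * (outcome - 0.5)
                           + rng.normal(0, 0.02, 1000), 0, 1)              # risks with a new marker
        print(net_reclassification_index(risk_old, risk_new, outcome))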

  10. Relationship of body weight parameters with the incidence of common spontaneous tumors in Tg.rasH2 mice.

    PubMed

    Paranjpe, Madhav G; Denton, Melissa D; Vidmar, Tom J; Elbekai, Reem H

    2014-10-01

    The mechanistic relationship between increased food consumption, increased body weights, and increased incidence of tumors has been well established in 2-year rodent models. Body weight parameters such as initial body weights, terminal body weights, food consumption, and the body weight gains in grams and percentages were analyzed to determine whether such relationship exists between these parameters with the incidence of common spontaneous tumors in Tg.rasH2 mice. None of these body weight parameters had any statistically significant relationship with the incidence of common spontaneous tumors in Tg.rasH2 males, namely lung tumors, splenic hemangiosarcomas, nonsplenic hemangiosarcomas, combined incidence of all hemangiosarcomas, and Harderian gland tumors. These parameters also did not have any statistically significant relationship with the incidence of lung and Harderian gland tumors in females. However, in females, increased initial body weights did have a statistically significant relationship with the nonsplenic hemangiosarcomas, and increased terminal body weights did have a statistically significant relationship with the incidence of splenic hemangiosarcomas, nonsplenic hemangiosarcomas, and the combined incidence of all hemangiosarcomas. In addition, increased body weight gains in grams and percentages had a statistically significant relationship with the combined incidence of all hemangiosarcomas in females, but not separately with splenic and nonsplenic hemangiosarcomas. © 2013 by The Author(s).

  11. Modeling Nonignorable Missing Data in Speeded Tests

    ERIC Educational Resources Information Center

    Glas, Cees A. W.; Pimentel, Jonald L.

    2008-01-01

    In tests with time limits, items at the end are often not reached. Usually, the pattern of missing responses depends on the ability level of the respondents; therefore, missing data are not ignorable in statistical inference. This study models data using a combination of two item response theory (IRT) models: one for the observed response data and…

  12. Probabilistic/Fracture-Mechanics Model For Service Life

    NASA Technical Reports Server (NTRS)

    Watkins, T., Jr.; Annis, C. G., Jr.

    1991-01-01

    Computer program makes probabilistic estimates of lifetime of engine and components thereof. Developed to fill need for more accurate life-assessment technique that avoids errors in estimated lives and provides for statistical assessment of levels of risk created by engineering decisions in designing system. Implements mathematical model combining techniques of statistics, fatigue, fracture mechanics, nondestructive analysis, life-cycle cost analysis, and management of engine parts. Used to investigate effects of such engine-component life-controlling parameters as return-to-service intervals, stresses, capabilities for nondestructive evaluation, and qualities of materials.

  13. Counteracting structural errors in ensemble forecast of influenza outbreaks.

    PubMed

    Pei, Sen; Shaman, Jeffrey

    2017-10-13

    For influenza forecasts generated using dynamical models, forecast inaccuracy is partly attributable to the nonlinear growth of error. As a consequence, quantification of the nonlinear error structure in current forecast models is needed so that this growth can be corrected and forecast skill improved. Here, we inspect the error growth of a compartmental influenza model and find that a robust error structure arises naturally from the nonlinear model dynamics. By counteracting these structural errors, diagnosed using error breeding, we develop a new forecast approach that combines dynamical error correction and statistical filtering techniques. In retrospective forecasts of historical influenza outbreaks for 95 US cities from 2003 to 2014, overall forecast accuracy for outbreak peak timing, peak intensity and attack rate, are substantially improved for predicted lead times up to 10 weeks. This error growth correction method can be generalized to improve the forecast accuracy of other infectious disease dynamical models. Inaccuracy of influenza forecasts based on dynamical models is partly due to nonlinear error growth. Here the authors address the error structure of a compartmental influenza model, and develop a new improved forecast approach combining dynamical error correction and statistical filtering techniques.

  14. A model for indexing medical documents combining statistical and symbolic knowledge.

    PubMed

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-10-11

    To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. The use of several terminologies leads to more precise indexing. The improvement in the model's implementation performance achieved as a result of using semantic relationships is encouraging.

  15. Liver segmentation from CT images using a sparse priori statistical shape model (SP-SSM).

    PubMed

    Wang, Xuehu; Zheng, Yongchang; Gan, Lan; Wang, Xuan; Sang, Xinting; Kong, Xiangfeng; Zhao, Jie

    2017-01-01

    This study proposes a new liver segmentation method based on a sparse a priori statistical shape model (SP-SSM). First, mark points are selected in the liver a priori model and the original image. Then, the a priori shape and its mark points are used to obtain a dictionary for the liver boundary information. Second, the sparse coefficient is calculated based on the correspondence between mark points in the original image and those in the a priori model, and then the sparse statistical model is established by combining the sparse coefficients and the dictionary. Finally, the intensity energy and boundary energy models are built based on the intensity information and the specific boundary information of the original image. Then, the sparse matching constraint model is established based on the sparse coding theory. These models jointly drive the iterative deformation of the sparse statistical model to approximate and accurately extract the liver boundaries. This method can solve the problems of deformation model initialization and a priori method accuracy using the sparse dictionary. The SP-SSM can achieve a mean overlap error of 4.8% and a mean volume difference of 1.8%, whereas the average symmetric surface distance and the root mean square symmetric surface distance can reach 0.8 mm and 1.4 mm, respectively.

  16. Statistical Methods for Rapid Aerothermal Analysis and Design Technology: Validation

    NASA Technical Reports Server (NTRS)

    DePriest, Douglas; Morgan, Carolyn

    2003-01-01

    The cost and safety goals for NASA's next generation of reusable launch vehicles (RLV) will require that rapid high-fidelity aerothermodynamic design tools be used early in the design cycle. To meet these requirements, it is desirable to identify adequate statistical models that quantify and improve the accuracy, extend the applicability, and enable combined analyses using existing prediction tools. The initial research work focused on establishing suitable candidate models for these purposes. The second phase is focused on assessing the performance of these models in accurately predicting the heat rate for a given candidate data set. This validation work compared models and methods that may be useful in predicting the heat rate.

  17. Local Inflammation in Fracture Hematoma: Results from a Combined Trauma Model in Pigs

    PubMed Central

    Horst, K.; Eschbach, D.; Pfeifer, R.; Hübenthal, S.; Sassen, M.; Steinfeldt, T.; Wulf, H.; Ruchholtz, S.; Pape, H. C.; Hildebrand, F.

    2015-01-01

    Background. Previous studies showed significant interaction between the local and systemic inflammatory response after severe trauma in small animal models. The purpose of this study was to establish a new combined trauma model in pigs to investigate fracture-associated local inflammation and gain information about the early inflammatory stages after polytrauma. Material and Methods. Combined trauma consisted of tibial fracture, lung contusion, liver laceration, and controlled hemorrhage. Animals were mechanically ventilated and under ICU monitoring for 48 h. Blood and fracture hematoma samples were collected during the time course of the study. Local and systemic levels of serum cytokines and diverse alarmins were measured by ELISA. Results. A statistically significant difference in the systemic serum values of IL-6 and HMGB1 was observed when compared to the sham group. Moreover, there was a statistically significant difference between the fracture hematoma values of IL-6, IL-8, IL-10, and HMGB1 and the corresponding systemic values. However, a decrease in local proinflammatory concentrations was observed while anti-inflammatory mediators increased. Conclusion. Our data showed a time-dependent activation of the local and systemic inflammatory response. Indeed, it is the first study focusing on the local and systemic inflammatory response to multiple trauma in a large animal model. PMID:25694748

  18. Statistical learning and adaptive decision-making underlie human response time variability in inhibitory control.

    PubMed

    Ma, Ning; Yu, Angela J

    2015-01-01

    Response time (RT) is an oft-reported behavioral measure in psychological and neurocognitive experiments, but the high level of observed trial-to-trial variability in this measure has often limited its usefulness. Here, we combine computational modeling and psychophysics to examine the hypothesis that fluctuations in this noisy measure reflect dynamic computations in human statistical learning and corresponding cognitive adjustments. We present data from the stop-signal task (SST), in which subjects respond to a go stimulus on each trial, unless instructed not to by a subsequent, infrequently presented stop signal. We model across-trial learning of stop signal frequency, P(stop), and stop-signal onset time, SSD (stop-signal delay), with a Bayesian hidden Markov model, and within-trial decision-making with an optimal stochastic control model. The combined model predicts that RT should increase with both expected P(stop) and SSD. The human behavioral data (n = 20) bear out this prediction, showing P(stop) and SSD both to be significant, independent predictors of RT, with P(stop) being a more prominent predictor in 75% of the subjects, and SSD being more prominent in the remaining 25%. The results demonstrate that humans indeed readily internalize environmental statistics and adjust their cognitive/behavioral strategy accordingly, and that subtle patterns in RT variability can serve as a valuable tool for validating models of statistical learning and decision-making. More broadly, the modeling tools presented in this work can be generalized to a large body of behavioral paradigms, in order to extract insights about cognitive and neural processing from apparently quite noisy behavioral measures. We also discuss how this behaviorally validated model can then be used to conduct model-based analysis of neural data, in order to help identify specific brain areas for representing and encoding key computational quantities in learning and decision-making.
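
    A much-simplified stand-in for the across-trial learning component described above is sketched below: a leaky Beta-Bernoulli update tracks the expected stop-signal frequency P(stop), and go RTs are assumed to increase with that expectation. The decay constant, the RT equation and all numbers are illustrative assumptions, not the paper's Bayesian hidden Markov or stochastic control models.

        import numpy as np

        def predicted_pstop(stop_trials, prior=(1.0, 1.0), decay=0.9):
            """Trial-by-trial estimate of P(stop) from a leaky Beta-Bernoulli update,
            a simple proxy for the Bayesian hidden Markov model in the abstract."""
            a, b = prior
            estimates = []
            for is_stop in stop_trials:
                estimates.append(a / (a + b))          # belief before seeing this trial
                a = decay * a + is_stop                # leaky counts implement forgetting
                b = decay * b + (1 - is_stop)
            return np.array(estimates)

        rng = np.random.default_rng(8)
        stop_trials = rng.binomial(1, 0.25, size=300)      # 25% stop-signal trials
        p_hat = predicted_pstop(stop_trials)

        # Qualitative model prediction: go RT rises with the expected stop probability.
        rt = 350 + 200 * p_hat + rng.normal(0, 20, size=300)   # hypothetical RTs in ms
        print(np.corrcoef(p_hat, rt)[0, 1])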

  19. Statistical limitations in functional neuroimaging. I. Non-inferential methods and statistical models.

    PubMed Central

    Petersson, K M; Nichols, T E; Poline, J B; Holmes, A P

    1999-01-01

    Functional neuroimaging (FNI) provides experimental access to the intact living brain making it possible to study higher cognitive functions in humans. In this review and in a companion paper in this issue, we discuss some common methods used to analyse FNI data. The emphasis in both papers is on assumptions and limitations of the methods reviewed. There are several methods available to analyse FNI data indicating that none is optimal for all purposes. In order to make optimal use of the methods available it is important to know the limits of applicability. For the interpretation of FNI results it is also important to take into account the assumptions, approximations and inherent limitations of the methods used. This paper gives a brief overview over some non-inferential descriptive methods and common statistical models used in FNI. Issues relating to the complex problem of model selection are discussed. In general, proper model selection is a necessary prerequisite for the validity of the subsequent statistical inference. The non-inferential section describes methods that, combined with inspection of parameter estimates and other simple measures, can aid in the process of model selection and verification of assumptions. The section on statistical models covers approaches to global normalization and some aspects of univariate, multivariate, and Bayesian models. Finally, approaches to functional connectivity and effective connectivity are discussed. In the companion paper we review issues related to signal detection and statistical inference. PMID:10466149

  20. A Statistical Method for Synthesizing Mediation Analyses Using the Product of Coefficient Approach Across Multiple Trials

    PubMed Central

    Huang, Shi; MacKinnon, David P.; Perrino, Tatiana; Gallo, Carlos; Cruden, Gracelyn; Brown, C Hendricks

    2016-01-01

    Mediation analysis often requires larger sample sizes than main effect analysis to achieve the same statistical power. Combining results across similar trials may be the only practical option for increasing statistical power for mediation analysis in some situations. In this paper, we propose a method to estimate: 1) marginal means for mediation path a, the relation of the independent variable to the mediator; 2) marginal means for path b, the relation of the mediator to the outcome, across multiple trials; and 3) the between-trial level variance-covariance matrix based on a bivariate normal distribution. We present the statistical theory and an R computer program to combine regression coefficients from multiple trials to estimate a combined mediated effect and confidence interval under a random effects model. Values of coefficients a and b, along with their standard errors from each trial are the input for the method. This marginal likelihood based approach with Monte Carlo confidence intervals provides more accurate inference than the standard meta-analytic approach. We discuss computational issues, apply the method to two real-data examples and make recommendations for the use of the method in different settings. PMID:28239330
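
    A simplified version of the combination step can be sketched as below: per-trial estimates of paths a and b are pooled by inverse-variance weighting, and a Monte Carlo confidence interval is formed for the combined mediated effect ab. The paper's method is a random-effects marginal likelihood with a between-trial covariance matrix; the fixed-effect pooling, the independence of the a and b draws, and all numbers here are simplifying assumptions.

        import numpy as np

        # Hypothetical per-trial estimates of path a and path b with standard errors.
        a_hat = np.array([0.42, 0.35, 0.50, 0.28])
        se_a  = np.array([0.10, 0.12, 0.09, 0.15])
        b_hat = np.array([0.30, 0.22, 0.40, 0.35])
        se_b  = np.array([0.08, 0.10, 0.07, 0.12])

        def pooled(est, se):
            """Inverse-variance weighted mean and its standard error (fixed-effect
            pooling; the paper instead uses a random-effects marginal likelihood)."""
            w = 1.0 / se**2
            return np.sum(w * est) / np.sum(w), np.sqrt(1.0 / np.sum(w))

        a_bar, se_abar = pooled(a_hat, se_a)
        b_bar, se_bbar = pooled(b_hat, se_b)

        # Monte Carlo confidence interval for the combined mediated effect a*b.
        rng = np.random.default_rng(9)
        draws = rng.normal(a_bar, se_abar, 100_000) * rng.normal(b_bar, se_bbar, 100_000)
        print("ab =", a_bar * b_bar, " 95% CI:", np.percentile(draws, [2.5, 97.5]))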

  1. Lessons learned while integrating habitat, dispersal, disturbance, and life-history traits into species habitat models under climate change

    Treesearch

    Louis R. Iverson; Anantha M. Prasad; Stephen N. Matthews; Matthew P. Peters

    2011-01-01

    We present an approach to modeling potential climate-driven changes in habitat for tree and bird species in the eastern United States. First, we took an empirical-statistical modeling approach, using randomForest, with species abundance data from national inventories combined with soil, climate, and landscape variables, to build abundance-based habitat models for 134...

  2. Deep space network software cost estimation model

    NASA Technical Reports Server (NTRS)

    Tausworthe, R. C.

    1981-01-01

    A parametric software cost estimation model prepared for Jet Propulsion Laboratory (JPL) Deep Space Network (DSN) Data System implementation tasks is described. The resource estimation model modifies and combines a number of existing models. The model calibrates the task magnitude and difficulty, development environment, and software technology effects through prompted responses to a set of approximately 50 questions. Parameters in the model are adjusted to fit JPL software life-cycle statistics.

  3. Is There Any Real Observational Contradiction To The LCDM Model?

    NASA Astrophysics Data System (ADS)

    Ma, Yin-Zhe

    2011-01-01

    In this talk, I am going to question two apparent observational contradictions to LCDM cosmology: the lack of large-angle correlations in the cosmic microwave background, and the very large bulk flow of galaxy peculiar velocities. On the super-horizon scale, Copi et al. (2009) have been arguing that the lack of large angular correlations of the CMB temperature field provides strong evidence against the standard, statistically isotropic, LCDM cosmology. I am going to argue that this "ad hoc" discrepancy is due to the sub-optimal estimator of the low-l multipoles and to a posteriori statistics, which exaggerate the statistical significance. On galactic scales, Watkins et al. (2008) show that the very large bulk flow prefers a very large density fluctuation, which seems to contradict the LCDM model. I am going to show that these results are due to an underestimation of the small-scale velocity dispersion and an arbitrary way of combining catalogues. With an appropriate way of combining catalogue data, and with the small-scale velocity dispersion treated as a free parameter, the peculiar velocity field provides unconvincing evidence against LCDM cosmology.

  4. Structure-Specific Statistical Mapping of White Matter Tracts

    PubMed Central

    Yushkevich, Paul A.; Zhang, Hui; Simon, Tony; Gee, James C.

    2008-01-01

    We present a new model-based framework for the statistical analysis of diffusion imaging data associated with specific white matter tracts. The framework takes advantage of the fact that several of the major white matter tracts are thin sheet-like structures that can be effectively modeled by medial representations. The approach involves segmenting major tracts and fitting them with deformable geometric medial models. The medial representation makes it possible to average and combine tensor-based features along directions locally perpendicular to the tracts, thus reducing data dimensionality and accounting for errors in normalization. The framework enables the analysis of individual white matter structures, and provides a range of possibilities for computing statistics and visualizing differences between cohorts. The framework is demonstrated in a study of white matter differences in pediatric chromosome 22q11.2 deletion syndrome. PMID:18407524

  5. LES/PDF studies of joint statistics of mixture fraction and progress variable in piloted methane jet flames with inhomogeneous inlet flows

    NASA Astrophysics Data System (ADS)

    Zhang, Pei; Barlow, Robert; Masri, Assaad; Wang, Haifeng

    2016-11-01

    The mixture fraction and progress variable are often used as independent variables for describing turbulent premixed and non-premixed flames. There is a growing interest in using these two variables for describing partially premixed flames. The joint statistical distribution of the mixture fraction and progress variable is of great interest in developing models for partially premixed flames. In this work, we conduct predictive studies of the joint statistics of mixture fraction and progress variable in a series of piloted methane jet flames with inhomogeneous inlet flows. The employed models combine large eddy simulations with the Monte Carlo probability density function (PDF) method. The joint PDFs and marginal PDFs are examined in detail by comparing the model predictions and the measurements. Different presumed shapes of the joint PDFs are also evaluated.
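
    The record above centres on the joint PDF of mixture fraction and progress variable. As a minimal illustration of what such a joint statistic looks like in code (not the LES/transported-PDF solver itself), the sketch below estimates a joint PDF and its marginals from synthetic samples with a two-dimensional histogram.

```python
# Illustrative sketch only: estimating the joint PDF of mixture fraction Z and
# progress variable c from sampled data with a 2-D histogram, then recovering
# the marginal PDFs. The samples are synthetic, not flame data.
import numpy as np

rng = np.random.default_rng(1)
Z = np.clip(rng.beta(2, 5, 100_000), 0, 1)                   # mixture fraction samples
c = np.clip(rng.beta(4, 2, 100_000) * (1 - 0.3 * Z), 0, 1)   # progress variable samples

joint, z_edges, c_edges = np.histogram2d(Z, c, bins=50, range=[[0, 1], [0, 1]],
                                         density=True)
dz = np.diff(z_edges)[0]
dc = np.diff(c_edges)[0]
pdf_Z = joint.sum(axis=1) * dc   # marginal PDF of Z
pdf_c = joint.sum(axis=0) * dz   # marginal PDF of c
print("integral of joint PDF:", joint.sum() * dz * dc)  # should be ~1
```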

  6. The influence of the interactions between anthropogenic activities and multiple ecological factors on land surface temperatures of urban forests

    NASA Astrophysics Data System (ADS)

    Ren, Y.

    2017-12-01

    Context Land surface temperature (LST) spatio-temporal distribution patterns of urban forests are influenced by many ecological factors; identifying the interactions between these factors can improve simulations and predictions of spatial patterns of urban cold islands. This quantitative research requires an integrated method that combines multi-source data with spatial statistical analysis. Objectives The purpose of this study was to clarify how interactions between anthropogenic activities and multiple ecological factors influence urban forest LST, using cluster analysis of hot and cold spots and the GeoDetector model. We introduced the hypothesis that anthropogenic activity interacts with certain ecological factors, and that their combination influences urban forest LST. We also assumed that spatio-temporal distributions of urban forest LST should be similar to those of ecological factors and can be represented quantitatively. Methods We used Jinjiang as a representative city in China as a case study. Population density was employed to represent anthropogenic activity. We built a multi-source dataset (forest inventory, digital elevation models (DEM), population, and remote sensing imagery) on a unified urban scale to support this research. Through a combination of spatial statistical analysis results, multi-source spatial data, and the GeoDetector model, the interaction mechanisms of urban forest LST were revealed. Results Although different ecological factors have different influences on forest LST, in two periods with different hot spots and cold spots, the patch area and dominant tree species were the main factors contributing to LST clustering in urban forests. The interaction between anthropogenic activity and multiple ecological factors increased LST in urban forest stands, linearly and nonlinearly. Strong interactions between elevation and dominant species were generally observed and were prevalent in both hot and cold spot areas in different years. Conclusions In conclusion, a combination of spatial statistics and GeoDetector models should be effective for quantitatively evaluating interactive relationships among ecological factors, anthropogenic activity and LST.
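
    The factor and interaction analysis above relies on the GeoDetector q statistic. The sketch below is a minimal, hedged implementation of the factor-detector form of q (the share of LST variance explained by stratifying on one factor); the LST values and the dominant-species strata are synthetic, and the interaction detector used in the study is not reproduced.

```python
# Hedged sketch of a GeoDetector-style factor detector: q measures how much of
# the variance of LST is explained by stratifying cells by one factor.
# The LST values and strata below are invented for illustration.
import numpy as np

def q_statistic(values, strata):
    """q = 1 - within-stratum sum of squares / total sum of squares."""
    values = np.asarray(values, dtype=float)
    strata = np.asarray(strata)
    sst = values.size * values.var()          # total sum of squares
    ssw = sum(values[strata == s].size * values[strata == s].var()
              for s in np.unique(strata))     # within-stratum sum of squares
    return 1.0 - ssw / sst

rng = np.random.default_rng(2)
species = rng.integers(0, 4, 500)                   # hypothetical dominant-species class
lst = 30 + species * 0.8 + rng.normal(0, 1.0, 500)  # hypothetical LST (deg C)
print("q for dominant species:", round(q_statistic(lst, species), 3))
```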

  7. Modeling forest biomass and growth: Coupling long-term inventory and LiDAR data

    Treesearch

    Chad Babcock; Andrew O. Finley; Bruce D. Cook; Aaron Weiskittel; Christopher W. Woodall

    2016-01-01

    Combining spatially-explicit long-term forest inventory and remotely sensed information from Light Detection and Ranging (LiDAR) datasets through statistical models can be a powerful tool for predicting and mapping above-ground biomass (AGB) at a range of geographic scales. We present and examine a novel modeling approach to improve prediction of AGB and estimate AGB...

  8. Modelling eWork in Europe: Estimates, Models and Forecasts from the EMERGENCE Project. IES Report.

    ERIC Educational Resources Information Center

    Bates, P.; Huws, U.

    A study combined the results of a survey of employers in 18 European countries, which established the extent to which they are currently using eWork, with European official statistics to develop models, estimates, and forecasts of the numbers of eWorkers in Europe. Four types of "individual" eWork were identified: telehomeworking;…

  9. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning.

    PubMed

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-06-17

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-valued Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was obtained using the proposed model. The results show that deep learning with statistical feature extraction has essential improvement potential for diagnosing rotating machinery faults.
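
    The pipeline above starts from statistical feature sets computed in the time and frequency domains before the deep Boltzmann machine stage. The sketch below illustrates that first step only, on a synthetic vibration signal; the feature list, sampling rate and signal are assumptions for illustration and the GDBM itself is not implemented here.

```python
# Illustrative sketch of a "statistical feature set" for a vibration record:
# simple time- and frequency-domain statistics. The signal is synthetic and
# the deep Boltzmann machine stage of the record is not reproduced.
import numpy as np
from scipy import stats

fs = 12_000                                    # assumed sampling rate (Hz)
t = np.arange(0, 1, 1 / fs)
rng = np.random.default_rng(3)
x = np.sin(2 * np.pi * 157 * t) + 0.3 * rng.normal(size=t.size)  # synthetic signal

def time_domain_features(x):
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "mean": x.mean(), "std": x.std(), "rms": rms,
        "skewness": stats.skew(x), "kurtosis": stats.kurtosis(x),
        "crest_factor": np.max(np.abs(x)) / rms,
    }

def frequency_domain_features(x, fs):
    spec = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    p = spec / spec.sum()                       # normalized power spectrum
    centroid = np.sum(freqs * p)
    return {"spectral_centroid": centroid,
            "spectral_spread": np.sqrt(np.sum(((freqs - centroid) ** 2) * p))}

features = {**time_domain_features(x), **frequency_domain_features(x, fs)}
print(features)
```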

  10. Fault Diagnosis for Rotating Machinery Using Vibration Measurement Deep Statistical Feature Learning

    PubMed Central

    Li, Chuan; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego

    2016-01-01

    Fault diagnosis is important for the maintenance of rotating machinery. The detection of faults and fault patterns is a challenging part of machinery fault diagnosis. To tackle this problem, a model for deep statistical feature learning from vibration measurements of rotating machinery is presented in this paper. Vibration sensor signals collected from rotating mechanical systems are represented in the time, frequency, and time-frequency domains, each of which is then used to produce a statistical feature set. For learning statistical features, real-valued Gaussian-Bernoulli restricted Boltzmann machines (GRBMs) are stacked to develop a Gaussian-Bernoulli deep Boltzmann machine (GDBM). The suggested approach is applied as a deep statistical feature learning tool for both gearbox and bearing systems. The fault classification performances in experiments using this approach are 95.17% for the gearbox, and 91.75% for the bearing system. The proposed approach is compared to such standard methods as a support vector machine, GRBM and a combination model. In experiments, the best fault classification rate was obtained using the proposed model. The results show that deep learning with statistical feature extraction has essential improvement potential for diagnosing rotating machinery faults. PMID:27322273

  11. Generation of future potential scenarios in an Alpine Catchment by applying bias-correction techniques, delta-change approaches and stochastic Weather Generators at different spatial scale. Analysis of their influence on basic and drought statistics.

    NASA Astrophysics Data System (ADS)

    Collados-Lara, Antonio-Juan; Pulido-Velazquez, David; Pardo-Iguzquiza, Eulogio

    2017-04-01

    Assessing the impacts of potential future climate change scenarios on precipitation and temperature is essential to design adaptive strategies in water resources systems. The objective of this work is to analyze the possibilities of different statistical downscaling methods to generate future potential scenarios in an Alpine catchment from historical data and the available climate model simulations performed in the frame of the CORDEX EU project. The initial information employed to define these downscaling approaches is the historical climatic data (taken from the Spain02 project for the period 1971-2000 with a spatial resolution of 12.5 km) and the future series provided by climatic models for the horizon period 2071-2100. We have used information coming from nine climate model simulations (obtained from five different Regional Climate Models (RCMs) nested within four different Global Climate Models (GCMs)) from the European CORDEX project. In our application we have focused on the Representative Concentration Pathway (RCP) 8.5 emissions scenario, which is the most unfavorable scenario considered in the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC). For each RCM we have generated future climate series for the period 2071-2100 by applying two different approaches, bias correction and delta change, and five different transformation techniques (first moment correction, first and second moment correction, regression functions, quantile mapping using distribution derived transformation, and quantile mapping using empirical quantiles) for both of them. Ensembles of the obtained series were proposed to obtain more representative potential future climate scenarios to be employed to study potential impacts. In this work we propose a non-equally-weighted combination of the future series, giving more weight to those coming from models (delta change approaches) or combinations of models and techniques that provide a better approximation to the basic and drought statistics of the historical data. A multi-objective analysis using basic statistics (mean, standard deviation and asymmetry coefficient) and drought statistics (duration, magnitude and intensity) has been performed to identify which models are better in terms of goodness of fit to reproduce the historical series. The drought statistics have been obtained from the Standard Precipitation Index (SPI) series using the theory of runs. This analysis allows us to discriminate the best RCM and the best combination of model and correction technique in the bias-correction method. We have also analyzed the possibilities of using different stochastic weather generators to approximate the basic and drought statistics of the historical series. These analyses have been performed in our case study in a lumped and in a distributed way in order to assess their sensitivity to the spatial scale. The statistics of the future temperature series obtained with different ensemble options are quite homogeneous, but the precipitation shows a higher sensitivity to the adopted method and spatial scale. The global increments in the mean temperature values are 31.79%, 31.79%, 31.03% and 31.74% for the distributed bias-correction, distributed delta-change, lumped bias-correction and lumped delta-change ensembles respectively, and in the precipitation they are -25.48%, -28.49%, -26.42% and -27.35% respectively.
    Acknowledgments: This research work has been partially supported by the GESINHIMPADAPT project (CGL2013-48424-C2-2-R) with Spanish MINECO funds. We would also like to thank the Spain02 and CORDEX projects for the data provided for this study and the R package qmap.
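
    Among the transformation techniques listed above is quantile mapping with empirical quantiles. The sketch below shows a minimal version of that correction on synthetic monthly precipitation series; the series, the number of quantiles and the gamma-distributed toy data are assumptions, not the Spain02 or CORDEX data.

```python
# Hedged sketch of empirical quantile mapping: RCM output for the control
# period is mapped onto observed quantiles, and the same mapping is applied
# to the future series. All series are synthetic monthly precipitation.
import numpy as np

rng = np.random.default_rng(4)
obs_ctrl = rng.gamma(2.0, 40.0, 360)      # observed control period (mm/month)
mod_ctrl = rng.gamma(2.0, 55.0, 360)      # model control period (biased wet)
mod_fut  = rng.gamma(1.8, 50.0, 360)      # model future period

def empirical_quantile_mapping(x, mod_ref, obs_ref, n_q=100):
    q = np.linspace(0, 1, n_q)
    mod_q = np.quantile(mod_ref, q)
    obs_q = np.quantile(obs_ref, q)
    # For each future value, locate its quantile in the model reference
    # distribution, then read off the corresponding observed value.
    return np.interp(x, mod_q, obs_q)

fut_corrected = empirical_quantile_mapping(mod_fut, mod_ctrl, obs_ctrl)
print("raw future mean:      ", mod_fut.mean().round(1))
print("corrected future mean:", fut_corrected.mean().round(1))
```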

  12. Equilibrium statistical-thermal models in high-energy physics

    NASA Astrophysics Data System (ADS)

    Tawfik, Abdel Nasser

    2014-05-01

    We review some recent highlights from the applications of statistical-thermal models to different experimental measurements and lattice QCD thermodynamics that have been made during the last decade. We start with a short review of the historical milestones on the path of constructing statistical-thermal models for heavy-ion physics. We find that Heinz Koppe formulated, in 1948, an almost complete recipe for the statistical-thermal models. In 1950, Enrico Fermi generalized this statistical approach, in which he started with a general cross-section formula and inserted into it the simplifying assumptions about the matrix element of the interaction process that likely reflect many features of the high-energy reactions dominated by density in the phase space of final states. In 1964, Hagedorn systematically analyzed the high-energy phenomena using all tools of statistical physics and introduced the concept of limiting temperature based on the statistical bootstrap model. It turns out that, quite often, many-particle systems can be studied with the help of statistical-thermal methods. The analysis of yield multiplicities in high-energy collisions gives overwhelming evidence for chemical equilibrium in the final state. The strange particles might be an exception, as they are suppressed at lower beam energies; however, their relative yields fulfill statistical equilibrium as well. We review the equilibrium statistical-thermal models for particle production, fluctuations and collective flow in heavy-ion experiments. We also review their reproduction of the lattice QCD thermodynamics at vanishing and finite chemical potential. During the last decade, five conditions have been suggested to describe the universal behavior of the chemical freeze-out parameters. The higher order moments of multiplicity have been discussed. They offer deep insights into particle production and critical fluctuations. Therefore, we use them to describe the freeze-out parameters and suggest the location of the QCD critical endpoint. Various extensions have been proposed in order to take into consideration the possible deviations from the ideal hadron gas. We highlight various types of interactions, dissipative properties and location-dependences (spatial rapidity). Furthermore, we review three models combining hadronic with partonic phases: the quasi-particle model, the linear sigma model with Polyakov potentials and the compressible bag model.

  13. A statistical model describing combined irreversible electroporation and electroporation-induced blood-brain barrier disruption

    PubMed Central

    Sharabi, Shirley; Kos, Bor; Last, David; Guez, David; Daniels, Dianne; Harnof, Sagi; Miklavcic, Damijan

    2016-01-01

    Background Electroporation-based therapies such as electrochemotherapy (ECT) and irreversible electroporation (IRE) are emerging as promising tools for treatment of tumors. When applied to the brain, electroporation can also induce transient blood-brain-barrier (BBB) disruption in volumes extending beyond IRE, thus enabling efficient drug penetration. The main objective of this study was to develop a statistical model predicting cell death and BBB disruption induced by electroporation. This model can be used for individual treatment planning. Material and methods Cell death and BBB disruption models were developed based on the Peleg-Fermi model in combination with numerical models of the electric field. The model calculates the electric field thresholds for cell kill and BBB disruption and describes their dependence on the number of treatment pulses. The model was validated using in vivo experimental data consisting of MRIs of rat brains following electroporation treatments. Results Linear regression analysis confirmed that the model described the IRE and BBB disruption volumes as a function of the number of treatment pulses (r2 = 0.79; p < 0.008, r2 = 0.91; p < 0.001). The results presented a strong plateau effect as the pulse number increased. The ratio between the complete cell death and no cell death thresholds was relatively narrow (0.88-0.91) even for small numbers of pulses and depended weakly on the number of pulses. For BBB disruption, the ratio increased with the number of pulses. BBB disruption radii were on average 67% ± 11% larger than IRE volumes. Conclusions The statistical model can be used to describe the dependence of treatment effects on the number of pulses independent of the experimental setup. PMID:27069447
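
    The cell-death model above builds on the Peleg-Fermi survival curve. The sketch below shows a generic Peleg-Fermi-type sigmoid in the electric field with parameters that decay with pulse number; the exponential parameter dependence and all numerical values are illustrative assumptions, not the fitted values reported in the study.

```python
# Hedged sketch of a Peleg-Fermi-type survival curve: probability of cell
# survival as a function of local electric field E and pulse number n.
# The parameter values and the assumed exponential dependence of E_c and A
# on n are illustrative only.
import numpy as np
from scipy.special import expit

def peleg_fermi_survival(E, n, Ec0=1000.0, A0=200.0, k1=0.03, k2=0.03):
    """S(E, n) = 1 / (1 + exp((E - Ec(n)) / A(n))), fields in V/cm."""
    Ec = Ec0 * np.exp(-k1 * n)     # critical field decreases with pulse number
    A = A0 * np.exp(-k2 * n)       # transition width decreases with pulse number
    return expit(-(E - Ec) / A)    # numerically stable logistic form

E = np.linspace(0, 2000, 5)        # sample field strengths (V/cm)
for n in (10, 50, 90):
    print(f"n = {n:3d}:", np.round(peleg_fermi_survival(E, n), 3))
```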

  14. A Data Analytical Framework for Improving Real-Time, Decision Support Systems in Healthcare

    ERIC Educational Resources Information Center

    Yahav, Inbal

    2010-01-01

    In this dissertation we develop a framework that combines data mining, statistics and operations research methods for improving real-time decision support systems in healthcare. Our approach consists of three main concepts: data gathering and preprocessing, modeling, and deployment. We introduce the notion of offline and semi-offline modeling to…

  15. Modelling Agent-Environment Interaction in Multi-Agent Simulations with Affordances

    DTIC Science & Technology

    2010-04-01

    allow operations analysts to conduct statistical studies comparing the effectiveness of different systems or tactics in different scenarios. Instead of...in a Monte-Carlo batch mode, producing statistical outcomes for particular measures of effectiveness. They typically also run at many times faster...Combined with annotated signs, the affordances allowed the traveller agents to find their way around the virtual airport and to conduct their business

  16. Crossing statistic: reconstructing the expansion history of the universe

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shafieloo, Arman, E-mail: arman@ewha.ac.kr

    2012-08-01

    We show that by combining the Crossing Statistic [1,2] and the Smoothing method [3-5] one can reconstruct the expansion history of the universe with very high precision without considering any prior on cosmological quantities such as the equation of state of dark energy. We show that the presented method performs very well in reconstruction of the expansion history of the universe independent of the underlying models, and it works well even for non-trivial dark energy models with fast or slow changes in the equation of state of dark energy. The accuracy of the reconstructed quantities, along with the independence of the method from any prior or assumption, gives the proposed method advantages over the other non-parametric methods proposed before in the literature. Applying the method to the Union 2.1 supernovae combined with WiggleZ BAO data, we present the reconstructed results and test the consistency of the two data sets in a model independent manner. Results show that the latest available supernovae and BAO data are in good agreement with each other and a spatially flat ΛCDM model is in concordance with the current data.

  17. Statistical Evaluation of Utilization of the ISS

    NASA Technical Reports Server (NTRS)

    Andrews, Ross; Andrews, Alida

    2006-01-01

    PayLoad Utilization Modeler (PLUM) is a statistical-modeling computer program used to evaluate the effectiveness of utilization of the International Space Station (ISS) in terms of the number of research facilities that can be operated within a specified interval of time. PLUM is designed to balance the requirements of research facilities aboard the ISS against the resources available on the ISS. PLUM comprises three parts: an interface for the entry of data on constraints and on required and available resources, a database that stores these data as well as the program output, and a modeler. The modeler comprises two subparts: one that generates tens of thousands of random combinations of research facilities and another that calculates the usage of resources for each of those combinations. The results of these calculations are used to generate graphical and tabular reports to determine which facilities are most likely to be operable on the ISS, to identify which ISS resources are inadequate to satisfy the demands upon them, and to generate other data useful in allocation of and planning of resources.
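
    To illustrate the kind of Monte Carlo balancing PLUM performs, the sketch below draws random combinations of facilities, keeps those that fit within resource ceilings, and reports how often each facility appears among feasible combinations. The facility names, demands and resource limits are invented; this is not the PLUM code.

```python
# Illustrative Monte Carlo sketch: random subsets of facilities are drawn and
# a combination is retained only if it fits within the available resources.
# All facilities, demands and resource ceilings below are hypothetical.
import numpy as np

rng = np.random.default_rng(5)
# columns: power (kW), crew time (h/week), data downlink (Gb/day)
facilities = {
    "FacilityA": (1.2, 10.0, 3.0),
    "FacilityB": (0.8,  6.0, 1.5),
    "FacilityC": (2.0, 14.0, 5.0),
    "FacilityD": (0.5,  4.0, 0.5),
    "FacilityE": (1.5,  8.0, 2.5),
}
available = np.array([4.0, 30.0, 9.0])     # hypothetical resource ceilings
names = list(facilities)
demand = np.array([facilities[n] for n in names])

n_trials = 20_000
feasible_count = np.zeros(len(names))
n_feasible = 0
for _ in range(n_trials):
    mask = rng.random(len(names)) < 0.5          # random combination
    used = demand[mask].sum(axis=0)
    if np.all(used <= available):
        feasible_count[mask] += 1
        n_feasible += 1

for name, c in zip(names, feasible_count):
    print(f"{name}: operable in {c / n_feasible:.1%} of feasible combinations")
```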

  18. Geographical origin discrimination of lentils (Lens culinaris Medik.) using 1H NMR fingerprinting and multivariate statistical analyses.

    PubMed

    Longobardi, Francesco; Innamorato, Valentina; Di Gioia, Annalisa; Ventrella, Andrea; Lippolis, Vincenzo; Logrieco, Antonio F; Catucci, Lucia; Agostiano, Angela

    2017-12-15

    Lentil samples coming from two different countries, i.e. Italy and Canada, were analysed using untargeted 1H NMR fingerprinting in combination with chemometrics in order to build models able to classify them according to their geographical origin. For this aim, Soft Independent Modelling of Class Analogy (SIMCA), k-Nearest Neighbor (k-NN), Principal Component Analysis followed by Linear Discriminant Analysis (PCA-LDA) and Partial Least Squares-Discriminant Analysis (PLS-DA) were applied to the NMR data and the results were compared. The best combination of average recognition (100%) and cross-validation prediction abilities (96.7%) was obtained for PCA-LDA. All the statistical models were validated both by using a test set and by carrying out a Monte Carlo Cross Validation: the obtained performances were found to be satisfactory for all the models, with prediction abilities higher than 95%, demonstrating the suitability of the developed methods. Finally, the metabolites that mostly contributed to the lentil discrimination were indicated. Copyright © 2017 Elsevier Ltd. All rights reserved.
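
    As a minimal illustration of the PCA-LDA strategy that performed best in this record, the sketch below builds a scikit-learn pipeline and estimates cross-validated accuracy; the "spectra" are simulated random data with a small class difference, not the lentil NMR dataset.

```python
# Minimal PCA-LDA sketch on synthetic "NMR-like" data; the spectra and class
# labels are simulated placeholders for the two geographical origins.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n_per_class, n_vars = 60, 300                  # samples per origin, spectral bins
italy  = rng.normal(0.0, 1.0, (n_per_class, n_vars))
canada = rng.normal(0.0, 1.0, (n_per_class, n_vars))
canada[:, :20] += 0.8                          # small systematic difference
X = np.vstack([italy, canada])
y = np.array([0] * n_per_class + [1] * n_per_class)

pca_lda = make_pipeline(StandardScaler(), PCA(n_components=10),
                        LinearDiscriminantAnalysis())
scores = cross_val_score(pca_lda, X, y, cv=5)
print("cross-validated accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```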

  19. Action detection by double hierarchical multi-structure space-time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-03-01

    To address the complex information in videos and low detection efficiency, an action detection model based on the neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) for temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to achieve two similarity matrices on both large and small scales, which combines double hierarchical structural constraints in the model through both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarity is fused and analyzed to detect actions. Besides, the multi-scale composite template extends the model application to multi-view settings. Experimental results of DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance. Compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  20. A Statistical Examination of Magnetic Field Model Accuracy for Mapping Geosynchronous Solar Energetic Particle Observations to Lower Earth Orbits

    NASA Astrophysics Data System (ADS)

    Young, S. L.; Kress, B. T.; Rodriguez, J. V.; McCollough, J. P.

    2013-12-01

    Operational specifications of space environmental hazards can be an important input used by decision makers. Ideally the specification would come from on-board sensors, but for satellites where that capability is not available another option is to map data from remote observations to the location of the satellite. This requires a model of the physical environment and an understanding of its accuracy for mapping applications. We present a statistical comparison between magnetic field model mappings of solar energetic particle observations made by NOAA's Geostationary Operational Environmental Satellites (GOES) to the location of the Combined Release and Radiation Effects Satellite (CRRES). Because CRRES followed a geosynchronous transfer orbit which precessed in local time, we can examine the model accuracy between LEO and GEO orbits across a range of local times. We examine the accuracy of multiple magnetic field models using a variety of statistics and examine their utility for operational purposes.

  1. Action detection by double hierarchical multi-structure space–time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-06-01

    To address the complex information in videos and low detection efficiency, an action detection model based on the neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) for temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to achieve two similarity matrices on both large and small scales, which combines double hierarchical structural constraints in the model through both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarity is fused and analyzed to detect actions. Besides, the multi-scale composite template extends the model application to multi-view settings. Experimental results of DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance. Compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  2. A question of separation: disentangling tracer bias and gravitational non-linearity with counts-in-cells statistics

    NASA Astrophysics Data System (ADS)

    Uhlemann, C.; Feix, M.; Codis, S.; Pichon, C.; Bernardeau, F.; L'Huillier, B.; Kim, J.; Hong, S. E.; Laigle, C.; Park, C.; Shin, J.; Pogosyan, D.

    2018-02-01

    Starting from a very accurate model for density-in-cells statistics of dark matter based on large deviation theory, a bias model for the tracer density in spheres is formulated. It adopts a mean bias relation based on a quadratic bias model to relate the log-densities of dark matter to those of mass-weighted dark haloes in real and redshift space. The validity of the parametrized bias model is established using a parametrization-independent extraction of the bias function. This average bias model is then combined with the dark matter PDF, neglecting any scatter around it: it nevertheless yields an excellent model for densities-in-cells statistics of mass tracers that is parametrized in terms of the underlying dark matter variance and three bias parameters. The procedure is validated on measurements of both the one- and two-point statistics of subhalo densities in the state-of-the-art Horizon Run 4 simulation showing excellent agreement for measured dark matter variance and bias parameters. Finally, it is demonstrated that this formalism allows for a joint estimation of the non-linear dark matter variance and the bias parameters using solely the statistics of subhaloes. Having verified that galaxy counts in hydrodynamical simulations sampled on a scale of 10 Mpc h-1 closely resemble those of subhaloes, this work provides important steps towards making theoretical predictions for density-in-cells statistics applicable to upcoming galaxy surveys like Euclid or WFIRST.

  3. Weighted Statistical Binning: Enabling Statistically Consistent Genome-Scale Phylogenetic Analyses

    PubMed Central

    Bayzid, Md Shamsuzzoha; Mirarab, Siavash; Boussau, Bastien; Warnow, Tandy

    2015-01-01

    Because biological processes can result in different loci having different evolutionary histories, species tree estimation requires multiple loci from across multiple genomes. While many processes can result in discord between gene trees and species trees, incomplete lineage sorting (ILS), modeled by the multi-species coalescent, is considered to be a dominant cause for gene tree heterogeneity. Coalescent-based methods have been developed to estimate species trees, many of which operate by combining estimated gene trees, and so are called "summary methods". Because summary methods are generally fast (and much faster than more complicated coalescent-based methods that co-estimate gene trees and species trees), they have become very popular techniques for estimating species trees from multiple loci. However, recent studies have established that summary methods can have reduced accuracy in the presence of gene tree estimation error, and also that many biological datasets have substantial gene tree estimation error, so that summary methods may not be highly accurate in biologically realistic conditions. Mirarab et al. (Science 2014) presented the "statistical binning" technique to improve gene tree estimation in multi-locus analyses, and showed that it improved the accuracy of MP-EST, one of the most popular coalescent-based summary methods. Statistical binning, which uses a simple heuristic to evaluate "combinability" and then uses the larger sets of genes to re-calculate gene trees, has good empirical performance, but using statistical binning within a phylogenomic pipeline does not have the desirable property of being statistically consistent. We show that weighting the re-calculated gene trees by the bin sizes makes statistical binning statistically consistent under the multispecies coalescent, and maintains the good empirical performance. Thus, "weighted statistical binning" enables highly accurate genome-scale species tree estimation, and is also statistically consistent under the multi-species coalescent model. New data used in this study are available at DOI: http://dx.doi.org/10.6084/m9.figshare.1411146, and the software is available at https://github.com/smirarab/binning. PMID:26086579

  4. A bilayer Double Semion model with symmetry-enriched topological order

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ortiz, L., E-mail: lauraort@ucm.es; Martin-Delgado, M.A.

    2016-12-15

    We construct a new model of two-dimensional quantum spin systems that combines intrinsic topological orders and a global symmetry called flavour symmetry. It is referred to as the bilayer Double Semion model (bDS) and is an instance of symmetry-enriched topological order. A honeycomb bilayer lattice is introduced to combine a Double Semion Topological Order with a global spin–flavour symmetry to get the fractionalization of its quasiparticles. The bDS model exhibits non-trivial braiding self-statistics of excitations and its dual model constitutes a Symmetry-Protected Topological Order with novel edge states. This dual model gives rise to a bilayer Non-Trivial Paramagnet that is invariant under the flavour symmetry and the well-known spin flip symmetry.

  5. Improvement and extension of a radar forest backscattering model

    NASA Technical Reports Server (NTRS)

    Simonett, David S.; Wang, Yong

    1989-01-01

    Radar modeling of mangal forest stands in the Sundarbans area of southern Bangladesh was developed. The modeling employs radar system parameters together with forest data on tree height, spacing, biomass, species combinations, and water content (including slightly conductive water) both in leaves and trunks of the mangal. For Sundri and Gewa tropical mangal forests, six model components are proposed, which are required to explain the contributions of various forest species combinations to the attenuation and scattering of mangal-vegetated nonflooded or flooded surfaces. Statistical data of simulated images were compared with those of SIR-B images both to refine the modeling procedures and to appropriately characterize the model output. The possibility of delineation of flooded or nonflooded boundaries is discussed.

  6. A novelty detection diagnostic methodology for gearboxes operating under fluctuating operating conditions using probabilistic techniques

    NASA Astrophysics Data System (ADS)

    Schmidt, S.; Heyns, P. S.; de Villiers, J. P.

    2018-02-01

    In this paper, a fault diagnostic methodology is developed which is able to detect, locate and trend gear faults under fluctuating operating conditions when only vibration data from a single transducer, measured on a healthy gearbox are available. A two-phase feature extraction and modelling process is proposed to infer the operating condition and based on the operating condition, to detect changes in the machine condition. Information from optimised machine and operating condition hidden Markov models are statistically combined to generate a discrepancy signal which is post-processed to infer the condition of the gearbox. The discrepancy signal is processed and combined with statistical methods for automatic fault detection and localisation and to perform fault trending over time. The proposed methodology is validated on experimental data and a tacholess order tracking methodology is used to enhance the cost-effectiveness of the diagnostic methodology.

  7. Combination of complementary data mining methods for geographical characterization of extra virgin olive oils based on mineral composition.

    PubMed

    Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles

    2018-09-30

    This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.

  8. Estimation of unemployment rates using small area estimation model by combining time series and cross-sectional data

    NASA Astrophysics Data System (ADS)

    Muchlisoh, Siti; Kurnia, Anang; Notodiputro, Khairil Anwar; Mangku, I. Wayan

    2016-02-01

    Labor force surveys conducted over time under a rotating panel design have been carried out in many countries, including Indonesia. The labor force survey in Indonesia is regularly conducted by Statistics Indonesia (Badan Pusat Statistik-BPS) and is known as the National Labor Force Survey (Sakernas). The main purpose of Sakernas is to obtain information about unemployment rates and their changes over time. Sakernas is a quarterly survey. The quarterly survey is designed only for estimating the parameters at the provincial level. The quarterly unemployment rate published by BPS (official statistics) is calculated based only on cross-sectional methods, despite the fact that the data are collected under a rotating panel design. The purpose of this study was to estimate the quarterly unemployment rate at the district level using a small area estimation (SAE) model that combines time series and cross-sectional data. The study focused on the application and comparison of the Rao-Yu model and a dynamic model in the context of estimating the unemployment rate based on a rotating panel survey. The goodness of fit of both models was almost similar. Both models produced similar estimates that were better than the direct estimates, but the dynamic model was more capable than the Rao-Yu model of capturing heterogeneity across areas, although this was reduced over time.

  9. Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation

    NASA Technical Reports Server (NTRS)

    Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.

    1998-01-01

    The all rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three dimensional analysis, there are no first order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-injector diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all rocket mode of operation.
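
    The regression step described above (linear effects, interactions and a second-order effect) can be sketched with a quadratic response surface fit. In the example below the design points and the response are synthetic and the factor names are placeholders; it only illustrates the model form, not the CFD-based study.

```python
# Hedged sketch of a response-surface fit with linear, interaction and
# second-order terms. The coded factors and the "Isp efficiency" response
# are synthetic stand-ins for the design-of-experiments results.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 60
# Hypothetical coded factors in [-1, 1]: chamber pressure, area ratio,
# secondary flow fraction.
X = rng.uniform(-1, 1, (n, 3))
y = (0.9 + 0.05 * X[:, 0] + 0.03 * X[:, 1] - 0.02 * X[:, 2]
     + 0.04 * X[:, 0] * X[:, 1] - 0.03 * X[:, 2] ** 2
     + rng.normal(0, 0.005, n))

poly = PolynomialFeatures(degree=2, include_bias=False)
Xp = poly.fit_transform(X)                      # linear, square and cross terms
model = LinearRegression().fit(Xp, y)
for name, coef in zip(poly.get_feature_names_out(["p_c", "AR", "sec"]),
                      model.coef_):
    print(f"{name:>10s}: {coef:+.4f}")
```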

  10. Statistically Assessing Time-Averaged and Paleosecular Variation Field Models Against Paleomagnetic Directional Data Sets. Can Likely non-Zonal Features be Detected in a Robust way ?

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.

    2007-12-01

    We recently introduced a method to rigorously test the statistical compatibility of combined time-averaged (TAF) and paleosecular variation (PSV) field models against any lava flow paleomagnetic database (Khokhlov et al., 2001, 2006). Applying this method to test (TAF+PSV) models against synthetic data produced from those models shows that the method is very efficient at discriminating models, and very sensitive, provided data errors are properly taken into account. This prompted us to test a variety of published combined (TAF+PSV) models against a test Brunhes stable polarity data set extracted from the Quidelleur et al. (1994) database. Not surprisingly, ignoring data errors leads all models to be rejected. But taking data errors into account leads to the stimulating conclusion that at least one (TAF+PSV) model appears to be compatible with the selected data set, this model being purely axisymmetric. This result shows that in practice also, and with the databases currently available, the method can discriminate various candidate models and decide which actually best fits a given data set. But it also shows that likely non-zonal signatures of non-homogeneous boundary conditions imposed by the mantle are difficult to identify as statistically robust from paleomagnetic directional data sets. In the present paper, we will discuss the possibility that such signatures could eventually be identified as robust with the help of more recent data sets (such as the one put together under the collaborative "TAFI" effort, see e.g. Johnson et al. abstract #GP21A-0013, AGU Fall Meeting, 2005) or by taking additional information into account (such as the possible coincidence of non-zonal time-averaged field patterns with analogous patterns in the modern field).

  11. Combination Treatment With Meropenem Plus Levofloxacin Is Synergistic Against Pseudomonas aeruginosa Infection in a Murine Model of Pneumonia

    PubMed Central

    Louie, Arnold; Liu, Weiguo; VanGuilder, Michael; Neely, Michael N.; Schumitzky, Alan; Jelliffe, Roger; Fikes, Steven; Kurhanewicz, Stephanie; Robbins, Nichole; Brown, David; Baluya, Dodge; Drusano, George L.

    2015-01-01

    Background. Meropenem plus levofloxacin treatment was shown to be a promising combination in our in vitro hollow fiber infection model. We strove to validate this finding in a murine Pseudomonas pneumonia model. Methods. A dose-ranging study with meropenem and levofloxacin alone and in combination against Pseudomonas aeruginosa was performed in a granulocytopenic murine pneumonia model. Meropenem and levofloxacin were administered to partially humanize their pharmacokinetic profiles in mouse serum. Total and resistant bacterial populations were estimated after 24 hours of therapy. Pharmacokinetic profiling of both drugs was performed in plasma and epithelial lining fluid, using a population model. Results. Meropenem and levofloxacin penetrations into epithelial lining fluid were 39.3% and 64.3%, respectively. Both monotherapies demonstrated good exposure responses. An innovative combination-therapy analytic approach demonstrated that the combination was statistically significantly synergistic (α = 2.475), as was shown in the hollow fiber infection model. Bacteria resistant to levofloxacin and meropenem were seen in the control arm. Levofloxacin monotherapy selected for resistance to itself. No resistant subpopulations were observed in any combination therapy arm. Conclusions. The combination of meropenem plus levofloxacin was synergistic, producing good bacterial kill and resistance suppression. Given the track record of safety of each agent, this combination may be worthy of a clinical trial. PMID:25362196

  12. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment

    PubMed Central

    Hashim, Mazlan

    2015-01-01

    This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for the Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map, which has a total of 92 landslide locations, was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purposes. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy of 96%) is better in prediction than the AHP (accuracy of 91%) and WLC (accuracy of 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purposes and regional planning. PMID:25898919

  13. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment.

    PubMed

    Shahabi, Himan; Hashim, Mazlan

    2015-04-22

    This research presents the results of the GIS-based statistical models for generation of landslide susceptibility mapping using geographic information system (GIS) and remote-sensing data for the Cameron Highlands area in Malaysia. Ten factors including slope, aspect, soil, lithology, NDVI, land cover, distance to drainage, precipitation, distance to fault, and distance to road were extracted from SAR data, SPOT 5 and WorldView-1 images. The relationships between the detected landslide locations and these ten related factors were identified by using GIS-based statistical models including analytical hierarchy process (AHP), weighted linear combination (WLC) and spatial multi-criteria evaluation (SMCE) models. The landslide inventory map, which has a total of 92 landslide locations, was created based on numerous resources such as digital aerial photographs, AIRSAR data, WorldView-1 images, and field surveys. Then, 80% of the landslide inventory was used for training the statistical models and the remaining 20% was used for validation purposes. The validation results using the Relative landslide density index (R-index) and Receiver operating characteristic (ROC) demonstrated that the SMCE model (accuracy of 96%) is better in prediction than the AHP (accuracy of 91%) and WLC (accuracy of 89%) models. These landslide susceptibility maps would be useful for hazard mitigation purposes and regional planning.
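
    A minimal sketch of the weighted linear combination (WLC) step and the ROC validation mentioned above is given below: factor layers are normalized, combined with assumed expert weights, and the score is checked against a synthetic landslide inventory. Weights, factors and labels are illustrative; in the study the weights and the inventory come from AHP and field data.

```python
# Illustrative WLC sketch: normalized factor layers are combined with assumed
# expert weights and the susceptibility score is validated with ROC/AUC.
# All "rasters", weights and landslide labels below are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(8)
n_cells = 10_000
slope     = rng.uniform(0, 45, n_cells)          # degrees
dist_road = rng.uniform(0, 2000, n_cells)        # metres
ndvi      = rng.uniform(0, 1, n_cells)

def normalize(x, invert=False):
    z = (x - x.min()) / (x.max() - x.min())
    return 1 - z if invert else z

# Hypothetical expert weights (would come from AHP in practice), summing to 1.
weights = {"slope": 0.5, "dist_road": 0.3, "ndvi": 0.2}
susceptibility = (weights["slope"] * normalize(slope)
                  + weights["dist_road"] * normalize(dist_road, invert=True)
                  + weights["ndvi"] * normalize(ndvi, invert=True))

# Synthetic landslide inventory correlated with the score, for the ROC check.
landslide = rng.random(n_cells) < 0.05 + 0.3 * susceptibility
print("AUC:", round(roc_auc_score(landslide, susceptibility), 3))
```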

  14. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2012-06-20

    When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression for the c-statistic was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
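
    A hedged numerical check of the equal-variance binormal relation discussed above can be written in a few lines: simulate a normally distributed covariate in cases and controls, fit a logistic regression, and compare the empirical c-statistic with Phi(sigma * beta / sqrt(2)), where beta is the log-odds ratio per unit of the covariate (the 1/sqrt(2) factor follows from the difference of two independent normals). The parameter values below are arbitrary, and only the equal-variance case is illustrated.

```python
# Monte Carlo check of the equal-variance binormal expression: with
# X ~ N(mu0, sigma^2) in controls and N(mu1, sigma^2) in cases, the
# c-statistic should be close to Phi(sigma * beta / sqrt(2)).
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n, sigma, mu0, mu1 = 50_000, 1.5, 0.0, 1.0
y = rng.integers(0, 2, n)
x = np.where(y == 1, rng.normal(mu1, sigma, n), rng.normal(mu0, sigma, n))

beta = LogisticRegression(C=1e6).fit(x.reshape(-1, 1), y).coef_[0, 0]
empirical_c = roc_auc_score(y, x)
predicted_c = norm.cdf(sigma * beta / np.sqrt(2))
print(f"empirical c = {empirical_c:.3f}, predicted c = {predicted_c:.3f}")
```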

  15. 2-Point microstructure archetypes for improved elastic properties

    NASA Astrophysics Data System (ADS)

    Adams, Brent L.; Gao, Xiang

    2004-01-01

    Rectangular models of material microstructure are described by their 1- and 2-point (spatial) correlation statistics of placement of local state. In the procedure described here the local state space is described in discrete form, and the focus is on placement of local state within a finite number of cells comprising rectangular models. It is illustrated that effective elastic properties (generalized Hashin-Shtrikman bounds) can be obtained that are linear in components of the correlation statistics. Within this framework the concept of an eigen-microstructure within the microstructure hull is useful. Given the practical innumerability of the microstructure hull, however, we introduce a method for generating a sequence of archetypes of eigen-microstructure from the 2-point correlation statistics of local state, assuming that the 1-point statistics are stationary. The method is illustrated by obtaining an archetype for an imaginary two-phase material where the objective is to maximize the combination C*_{xxxx} + C*_{xyxy}.

  16. Investigation of Super Learner Methodology on HIV-1 Small Sample: Application on Jaguar Trial Data.

    PubMed

    Houssaïni, Allal; Assoumou, Lambert; Marcelin, Anne Geneviève; Molina, Jean Michel; Calvez, Vincent; Flandre, Philippe

    2012-01-01

    Background. Many statistical models have been tested to predict phenotypic or virological response from genotypic data. A statistical framework called Super Learner has been introduced either to compare different methods/learners (discrete Super Learner) or to combine them in a Super Learner prediction method. Methods. The Jaguar trial is used to apply the Super Learner framework. The Jaguar study is an "add-on" trial comparing the efficacy of adding didanosine to an on-going failing regimen. Our aim was also to investigate the impact of using different cross-validation strategies and different loss functions. Four different splits between training set and validation set were tested through two loss functions. Six statistical methods were compared. We assess performance by evaluating R(2) values and accuracy by calculating the rates of patients being correctly classified. Results. Our results indicated that the more recent Super Learner methodology of building a new predictor based on a weighted combination of different methods/learners provided good performance. A simple linear model provided results similar to those of this new predictor. A slight discrepancy arises between the two loss functions investigated, and a slight difference also arises between results based on cross-validated risks and results from the full dataset. The Super Learner methodology and the linear model correctly classified around 80% of patients. The difference between the lower and higher rates is around 10 percent. The number of mutations retained in different learners also varies from one to 41. Conclusions. The more recent Super Learner methodology combining the predictions of many learners provided good performance on our small dataset.
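
    The Super Learner idea of weighting learners by their cross-validated predictions can be sketched as follows: out-of-fold predictions from several candidate learners are combined with non-negative weights fitted to the outcome. The learners, data and weighting scheme below are illustrative assumptions, not the exact library of learners used in the Jaguar analysis.

```python
# Hedged Super Learner-style sketch: out-of-fold predictions from candidate
# learners are combined with non-negative weights that minimize squared error.
# Data are synthetic, not genotype/phenotype data.
import numpy as np
from scipy.optimize import nnls
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(10)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 0.5 * X[:, 1] + 0.3 * X[:, 2] * X[:, 3] + rng.normal(0, 0.5, 200)

learners = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "rf": RandomForestRegressor(n_estimators=200, random_state=0),
}
# Out-of-fold predictions form the design matrix of the meta-problem.
Z = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in learners.values()])
w, _ = nnls(Z, y)
w = w / w.sum()                    # normalized non-negative weights
for name, wi in zip(learners, w):
    print(f"weight for {name}: {wi:.2f}")
# The final predictor refits each learner on all data and averages their
# predictions with the weights above.
```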

  17. Order statistics applied to the most massive and most distant galaxy clusters

    NASA Astrophysics Data System (ADS)

    Waizmann, J.-C.; Ettori, S.; Bartelmann, M.

    2013-06-01

    In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic, allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that, for the combination of the largest and the second largest observation, it is most likely to find them realized with similar values, with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations, and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the South Pole Telescope (SPT) massive cluster sample and the meta-catalogue of X-ray detected clusters of galaxies, and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.
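
    The core order-statistics relation used above, that the CDF of the k-th most massive of N draws is a binomial sum over the parent CDF, can be illustrated directly. The sketch below uses a toy exponential parent distribution and a fixed number of clusters purely for illustration; it is not the Tinker mass function or the survey selection modelling of the paper.

```python
# Illustrative order-statistics sketch: P(k-th largest <= m) equals the
# probability that at most k-1 of N draws exceed m. The parent distribution
# and the cluster count below are toy assumptions.
import numpy as np
from scipy.stats import binom

def cdf_kth_largest(F_m, N, k):
    """P(k-th largest <= m) = P(at most k-1 of N draws exceed m)."""
    return binom.cdf(k - 1, N, 1.0 - F_m)

# Toy parent CDF for cluster masses (in units of 1e14 Msun) above a cut.
m = np.linspace(1, 20, 500)
F = 1.0 - np.exp(-(m - 1.0) / 2.0)

N = 1000                                   # assumed number of clusters in the survey
for k in (1, 2, 5):
    median_mass = m[np.searchsorted(cdf_kth_largest(F, N, k), 0.5)]
    print(f"median mass of the {k}-th most massive cluster: {median_mass:.1f}e14 Msun")
```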

  18. Comparisons of Three-Dimensional Variational Data Assimilation and Model Output Statistics in Improving Atmospheric Chemistry Forecasts

    NASA Astrophysics Data System (ADS)

    Ma, Chaoqun; Wang, Tijian; Zang, Zengliang; Li, Zhijin

    2018-07-01

    Atmospheric chemistry models usually perform badly in forecasting wintertime air pollution because of their uncertainties. Generally, such uncertainties can be decreased effectively by techniques such as data assimilation (DA) and model output statistics (MOS). However, the relative importance and combined effects of the two techniques have not been clarified. Here, a one-month air quality forecast with the Weather Research and Forecasting-Chemistry (WRF-Chem) model was carried out in a virtually operational setup focusing on Hebei Province, China. Meanwhile, three-dimensional variational (3DVar) DA and MOS based on one-dimensional Kalman filtering were implemented separately and simultaneously to investigate their performance in improving the model forecast. Comparison with observations shows that the chemistry forecast with MOS outperforms that with 3DVar DA, which could be seen in all the species tested over the whole 72 forecast hours. Combined use of both techniques does not guarantee a better forecast than MOS only, with the improvements and degradations being small and appearing rather randomly. Results indicate that the implementation of MOS is more suitable than 3DVar DA in improving the operational forecasting ability of WRF-Chem.
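
    A one-dimensional Kalman-filter MOS of the kind named above can be sketched as a recursive estimate of the forecast bias that is subtracted from each new raw forecast. The noise variances, the random-walk bias model and the synthetic series below are assumptions for illustration, not the operational WRF-Chem configuration.

```python
# Hedged sketch of a 1-D Kalman-filter MOS: the systematic forecast bias is
# treated as a slowly varying state, updated daily from the observed forecast
# error and subtracted from the next raw forecast. All series are synthetic.
import numpy as np

rng = np.random.default_rng(11)
days = 60
truth = 50 + 20 * np.sin(np.arange(days) / 10.0)            # "true" concentration
raw_forecast = truth + 15 + rng.normal(0, 5, days)          # biased model output

q, r = 0.5, 25.0          # assumed process and observation error variances
bias, p = 0.0, 10.0       # initial bias estimate and its variance
corrected = np.empty(days)
for t in range(days):
    corrected[t] = raw_forecast[t] - bias        # apply current bias estimate
    error = raw_forecast[t] - truth[t]           # today's observed forecast error
    # Kalman update of the bias state (random-walk model).
    p = p + q
    k = p / (p + r)
    bias = bias + k * (error - bias)
    p = (1 - k) * p

print("RMSE raw:      ", np.sqrt(np.mean((raw_forecast - truth) ** 2)).round(2))
print("RMSE corrected:", np.sqrt(np.mean((corrected - truth) ** 2)).round(2))
```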

  19. On Improving the Quality and Interpretation of Environmental Assessments using Statistical Analysis and Geographic Information Systems

    NASA Astrophysics Data System (ADS)

    Karuppiah, R.; Faldi, A.; Laurenzi, I.; Usadi, A.; Venkatesh, A.

    2014-12-01

    An increasing number of studies are focused on assessing the environmental footprint of different products and processes, especially using life cycle assessment (LCA). This work shows how combining statistical methods and Geographic Information Systems (GIS) with environmental analyses can help improve the quality of results and their interpretation. Most environmental assessments in literature yield single numbers that characterize the environmental impact of a process/product - typically global or country averages, often unchanging in time. In this work, we show how statistical analysis and GIS can help address these limitations. For example, we demonstrate a method to separately quantify uncertainty and variability in the result of LCA models using a power generation case study. This is important for rigorous comparisons between the impacts of different processes. Another challenge is lack of data that can affect the rigor of LCAs. We have developed an approach to estimate environmental impacts of incompletely characterized processes using predictive statistical models. This method is applied to estimate unreported coal power plant emissions in several world regions. There is also a general lack of spatio-temporal characterization of the results in environmental analyses. For instance, studies that focus on water usage do not put in context where and when water is withdrawn. Through the use of hydrological modeling combined with GIS, we quantify water stress on a regional and seasonal basis to understand water supply and demand risks for multiple users. Another example where it is important to consider regional dependency of impacts is when characterizing how agricultural land occupation affects biodiversity in a region. We developed a data-driven methodology used in conjunction with GIS to determine if there is a statistically significant difference between the impacts of growing different crops on different species in various biomes of the world.

  20. Infants are superior in implicit crossmodal learning and use other learning mechanisms than adults

    PubMed Central

    von Frieling, Marco; Röder, Brigitte

    2017-01-01

    During development, internal models of the sensory world must be acquired and then continuously adapted. We used event-related potentials (ERPs) to test the hypothesis that infants extract crossmodal statistics implicitly while adults learn them when task relevant. Participants were passively exposed to frequent standard audio-visual combinations (A1V1, A2V2, p=0.35 each), rare recombinations of these standard stimuli (A1V2, A2V1, p=0.10 each), and a rare audio-visual deviant with infrequent auditory and visual elements (A3V3, p=0.10). While both six-month-old infants and adults differentiated between rare deviants and standards involving early neural processing stages, only infants were sensitive to crossmodal statistics, as indicated by a late ERP difference between standard and recombined stimuli. A second experiment revealed that adults differentiated recombined and standard combinations when crossmodal combinations were task relevant. These results demonstrate a heightened sensitivity to crossmodal statistics in infants and a change in learning mode from infancy to adulthood. PMID:28949291

  1. A Model for Indexing Medical Documents Combining Statistical and Symbolic Knowledge.

    PubMed Central

    Avillach, Paul; Joubert, Michel; Fieschi, Marius

    2007-01-01

    OBJECTIVES: To develop and evaluate an information processing method based on terminologies, in order to index medical documents in any given documentary context. METHODS: We designed a model using both symbolic general knowledge extracted from the Unified Medical Language System (UMLS) and statistical knowledge extracted from a domain of application. Using statistical knowledge allowed us to contextualize the general knowledge for every particular situation. For each document studied, the extracted terms are ranked to highlight the most significant ones. The model was tested on a set of 17,079 French standardized discharge summaries (SDSs). RESULTS: The most important ICD-10 term of each SDS was ranked 1st or 2nd by the method in nearly 90% of the cases. CONCLUSIONS: The use of several terminologies leads to more precise indexing. The improvement achieved in the model’s implementation performance as a result of using semantic relationships is encouraging. PMID:18693792

  2. Response Surface Modeling of Combined-Cycle Propulsion Components using Computational Fluid Dynamics

    NASA Technical Reports Server (NTRS)

    Steffen, C. J., Jr.

    2002-01-01

    Three examples of response surface modeling with CFD are presented for combined-cycle propulsion components. The examples include a mixed-compression inlet during hypersonic flight, a hydrogen-fueled scramjet combustor during hypersonic flight, and a ducted-rocket nozzle during all-rocket flight. Three different experimental strategies were examined, including full factorial, fractionated central-composite, and D-optimal with embedded Plackett-Burman designs. The response variables have been confined to integral data extracted from multidimensional CFD results. Careful attention has been paid to uncertainty assessment and modeling bias. The importance of automating experimental setup and effectively communicating statistical results is emphasized.
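
    As an illustration of the response-surface idea (not the designs or CFD responses used in the paper), the sketch below fits a quadratic surface in two factors by ordinary least squares to synthetic responses.

        import numpy as np

        # Fit y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + b4*x1^2 + b5*x2^2 to a synthetic 5x5 design.
        rng = np.random.default_rng(0)
        x1, x2 = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
        x1, x2 = x1.ravel(), x2.ravel()
        y = 2.0 + 0.8*x1 - 0.5*x2 + 0.3*x1*x2 - 0.6*x1**2 + 0.05*rng.normal(size=x1.size)

        A = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        print(np.round(coef, 2))   # estimated surface coefficients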

  3. Using open source computational tools for predicting human metabolic stability and additional absorption, distribution, metabolism, excretion, and toxicity properties.

    PubMed

    Gupta, Rishi R; Gifford, Eric M; Liston, Ted; Waller, Chris L; Hohman, Moses; Bunin, Barry A; Ekins, Sean

    2010-11-01

    Ligand-based computational models could be more readily shared between researchers and organizations if they were generated with open source molecular descriptors [e.g., chemistry development kit (CDK)] and modeling algorithms, because this would negate the requirement for proprietary commercial software. We initially evaluated open source descriptors and model building algorithms using a training set of approximately 50,000 molecules and a test set of approximately 25,000 molecules with human liver microsomal metabolic stability data. A C5.0 decision tree model demonstrated that CDK descriptors together with a set of Smiles Arbitrary Target Specification (SMARTS) keys had good statistics [κ = 0.43, sensitivity = 0.57, specificity = 0.91, and positive predicted value (PPV) = 0.64], equivalent to those of models built with commercial Molecular Operating Environment 2D (MOE2D) and the same set of SMARTS keys (κ = 0.43, sensitivity = 0.58, specificity = 0.91, and PPV = 0.63). Extending the dataset to ∼193,000 molecules and generating a continuous model using Cubist with a combination of CDK and SMARTS keys or MOE2D and SMARTS keys confirmed this observation. When the continuous predictions and actual values were binned to get a categorical score we observed a similar κ statistic (0.42). The same combination of descriptor set and modeling method was applied to passive permeability and P-glycoprotein efflux data with similar model testing statistics. In summary, open source tools demonstrated predictive results comparable to those of commercial software with attendant cost savings. We discuss the advantages and disadvantages of open source descriptors and the opportunity for their use as a tool for organizations to share data precompetitively, avoiding repetition and assisting drug discovery.
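
    The categorical model statistics quoted above (kappa, sensitivity, specificity, PPV) can be reproduced from a confusion matrix; the sketch below uses a tiny set of hypothetical labels rather than the metabolic-stability data.

        import numpy as np
        from sklearn.metrics import cohen_kappa_score, confusion_matrix

        y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 1])   # hypothetical stable/unstable labels
        y_pred = np.array([1, 0, 0, 1, 0, 0, 1, 1, 0, 1])   # hypothetical model predictions

        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        print("kappa", round(cohen_kappa_score(y_true, y_pred), 2))
        print("sensitivity", round(tp / (tp + fn), 2))
        print("specificity", round(tn / (tn + fp), 2))
        print("PPV", round(tp / (tp + fp), 2))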

  4. Bridging the Gap between Theory and Model: A Reflection on the Balance Scale Task.

    ERIC Educational Resources Information Center

    Turner, Geoffrey F. W.; Thomas, Hoben

    2002-01-01

    Focuses on individual strengths of articles by Jensen and van der Maas, and Halford et al., and the power of their combined perspectives. Suggests a performance model that can both evaluate specific theoretical claims and reveal important data features that had been previously obscured using conventional statistical analyses. Maintains that the…

  5. Combining inventories of land cover and forest resources with prediction models and remotely sensed data

    Treesearch

    Raymond L. Czaplewski

    1989-01-01

    It is difficult to design systems for national and global resource inventory and analysis that efficiently satisfy changing, and increasingly complex objectives. It is proposed that individual inventory, monitoring, modeling, and remote sensing systems be specialized to achieve portions of the objectives. These separate systems can be statistically linked to accomplish...

  6. Kalman filter to update forest cover estimates

    Treesearch

    Raymond L. Czaplewski

    1990-01-01

    The Kalman filter is a statistical estimator that combines a time-series of independent estimates, using a prediction model that describes expected changes in the state of a system over time. An expensive inventory can be updated using model predictions that are adjusted with more recent, but less expensive and precise, monitoring data. The concepts of the Kalman...
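
    A minimal scalar sketch of the update described above, with hypothetical numbers standing in for an expensive inventory estimate, a growth-model prediction, and cheaper monitoring data.

        # Scalar Kalman filter: combine a model prediction with a noisy monitoring estimate.
        def kalman_update(x_prior, p_prior, z, r):
            """Combine prior estimate (variance p_prior) with observation z (variance r)."""
            k = p_prior / (p_prior + r)           # Kalman gain
            x_post = x_prior + k * (z - x_prior)  # updated estimate
            p_post = (1.0 - k) * p_prior          # updated variance
            return x_post, p_post

        x, p = 1200.0, 50.0**2        # prior cover estimate (kha) from an expensive inventory
        x, p = x * 1.01, p + 20.0**2  # prediction step: 1% growth plus process noise (hypothetical)
        x, p = kalman_update(x, p, z=1180.0, r=80.0**2)  # adjust with cheap monitoring data
        print(round(x, 1), round(p**0.5, 1))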

  7. On the combined gradient-stochastic plasticity model: Application to Mo-micropillar compression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Konstantinidis, A. A., E-mail: akonsta@civil.auth.gr; Zhang, X., E-mail: zhangxu26@126.com; Aifantis, E. C., E-mail: mom@mom.gen.auth.gr

    2015-02-17

    A formulation for addressing heterogeneous material deformation is proposed. It is based on the use of a stochasticity-enhanced gradient plasticity model implemented through a cellular automaton. The specific application is on Mo-micropillar compression, for which the irregularities of the strain bursts observed have been experimentally measured and theoretically interpreted through Tsallis' q-statistics.

  8. Development of a globally applicable model for near real-time prediction of seismically induced landslides

    USGS Publications Warehouse

    Nowicki, M. Anna; Wald, David J.; Hamburger, Michael W.; Hearne, Mike; Thompson, Eric M.

    2014-01-01

    Substantial effort has been invested to understand where seismically induced landslides may occur in the future, as they are a costly and frequently fatal threat in mountainous regions. The goal of this work is to develop a statistical model for estimating the spatial distribution of landslides in near real-time around the globe for use in conjunction with the U.S. Geological Survey (USGS) Prompt Assessment of Global Earthquakes for Response (PAGER) system. This model uses standardized outputs of ground shaking from the USGS ShakeMap Atlas 2.0 to develop an empirical landslide probability model, combining shaking estimates with broadly available landslide susceptibility proxies, i.e., topographic slope, surface geology, and climate parameters. We focus on four earthquakes for which digitally mapped landslide inventories and well-constrained ShakeMaps are available. The resulting database is used to build a predictive model of the probability of landslide occurrence. The landslide database includes the Guatemala (1976), Northridge (1994), Chi-Chi (1999), and Wenchuan (2008) earthquakes. Performance of the regression model is assessed using statistical goodness-of-fit metrics and a qualitative review to determine which combination of the proxies provides both the optimum prediction of landslide-affected areas and minimizes the false alarms in non-landslide zones. Combined with near real-time ShakeMaps, these models can be used to make generalized predictions of whether or not landslides are likely to occur (and if so, where) for earthquakes around the globe, and eventually to inform loss estimates within the framework of the PAGER system.
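
    An illustrative sketch of an empirical landslide probability model of this general kind: logistic regression of landslide occurrence on shaking and susceptibility proxies, trained here on synthetic data rather than the inventories used in the paper.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        n = 5000
        pga = rng.uniform(0.05, 1.0, n)        # shaking proxy, e.g. peak ground acceleration (g)
        slope = rng.uniform(0.0, 45.0, n)      # topographic slope (degrees)
        wetness = rng.uniform(0.0, 1.0, n)     # climate/wetness proxy
        logit = -6.0 + 4.0 * pga + 0.1 * slope + 1.5 * wetness   # assumed true relationship
        y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))         # synthetic landslide occurrence

        X = np.column_stack([pga, slope, wetness])
        model = LogisticRegression().fit(X, y)
        prob = model.predict_proba([[0.6, 30.0, 0.7]])[0, 1]     # landslide probability for one cell
        print(model.coef_, round(prob, 3))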

  9. Fatigue Life Prediction of Fiber-Reinforced Ceramic-Matrix Composites with Different Fiber Preforms at Room and Elevated Temperatures

    PubMed Central

    Li, Longbiao

    2016-01-01

    In this paper, the fatigue life of fiber-reinforced ceramic-matrix composites (CMCs) with different fiber preforms, i.e., unidirectional, cross-ply, 2D (two dimensional), 2.5D and 3D CMCs at room and elevated temperatures in air and oxidative environments, has been predicted using the micromechanics approach. An effective coefficient of the fiber volume fraction along the loading direction (ECFL) was introduced to describe the fiber architecture of preforms. The statistical matrix multicracking model and fracture mechanics interface debonding criterion were used to determine the matrix crack spacing and interface debonded length. Under cyclic fatigue loading, the fiber broken fraction was determined by combining the interface wear model and fiber statistical failure model at room temperature, and the interface/fiber oxidation model, interface wear model and fiber statistical failure model at elevated temperatures, based on the assumption that the fiber strength follows a two-parameter Weibull distribution and the load carried by broken and intact fibers satisfies the Global Load Sharing (GLS) criterion. When the broken fiber fraction approaches the critical value, fatigue fracture of the composite occurs. PMID:28773332
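
    A minimal sketch of the two-parameter Weibull fiber-strength assumption; the characteristic strength and modulus below are hypothetical, and the interface wear/oxidation and GLS load-redistribution steps of the model are not reproduced.

        import numpy as np

        # Probability that a fiber has failed at stress sigma, given Weibull parameters
        # sigma0 (characteristic strength) and m (Weibull modulus); values are hypothetical.
        def fiber_failure_fraction(sigma, sigma0=2.0e3, m=5.0):
            return 1.0 - np.exp(-((sigma / sigma0) ** m))

        for stress in (800.0, 1200.0, 1600.0):   # applied fiber stress (MPa)
            print(stress, round(fiber_failure_fraction(stress), 4))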

  10. Modeling to Optimize Terminal Stem Cell Differentiation

    PubMed Central

    Gallicano, G. Ian

    2013-01-01

    Embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), and adult stem cells (ASCs) are all among the most promising potential treatments for heart failure, spinal cord injury, neurodegenerative diseases, and diabetes. However, considerable uncertainty in the production of ESC-derived terminally differentiated cell types has limited the efficiency of their development. To address this uncertainty, we and other investigators have begun to employ a comprehensive statistical model of ESC differentiation for determining the role of intracellular pathways (e.g., STAT3) in ESC differentiation and determination of germ layer fate. The approach discussed here applies the Bayesian statistical model to cell/developmental biology, combining traditional flow cytometry methodology and specific morphological observations with advanced statistical and probabilistic modeling and experimental design. The final result of this study is a unique tool and model that enhances the understanding of how and when specific cell fates are determined during differentiation. This model provides a guideline for increasing the production efficiency of therapeutically viable ESC-, iPSC- or ASC-derived neurons or any other cell type and will eventually lead to advances in stem cell therapy. PMID:24278782

  11. Body size affects the strength of social interactions and spatial organization of a schooling fish (Pseudomugil signifer)

    NASA Astrophysics Data System (ADS)

    Romenskyy, Maksym; Herbert-Read, James E.; Ward, Ashley J. W.; Sumpter, David J. T.

    2017-04-01

    While a rich variety of self-propelled particle models propose to explain the collective motion of fish and other animals, rigorous statistical comparison between models and data remains a challenge. Plausible models should be flexible enough to capture changes in the collective behaviour of animal groups at their different developmental stages and group sizes. Here, we analyse the statistical properties of schooling fish (Pseudomugil signifer) through a combination of experiments and simulations. We make novel use of a Boltzmann inversion method, usually applied in molecular dynamics, to identify the effective potential of the mean force of fish interactions. Specifically, we show that larger fish have a larger repulsion zone, but stronger attraction, resulting in greater alignment in their collective motion. We model the collective dynamics of schools using a self-propelled particle model, modified to include varying particle speed and a local repulsion rule. We demonstrate that the statistical properties of the fish schools are reproduced by our model, thereby capturing a number of features of the behaviour and development of schooling fish.
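
    A rough sketch of the Boltzmann inversion step on synthetic pairwise distances: the effective potential of mean force is recovered as U(r) = -kT ln g(r) (kT set to 1 here); the distance distribution below is arbitrary and not the fish data.

        import numpy as np

        distances = np.random.default_rng(3).gamma(shape=4.0, scale=1.0, size=100_000)  # synthetic pair distances
        hist, edges = np.histogram(distances, bins=60, range=(0.0, 12.0), density=True)
        r = 0.5 * (edges[:-1] + edges[1:])
        g = hist / hist.max()                 # crude normalisation of the distance distribution
        with np.errstate(divide="ignore"):
            U = -np.log(g)                    # effective potential in units of kT
        print(r[np.argmin(U)])                # location of the potential minimum (preferred distance)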

  12. Fragment size distribution statistics in dynamic fragmentation of laser shock-loaded tin

    NASA Astrophysics Data System (ADS)

    He, Weihua; Xin, Jianting; Zhao, Yongqiang; Chu, Genbai; Xi, Tao; Shui, Min; Lu, Feng; Gu, Yuqiu

    2017-06-01

    This work investigates a geometric statistics method to characterize the size distribution of tin fragments produced in laser shock-loaded dynamic fragmentation. In the shock experiments, the ejecta from a tin sample with a V-shaped groove etched in its free surface are collected by a soft recovery technique. The produced fragments are then automatically detected with fine post-shot analysis techniques, including X-ray micro-tomography and an improved watershed method. To characterize the size distributions of the fragments, a theoretical random geometric statistics model based on Poisson mixtures is derived for the dynamic heterogeneous fragmentation problem, which yields a linear combination of exponential distributions. The experimental fragment size distributions of the laser shock-loaded tin sample are examined with the proposed theoretical model, and its fitting performance is compared with that of other state-of-the-art fragment size distribution models. The comparison results show that the proposed model provides a far more reasonable fit for the laser shock-loaded tin.

  13. Long-term evolution of a planetesimal swarm in the vicinity of a protoplanet

    NASA Technical Reports Server (NTRS)

    Kary, David M.; Lissauer, Jack J.

    1991-01-01

    Many models of planet formation involve scenarios in which one or a few large protoplanets interact with a swarm of much smaller planetesimals. In such scenarios, three-body perturbations by the protoplanet as well as mutual collisions and gravitational interactions between the swarm bodies are important in determining the velocity distribution of the swarm. We are developing a model to examine the effects of these processes on the evolution of a planetesimal swarm. The model consists of a combination of numerical integrations of the gravitational influence of one (or a few) massive protoplanets on swarm bodies together with a statistical treatment of the interactions between the planetesimals. Integrating the planetesimal orbits allows us to take into account effects that are difficult to model analytically or statistically, such as three-body collision cross-sections and resonant perturbations by the protoplanet, while using a statistical treatment for the particle-particle interactions allows us to use a large enough sample to obtain meaningful results.

  14. Statistical models of power-combining circuits for O-type traveling-wave tube amplifiers

    NASA Astrophysics Data System (ADS)

    Kats, A. M.; Klinaev, Iu. V.; Gleizer, V. V.

    1982-11-01

    The design outlined here allows for imbalances in the power of the devices being combined and for differences in phase. It is shown that the coefficient of combination is described by a beta distribution of the first type when a small number of devices are being combined and that the coefficient is asymptotically normal in relation to both the number of devices and the phase variance of the tube's output signals. Relations are derived that make it possible to calculate the efficiency of a power-combining circuit and the reproducibility of the design parameters when standard devices are used.

  15. A Survey of Insider Attack Detection Research

    DTIC Science & Technology

    2008-08-25

    modeling of statistical features, such as the frequency of events, the duration of events, the co-occurrence of multiple events combined through...forms of attack that have been reported. For example: • unauthorized extraction, duplication, or exfiltration...network level. Schultz pointed out that no one approach will work, but solutions need to be based on multiple sensors to be able to find any combination

  16. User's Manual for Downscaler Fusion Software

    EPA Science Inventory

    Recently, a series of 3 papers has been published in the statistical literature that details the use of downscaling to obtain more accurate and precise predictions of air pollution across the conterminous U.S. This downscaling approach combines CMAQ gridded numerical model output...

  17. LANDSCAPE ASSESSMENT TOOLS FOR WATERSHED CHARACTERIZATION

    EPA Science Inventory

    A combination of process-based, empirical and statistical models has been developed to assist states in their efforts to assess water quality, locate impairments over large areas, and calculate TMDL allocations. By synthesizing outputs from a number of these tools, LIPS demonstr...

  18. Improvement and extension of a radar forest backscattering model

    NASA Technical Reports Server (NTRS)

    Simonett, David S.; Wang, Yong

    1989-01-01

    Radar modeling of mangal forest stands in the Sundarbans area of southern Bangladesh was developed. The modeling employs radar system parameters such as wavelength, polarization, and incidence angle, together with forest data on tree height, spacing, biomass, species combinations, and water content (including slightly conductive water) in both the leaves and trunks of the mangal. For the Sundri and Gewa tropical mangal forests, five model components are proposed, which are required to explain the contributions of various forest species combinations to the attenuation and scattering of mangal-vegetated nonflooded or flooded surfaces. Statistics of simulated images (HH components only) were compared with those of SIR-B images, both to refine the modeling procedures and to appropriately characterize the model output. The possibility of delineating flooded/nonflooded boundaries is discussed.

  19. In vivo serial MRI-based models and statistical methods to quantify sensitivity and specificity of mechanical predictors for carotid plaque rupture: location and beyond.

    PubMed

    Wu, Zheyang; Yang, Chun; Tang, Dalin

    2011-06-01

    It has been hypothesized that mechanical risk factors may be used to predict future atherosclerotic plaque rupture. Truly predictive methods for plaque rupture and methods to identify the best predictor(s) from all the candidates are lacking in the literature. A novel combination of computational and statistical models based on serial magnetic resonance imaging (MRI) was introduced to quantify sensitivity and specificity of mechanical predictors to identify the best candidate for plaque rupture site prediction. Serial in vivo MRI data of carotid plaque from one patient was acquired with follow-up scan showing ulceration. 3D computational fluid-structure interaction (FSI) models using both baseline and follow-up data were constructed and plaque wall stress (PWS) and strain (PWSn) and flow maximum shear stress (FSS) were extracted from all 600 matched nodal points (100 points per matched slice, baseline matching follow-up) on the lumen surface for analysis. Each of the 600 points was marked "ulcer" or "nonulcer" using follow-up scan. Predictive statistical models for each of the seven combinations of PWS, PWSn, and FSS were trained using the follow-up data and applied to the baseline data to assess their sensitivity and specificity using the 600 data points for ulcer predictions. Sensitivity of prediction is defined as the proportion of the true positive outcomes that are predicted to be positive. Specificity of prediction is defined as the proportion of the true negative outcomes that are correctly predicted to be negative. Using probability 0.3 as a threshold to infer ulcer occurrence at the prediction stage, the combination of PWS and PWSn provided the best predictive accuracy with (sensitivity, specificity) = (0.97, 0.958). Sensitivity and specificity given by PWS, PWSn, and FSS individually were (0.788, 0.968), (0.515, 0.968), and (0.758, 0.928), respectively. The proposed computational-statistical process provides a novel method and a framework to assess the sensitivity and specificity of various risk indicators and offers the potential to identify the optimized predictor for plaque rupture using serial MRI with follow-up scan showing ulceration as the gold standard for method validation. While serial MRI data with actual rupture are hard to acquire, this single-case study suggests that combination of multiple predictors may provide potential improvement to existing plaque assessment schemes. With large-scale patient studies, this predictive modeling process may provide more solid ground for rupture predictor selection strategies and methods for image-based plaque vulnerability assessment.

  20. Future mission studies: Preliminary comparisons of solar flux models

    NASA Technical Reports Server (NTRS)

    Ashrafi, S.

    1991-01-01

    The results of comparisons of the solar flux models are presented. (The wavelength lambda = 10.7 cm radio flux is the best indicator of the strength of ionizing radiations, such as solar ultraviolet and x-ray emissions, that directly affect atmospheric density and thereby change the orbit lifetime of satellites. Accurate forecasting of the solar flux F10.7 is therefore crucial for orbit determination of spacecraft.) The measured solar flux recorded by the National Oceanic and Atmospheric Administration (NOAA) is compared against the forecasts made by Schatten, MSFC, and NOAA itself. The possibility of a combined linear, unbiased minimum-variance estimation that properly combines all three models into one that minimizes the variance is also discussed. All the physics inherent in each model is thereby combined. This is considered to be the dead end of the statistical approach to solar flux forecasting, before any nonlinear chaotic approach.
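
    A minimal sketch of a linear, unbiased minimum-variance combination of forecasts: with independent errors the optimal weights are proportional to inverse error variance. The forecast values and error variances below are hypothetical, not the Schatten, MSFC, or NOAA numbers.

        import numpy as np

        def combine_forecasts(forecasts, error_variances):
            f = np.asarray(forecasts, dtype=float)
            v = np.asarray(error_variances, dtype=float)
            w = (1.0 / v) / (1.0 / v).sum()          # inverse-variance weights
            return float(w @ f), 1.0 / (1.0 / v).sum()   # combined forecast and its variance

        # Hypothetical F10.7 forecasts (solar flux units) and error variances for three models.
        print(combine_forecasts([175.0, 182.0, 168.0], [100.0, 225.0, 144.0]))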

  1. Mixture EMOS model for calibrating ensemble forecasts of wind speed.

    PubMed

    Baran, S; Lerch, S

    2016-03-01

    Ensemble model output statistics (EMOS) is a statistical tool for post-processing forecast ensembles of weather variables obtained from multiple runs of numerical weather prediction models in order to produce calibrated predictive probability density functions. The EMOS predictive probability density function is given by a parametric distribution with parameters depending on the ensemble forecasts. We propose an EMOS model for calibrating wind speed forecasts based on weighted mixtures of truncated normal (TN) and log-normal (LN) distributions where model parameters and component weights are estimated by optimizing the values of proper scoring rules over a rolling training period. The new model is tested on wind speed forecasts of the 50-member European Centre for Medium-range Weather Forecasts ensemble, the 11-member Aire Limitée Adaptation dynamique Développement International-Hungary Ensemble Prediction System ensemble of the Hungarian Meteorological Service, and the eight-member University of Washington mesoscale ensemble, and its predictive performance is compared with that of various benchmark EMOS models based on single parametric families and combinations thereof. The results indicate improved calibration of probabilistic forecasts and accuracy of point forecasts in comparison with the raw ensemble and climatological forecasts. The mixture EMOS model significantly outperforms the TN and LN EMOS methods; moreover, it provides better calibrated forecasts than the TN-LN combination model and offers an increased flexibility while avoiding covariate selection problems. © 2016 The Authors Environmetrics Published by John Wiley & Sons Ltd.
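
    A sketch of the kind of weighted TN/LN mixture density used as the predictive distribution; the weight and parameters below are fixed by hand for illustration, whereas in EMOS they are linked to the ensemble forecasts and estimated by optimizing proper scoring rules.

        import numpy as np
        from scipy import stats

        def mixture_pdf(x, w, mu_tn, sigma_tn, mu_ln, sigma_ln):
            a = (0.0 - mu_tn) / sigma_tn                          # truncate the normal at zero
            tn = stats.truncnorm.pdf(x, a, np.inf, loc=mu_tn, scale=sigma_tn)
            ln = stats.lognorm.pdf(x, s=sigma_ln, scale=np.exp(mu_ln))
            return w * tn + (1.0 - w) * ln

        x = np.linspace(0.01, 30.0, 1000)                          # wind speed grid (m/s)
        pdf = mixture_pdf(x, w=0.6, mu_tn=6.0, sigma_tn=2.5, mu_ln=1.7, sigma_ln=0.4)
        print(round(float(np.sum(pdf) * (x[1] - x[0])), 3))        # should be close to 1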

  2. The integration of probabilistic information during sensorimotor estimation is unimpaired in children with Cerebral Palsy

    PubMed Central

    Sokhey, Taegh; Gaebler-Spira, Deborah; Kording, Konrad P.

    2017-01-01

    Background It is important to understand the motor deficits of children with Cerebral Palsy (CP). Our understanding of this motor disorder can be enriched by computational models of motor control. One crucial stage in generating movement involves combining uncertain information from different sources, and deficits in this process could contribute to reduced motor function in children with CP. Healthy adults can integrate previously-learned information (prior) with incoming sensory information (likelihood) in a close-to-optimal way when estimating object location, consistent with the use of Bayesian statistics. However, there are few studies investigating how children with CP perform sensorimotor integration. We compare sensorimotor estimation in children with CP and age-matched controls using a model-based analysis to understand the process. Methods and findings We examined Bayesian sensorimotor integration in children with CP, aged between 5 and 12 years old, with Gross Motor Function Classification System (GMFCS) levels 1–3 and compared their estimation behavior with age-matched typically-developing (TD) children. We used a simple sensorimotor estimation task which requires participants to combine probabilistic information from different sources: a likelihood distribution (current sensory information) with a prior distribution (learned target information). In order to examine sensorimotor integration, we quantified how participants weighted statistical information from the two sources (prior and likelihood) and compared this to the statistically optimal weighting. We found that the weighting of statistical information in children with CP was as statistically efficient as that of TD children. Conclusions We conclude that Bayesian sensorimotor integration is not impaired in children with CP and, therefore, does not contribute to their motor deficits. Future research has the potential to enrich our understanding of motor disorders by investigating the stages of motor processing set out by computational models. Therapeutic interventions should exploit the ability of children with CP to use statistical information. PMID:29186196
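
    A minimal sketch of the statistically optimal (precision-weighted) combination that such a task assesses; the prior and likelihood parameters below are hypothetical.

        # Optimal estimate weights prior and likelihood by their inverse variances (precisions).
        def optimal_estimate(mu_prior, var_prior, mu_like, var_like):
            w_like = (1.0 / var_like) / (1.0 / var_like + 1.0 / var_prior)
            return w_like * mu_like + (1.0 - w_like) * mu_prior, w_like

        # Hypothetical numbers: learned target at 0.0 (variance 1.0), current sensory cue at 2.0 (variance 0.5).
        est, w = optimal_estimate(0.0, 1.0, 2.0, 0.5)
        print(est, w)   # estimate ~1.33, likelihood weight ~0.67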

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nose, Y.

    Methods were developed for generating an integrated, statistical model of the anatomical structures within the human thorax relevant to radioisotope powered artificial heart implantation. These methods involve measurement and analysis of anatomy in four areas: chest wall, pericardium, vascular connections, and great vessels. A model for the prediction of thorax outline from radiograms was finalized. These models were combined with 100 radiograms to arrive at a size distribution representing the adult male and female populations. (CH)

  4. Evaluating Site-Specific and Generic Spatial Models of Aboveground Forest Biomass Based on Landsat Time-Series and LiDAR Strip Samples in the Eastern USA

    Treesearch

    Ram Deo; Matthew Russell; Grant Domke; Hans-Erik Andersen; Warren Cohen; Christopher Woodall

    2017-01-01

    Large-area assessment of aboveground tree biomass (AGB) to inform regional or national forest monitoring programs can be efficiently carried out by combining remotely sensed data and field sample measurements through a generic statistical model, in contrast to site-specific models. We integrated forest inventory plot data with spatial predictors from Landsat time-...

  5. Acetaminophen and non-steroidal anti-inflammatory drugs interact with morphine and tramadol analgesia for the treatment of neuropathic pain in rats.

    PubMed

    Shinozaki, Tomonari; Yamada, Toshihiko; Nonaka, Takahiro; Yamamoto, Tatsuo

    2015-06-01

    Although non-steroidal anti-inflammatory drugs and acetaminophen have no proven efficacy against neuropathic pain, they are frequently prescribed for neuropathic pain patients. We examined whether the combination of opioids (tramadol and morphine) with indomethacin or acetaminophen produce favorable effects on neuropathic pain and compared the efficacy for neuropathic pain with that for inflammatory pain. The carrageenan model was used as the inflammatory pain model while the tibial neuroma transposition (TNT) model was used as the neuropathic pain model. The tibial nerve is transected in the TNT model, with the tibial nerve stump then transpositioned to the lateral aspect of the hindlimb. Neuropathic pain (mechanical allodynia and neuroma pain) is observed after TNT injury. Drugs were administered orally. In the carrageenan model, all drugs produced anti-allodynic effects and all drug combinations, but not tramadol + indomethacin combination, produced synergistic anti-allodynic effects. In the TNT model, tramadol and morphine, but not acetaminophen and indomethacin, produced anti-neuropathic pain effects. In the combination, with the exception of morphine + acetaminophen combination, both acetaminophen and indomethacin reduced the 50% effective dose (ED50) of tramadol and morphine as compared with the ED50s for the single drug study in the TNT model. The ED50s of tramadol and morphine in the carrageenan combination test were not statistically significantly different from the ED50s in the TNT model combination study. The combination of opioids with indomethacin or acetaminophen produced a synergistic analgesic effect both in inflammatory and neuropathic pain with some exceptions. The efficacy of these combinations for neuropathic pain was not different from that for inflammatory pain.

  6. Technical note: Combining quantile forecasts and predictive distributions of streamflows

    NASA Astrophysics Data System (ADS)

    Bogner, Konrad; Liechti, Katharina; Zappa, Massimiliano

    2017-11-01

    The enhanced availability of many different hydro-meteorological modelling and forecasting systems raises the issue of how to optimally combine this wealth of information. In particular, the use of deterministic and probabilistic forecasts with sometimes widely divergent predictions of future streamflow makes it even more complicated for decision makers to sift out the relevant information. In this study, multiple sources of streamflow forecast information will be aggregated based on several different predictive distributions and quantile forecasts. For this combination, the Bayesian model averaging (BMA) approach, the non-homogeneous Gaussian regression (NGR), also known as the ensemble model output statistics (EMOS) technique, and a novel method called Beta-transformed linear pooling (BLP) will be applied. With the help of the quantile score (QS) and the continuous ranked probability score (CRPS), the combination results for the Sihl River in Switzerland, with about 5 years of forecast data, will be compared and the differences between the raw and optimally combined forecasts will be highlighted. The results demonstrate the importance of applying proper forecast combination methods for decision makers in the field of flood and water resource management.

  7. Efficient Geological Modelling of Large AEM Surveys

    NASA Astrophysics Data System (ADS)

    Bach, Torben; Martlev Pallesen, Tom; Jørgensen, Flemming; Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas

    2014-05-01

    Combining geological expert knowledge with geophysical observations into a final 3D geological model is, in most cases, not a straightforward process. It typically involves many types of data and requires both an understanding of the data and of the geological target. When dealing with very large areas, such as modelling of large AEM surveys, the manual task for the geologist of correctly evaluating and properly utilising all the data available in the survey area becomes overwhelming. In the ERGO project (Efficient High-Resolution Geological Modelling) we address these issues and propose a new modelling methodology enabling fast and consistent modelling of very large areas. The vision of the project is to build a user-friendly expert system that enables the combination of very large amounts of geological and geophysical data with geological expert knowledge. This is done in an "auto-pilot" type functionality, named Smart Interpretation, designed to aid the geologist in the interpretation process. The core of the expert system is a statistical model that describes the relation between data and geological interpretations made by a geological expert. This facilitates fast and consistent modelling of very large areas. It will enable the construction of models with high resolution, as the system will "learn" the geology of an area directly from interpretations made by a geological expert and instantly apply it to all hard data in the survey area, ensuring the utilisation of all the data available in the geological model. Another feature is that the statistical model the system creates for one area can be used in another area with similar data and geology. This feature can be useful as an aid to an untrained geologist building a geological model, guided by the experienced geologist's way of interpretation as quantified by the expert system in the core statistical model. In this project presentation we provide some examples of the problems we are aiming to address in the project, and show some preliminary results.

  8. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, Antoine; Edwards, T.C.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. © 2002 Elsevier Science B.V. All rights reserved.

  9. Impact of Early and Late Visual Deprivation on the Structure of the Corpus Callosum: A Study Combining Thickness Profile with Surface Tensor-Based Morphometry.

    PubMed

    Shi, Jie; Collignon, Olivier; Xu, Liang; Wang, Gang; Kang, Yue; Leporé, Franco; Lao, Yi; Joshi, Anand A; Leporé, Natasha; Wang, Yalin

    2015-07-01

    Blindness represents a unique model to study how visual experience may shape the development of brain organization. Exploring how the structure of the corpus callosum (CC) reorganizes following visual deprivation is of particular interest due to its important functional implication in vision (e.g., via the splenium of the CC). Moreover, comparing early versus late visually deprived individuals has the potential to unravel the existence of a sensitive period for reshaping the CC structure. Here, we develop a novel framework to capture a complete set of shape differences in the CC between congenitally blind (CB), late blind (LB) and sighted control (SC) groups. The CCs were manually segmented from T1-weighted brain MRI and modeled by 3D tetrahedral meshes. We statistically compared the combination of local area and thickness at each point between subject groups. Differences in area are found using surface tensor-based morphometry; thickness is estimated by tracing the streamlines in the volumetric harmonic field. Group differences were assessed on this combined measure using Hotelling's T² test. Interestingly, we observed that the total callosal volume did not differ between the groups. However, our fine-grained analysis reveals significant differences mostly localized around the splenium areas between both blind groups and the sighted group (general effects of blindness) and, importantly, specific dissimilarities between the LB and CB groups, illustrating the existence of a sensitive period for reorganization. The new multivariate statistics also gave better effect sizes for detecting morphometric differences, relative to other statistics. They may boost statistical power for CC morphometric analyses.

  11. A canonical neural mechanism for behavioral variability

    NASA Astrophysics Data System (ADS)

    Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David

    2017-05-01

    The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5-6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these `universal' statistics.

  12. VizieR Online Data Catalog: HARPS timeseries data for HD41248 (Jenkins+, 2014)

    NASA Astrophysics Data System (ADS)

    Jenkins, J. S.; Tuomi, M.

    2017-05-01

    We modeled the HARPS radial velocities of HD 41248 by adopting the analysis techniques and the statistical model applied in Tuomi et al. (2014, arXiv:1405.2016). This model contains Keplerian signals, a linear trend, a moving average component with exponential smoothing, and linear correlations with activity indices, namely, BIS, FWHM, and chromospheric activity S index. We applied our statistical model outlined above to the full data set of radial velocities for HD 41248, combining the previously published data in Jenkins et al. (2013ApJ...771...41J) with the newly published data in Santos et al. (2014, J/A+A/566/A35), giving rise to a total time series of 223 HARPS (Mayor et al. 2003Msngr.114...20M) velocities. (1 data file).

  13. Statistical Analysis of Individual Participant Data Meta-Analyses: A Comparison of Methods and Recommendations for Practice

    PubMed Central

    Stewart, Gavin B.; Altman, Douglas G.; Askie, Lisa M.; Duley, Lelia; Simmonds, Mark C.; Stewart, Lesley A.

    2012-01-01

    Background Individual participant data (IPD) meta-analyses that obtain “raw” data from studies rather than summary data typically adopt a “two-stage” approach to analysis whereby IPD within trials generate summary measures, which are combined using standard meta-analytical methods. Recently, a range of “one-stage” approaches which combine all individual participant data in a single meta-analysis have been suggested as providing a more powerful and flexible approach. However, they are more complex to implement and require statistical support. This study uses a dataset to compare “two-stage” and “one-stage” models of varying complexity, to ascertain whether results obtained from the approaches differ in a clinically meaningful way. Methods and Findings We included data from 24 randomised controlled trials, evaluating antiplatelet agents, for the prevention of pre-eclampsia in pregnancy. We performed two-stage and one-stage IPD meta-analyses to estimate overall treatment effect and to explore potential treatment interactions whereby particular types of women and their babies might benefit differentially from receiving antiplatelets. Two-stage and one-stage approaches gave similar results, showing a benefit of using anti-platelets (Relative risk 0.90, 95% CI 0.84 to 0.97). Neither approach suggested that any particular type of women benefited more or less from antiplatelets. There were no material differences in results between different types of one-stage model. Conclusions For these data, two-stage and one-stage approaches to analysis produce similar results. Although one-stage models offer a flexible environment for exploring model structure and are useful where across study patterns relating to types of participant, intervention and outcome mask similar relationships within trials, the additional insights provided by their usage may not outweigh the costs of statistical support for routine application in syntheses of randomised controlled trials. Researchers considering undertaking an IPD meta-analysis should not necessarily be deterred by a perceived need for sophisticated statistical methods when combining information from large randomised trials. PMID:23056232
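
    For intuition about the second stage of the "two-stage" approach, the sketch below pools per-trial log relative risks by inverse-variance weighting; the numbers are hypothetical and not the antiplatelet trial data.

        import numpy as np

        def pool_fixed_effect(log_rr, se):
            w = 1.0 / np.asarray(se, dtype=float) ** 2            # inverse-variance weights
            est = float(np.sum(w * np.asarray(log_rr)) / w.sum())
            se_pooled = float(np.sqrt(1.0 / w.sum()))
            ci = np.exp([est - 1.96 * se_pooled, est + 1.96 * se_pooled])
            return np.exp(est), ci

        # Hypothetical per-trial log relative risks and standard errors.
        rr, ci = pool_fixed_effect([-0.15, -0.05, -0.20, 0.02], [0.10, 0.08, 0.12, 0.15])
        print(round(rr, 2), np.round(ci, 2))                      # pooled relative risk and 95% CI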

  14. FORSPAN Model Users Guide

    USGS Publications Warehouse

    Klett, T.R.; Charpentier, Ronald R.

    2003-01-01

    The USGS FORSPAN model is designed for the assessment of continuous accumulations of crude oil, natural gas, and natural gas liquids (collectively called petroleum). Continuous (also called "unconventional") accumulations have large spatial dimensions and lack well defined down-dip petroleum/water contacts. Oil and natural gas therefore are not localized by buoyancy in water in these accumulations. Continuous accumulations include "tight gas reservoirs," coalbed gas, oil and gas in shale, oil and gas in chalk, and shallow biogenic gas. The FORSPAN model treats a continuous accumulation as a collection of petroleum-containing cells for assessment purposes. Each cell is capable of producing oil or gas, but the cells may vary significantly from one another in their production (and thus economic) characteristics. The potential additions to reserves from continuous petroleum resources are calculated by statistically combining probability distributions of the estimated number of untested cells having the potential for additions to reserves with the estimated volume of oil and natural gas that each of the untested cells may potentially produce (total recovery). One such statistical method for combination of number of cells with total recovery, used by the USGS, is called ACCESS.
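
    A rough sketch of statistically combining a distribution of untested-cell counts with a per-cell recovery distribution by Monte Carlo simulation; the distributions and units below are hypothetical, and this is not the USGS ACCESS procedure itself.

        import numpy as np

        rng = np.random.default_rng(42)
        trials = 100_000
        n_cells = rng.integers(500, 2001, size=trials)                     # untested cells with potential
        mean_recovery = rng.lognormal(mean=-1.0, sigma=0.6, size=trials)   # mean gas recovery per cell (BCF)
        total = n_cells * mean_recovery                                    # potential additions per trial

        print(np.round(np.percentile(total, [5, 50, 95]), 1))              # low / median / high fractiles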

  15. A comparative analysis of chaotic particle swarm optimizations for detecting single nucleotide polymorphism barcodes.

    PubMed

    Chuang, Li-Yeh; Moi, Sin-Hua; Lin, Yu-Da; Yang, Cheng-Hong

    2016-10-01

    Evolutionary algorithms could overcome the computational limitations for the statistical evaluation of large datasets for high-order single nucleotide polymorphism (SNP) barcodes. Previous studies have proposed several chaotic particle swarm optimization (CPSO) methods to detect SNP barcodes for disease analysis (e.g., for breast cancer and chronic diseases). This work evaluated additional chaotic maps combined with the particle swarm optimization (PSO) method to detect SNP barcodes using a high-dimensional dataset. Nine chaotic maps were used to improve the PSO method's results, and the searching ability of all CPSO methods was compared. The XOR and ZZ disease models were used to compare all chaotic maps combined with the PSO method. Efficacy evaluations of the CPSO methods were based on statistical values from the chi-square test (χ2). The results showed that chaotic maps could improve the searching ability of the PSO method when the population is trapped in a local optimum. The minor allele frequency (MAF) indicated that, amongst all CPSO methods, the numbers of SNPs, sample size, and the highest χ2 value in all datasets were found with the Sinai chaotic map combined with the PSO method. We used the simple linear regression results of the gbest values across all generations to compare all the methods. The Sinai chaotic map combined with the PSO method provided the highest β values (β≥0.32 in the XOR disease model and β≥0.04 in the ZZ disease model) and significant p-values (p-value<0.001 in both the XOR and ZZ disease models). The Sinai chaotic map was found to effectively enhance the fitness values (χ2) of the PSO method, indicating that the Sinai chaotic map combined with the PSO method is more effective at detecting potential SNP barcodes in both the XOR and ZZ disease models. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. A methodology for least-squares local quasi-geoid modelling using a noisy satellite-only gravity field model

    NASA Astrophysics Data System (ADS)

    Klees, R.; Slobbe, D. C.; Farahani, H. H.

    2018-04-01

    The paper is about a methodology to combine a noisy satellite-only global gravity field model (GGM) with other noisy datasets to estimate a local quasi-geoid model using weighted least-squares techniques. In this way, we attempt to improve the quality of the estimated quasi-geoid model and to complement it with a full noise covariance matrix for quality control and further data processing. The methodology goes beyond the classical remove-compute-restore approach, which does not account for the noise in the satellite-only GGM. We suggest and analyse three different approaches of data combination. Two of them are based on a local single-scale spherical radial basis function (SRBF) model of the disturbing potential, and one is based on a two-scale SRBF model. Using numerical experiments, we show that a single-scale SRBF model does not fully exploit the information in the satellite-only GGM. We explain this by a lack of flexibility of a single-scale SRBF model to deal with datasets of significantly different bandwidths. The two-scale SRBF model performs well in this respect, provided that the model coefficients representing the two scales are estimated separately. The corresponding methodology is developed in this paper. Using the statistics of the least-squares residuals and the statistics of the errors in the estimated two-scale quasi-geoid model, we demonstrate that the developed methodology provides a two-scale quasi-geoid model, which exploits the information in all datasets.

  17. Detecting changes in dynamic and complex acoustic environments

    PubMed Central

    Boubenec, Yves; Lawlor, Jennifer; Górska, Urszula; Shamma, Shihab; Englitz, Bernhard

    2017-01-01

    Natural sounds such as wind or rain, are characterized by the statistical occurrence of their constituents. Despite their complexity, listeners readily detect changes in these contexts. We here address the neural basis of statistical decision-making using a combination of psychophysics, EEG and modelling. In a texture-based, change-detection paradigm, human performance and reaction times improved with longer pre-change exposure, consistent with improved estimation of baseline statistics. Change-locked and decision-related EEG responses were found in a centro-parietal scalp location, whose slope depended on change size, consistent with sensory evidence accumulation. The potential's amplitude scaled with the duration of pre-change exposure, suggesting a time-dependent decision threshold. Auditory cortex-related potentials showed no response to the change. A dual timescale, statistical estimation model accounted for subjects' performance. Furthermore, a decision-augmented auditory cortex model accounted for performance and reaction times, suggesting that the primary cortical representation requires little post-processing to enable change-detection in complex acoustic environments. DOI: http://dx.doi.org/10.7554/eLife.24910.001 PMID:28262095

  18. Evaluating pictogram prediction in a location-aware augmentative and alternative communication system.

    PubMed

    Garcia, Luís Filipe; de Oliveira, Luís Caldas; de Matos, David Martins

    2016-01-01

    This study compared the performance of two statistical location-aware pictogram prediction mechanisms, with an all-purpose (All) pictogram prediction mechanism, having no location knowledge. The All approach had a unique language model under all locations. One of the location-aware alternatives, the location-specific (Spec) approach, made use of specific language models for pictogram prediction in each location of interest. The other location-aware approach resulted from combining the Spec and the All approaches, and was designated the mixed approach (Mix). In this approach, the language models acquired knowledge from all locations, but a higher relevance was assigned to the vocabulary from the associated location. Results from simulations showed that the Mix and Spec approaches could only outperform the baseline in a statistically significant way if pictogram users reuse more than 50% and 75% of their sentences, respectively. Under low sentence reuse conditions there were no statistically significant differences between the location-aware approaches and the All approach. Under these conditions, the Mix approach performed better than the Spec approach in a statistically significant way.

  19. The potential of composite cognitive scores for tracking progression in Huntington's disease.

    PubMed

    Jones, Rebecca; Stout, Julie C; Labuschagne, Izelle; Say, Miranda; Justo, Damian; Coleman, Allison; Dumas, Eve M; Hart, Ellen; Owen, Gail; Durr, Alexandra; Leavitt, Blair R; Roos, Raymund; O'Regan, Alison; Langbehn, Doug; Tabrizi, Sarah J; Frost, Chris

    2014-01-01

    Composite scores derived from joint statistical modelling of individual risk factors are widely used to identify individuals who are at increased risk of developing disease or of faster disease progression. We investigated the ability of composite measures developed using statistical models to differentiate progressive cognitive deterioration in Huntington's disease (HD) from natural decline in healthy controls. Using longitudinal data from TRACK-HD, the optimal combinations of quantitative cognitive measures to differentiate premanifest and early stage HD individuals respectively from controls were determined using logistic regression. Composite scores were calculated from the parameters of each statistical model. Linear regression models were used to calculate effect sizes (ES) quantifying the difference in longitudinal change over 24 months between the premanifest and early stage HD groups respectively and controls. ES for the composites were compared with ES for individual cognitive outcomes and other measures used in HD research. The 0.632 bootstrap was used to eliminate biases which result from developing and testing models in the same sample. In early HD, the composite score from the HD change prediction model produced an ES for difference in rate of 24-month change relative to controls of 1.14 (95% CI: 0.90 to 1.39), larger than the ES for any individual cognitive outcome and for the UHDRS Total Motor Score and Total Functional Capacity. In addition, this composite gave a statistically significant difference in rate of change in premanifest HD compared to controls over 24 months (ES: 0.24; 95% CI: 0.04 to 0.44), even though none of the individual cognitive outcomes produced statistically significant ES over this period. Composite scores developed using appropriate statistical modelling techniques have the potential to materially reduce required sample sizes for randomised controlled trials.

  20. Detecting anomalies in CMB maps: a new method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Neelakanta, Jayanth T., E-mail: jayanthtn@gmail.com

    2015-10-01

    Ever since WMAP announced its first results, different analyses have shown that there is weak evidence for several large-scale anomalies in the CMB data. While the evidence for each anomaly appears to be weak, the fact that there are multiple seemingly unrelated anomalies makes it difficult to account for them via a single statistical fluke. So, one is led to considering a combination of these anomalies. But, if we "hand-pick" the anomalies (test statistics) to consider, we are making an a posteriori choice. In this article, we propose two statistics that do not suffer from this problem. The statistics are linear and quadratic combinations of the a_ℓm's with random coefficients, and they test the null hypothesis that the a_ℓm's are independent, normally-distributed, zero-mean random variables with an m-independent variance. The motivation for considering multiple modes is this: because most physical models that lead to large-scale anomalies result in coupling multiple ℓ and m modes, the "coherence" of this coupling should get enhanced if a combination of different modes is considered. In this sense, the statistics are thus much more generic than those that have been hitherto considered in literature. Using fiducial data, we demonstrate that the method works and discuss how it can be used with actual CMB data to make quite general statements about the incompatibility of the data with the null hypothesis.

  1. Identifying pleiotropic genes in genome-wide association studies from related subjects using the linear mixed model and Fisher combination function.

    PubMed

    Yang, James J; Williams, L Keoki; Buu, Anne

    2017-08-24

    A multivariate genome-wide association test is proposed for analyzing data on multivariate quantitative phenotypes collected from related subjects. The proposed method is a two-step approach. The first step models the association between the genotype and marginal phenotype using a linear mixed model. The second step uses the correlation between residuals of the linear mixed model to estimate the null distribution of the Fisher combination test statistic. The simulation results show that the proposed method controls the type I error rate and is more powerful than the marginal tests across different population structures (admixed or non-admixed) and relatedness (related or independent). The statistical analysis on the database of the Study of Addiction: Genetics and Environment (SAGE) demonstrates that applying the multivariate association test may facilitate identification of the pleiotropic genes contributing to the risk for alcohol dependence commonly expressed by four correlated phenotypes. This study proposes a multivariate method for identifying pleiotropic genes while adjusting for cryptic relatedness and population structure between subjects. The two-step approach is not only powerful but also computationally efficient even when the number of subjects and the number of phenotypes are both very large.
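
    A minimal sketch of the Fisher combination statistic for k per-phenotype p-values; the paper's second step replaces the independence null used below with one estimated from the residual correlations of the linear mixed model, which is not reproduced here.

        import numpy as np
        from scipy import stats

        def fisher_combination(p_values):
            p = np.asarray(p_values, dtype=float)
            stat = -2.0 * np.log(p).sum()
            # Under independence the statistic is chi-squared with 2k degrees of freedom.
            return stat, stats.chi2.sf(stat, df=2 * len(p))

        stat, p_comb = fisher_combination([0.04, 0.20, 0.31, 0.008])  # hypothetical per-phenotype p-values
        print(round(stat, 2), round(p_comb, 4))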

  2. Improving phylogenetic analyses by incorporating additional information from genetic sequence databases.

    PubMed

    Liang, Li-Jung; Weiss, Robert E; Redelings, Benjamin; Suchard, Marc A

    2009-10-01

    Statistical analyses of phylogenetic data culminate in uncertain estimates of underlying model parameters. Lack of additional data hinders the ability to reduce this uncertainty, as the original phylogenetic dataset is often complete, containing the entire gene or genome information available for the given set of taxa. Informative priors in a Bayesian analysis can reduce posterior uncertainty; however, publicly available phylogenetic software specifies vague priors for model parameters by default. We build objective and informative priors using hierarchical random effect models that combine additional datasets whose parameters are not of direct interest but are similar to the analysis of interest. We propose principled statistical methods that permit more precise parameter estimates in phylogenetic analyses by creating informative priors for parameters of interest. Using additional sequence datasets from our lab or public databases, we construct a fully Bayesian semiparametric hierarchical model to combine datasets. A dynamic iteratively reweighted Markov chain Monte Carlo algorithm conveniently recycles posterior samples from the individual analyses. We demonstrate the value of our approach by examining the insertion-deletion (indel) process in the enolase gene across the Tree of Life using the phylogenetic software BALI-PHY; we incorporate prior information about indels from 82 curated alignments downloaded from the BAliBASE database.

  3. Therminator: Configuring the Underlying Statistical Mechanics Model

    DTIC Science & Technology

    2003-12-01

    1502, Oct. 7–10, 2002. [13] Ralph P. Grimaldi, Discrete and Combinatorial Mathematics, 4th Edition, Addison Wesley Longman, New York, 2000. [14] ...Eagle, Co-Advisor; John P. Powers, Chairman, Department of Electrical and Computer Engineering; Peter J. Denning, Chairman, Department of...

  4. Model Considerations for Memory-based Automatic Music Transcription

    NASA Astrophysics Data System (ADS)

    Albrecht, Štěpán; Šmídl, Václav

    2009-12-01

    The problem of automatic music description is considered. The recorded music is modeled as a superposition of known sounds from a library weighted by unknown weights. Similar observation models are commonly used in statistics and machine learning. Many methods for estimation of the weights are available. These methods differ in the assumptions imposed on the weights. In the Bayesian paradigm, these assumptions are typically expressed in the form of a prior probability density function (pdf) on the weights. In this paper, commonly used assumptions about the music signal are summarized and complemented by a new assumption. These assumptions are translated into pdfs and combined into a single prior density using a combination of pdfs. The validity of the model is tested in simulation using synthetic data.

  5. Landsat test of diffuse reflectance models for aquatic suspended solids measurement

    NASA Technical Reports Server (NTRS)

    Munday, J. C., Jr.; Alfoldi, T. T.

    1979-01-01

    Landsat radiance data were used to test mathematical models relating diffuse reflectance to aquatic suspended solids concentration. Digital CCT data for Landsat passes over the Bay of Fundy, Nova Scotia were analyzed on a General Electric Co. Image 100 multispectral analysis system. Three data sets were studied separately and together in all combinations with and without solar angle correction. Statistical analysis and chromaticity analysis show that a nonlinear relationship between Landsat radiance and suspended solids concentration is better at curve-fitting than a linear relationship. In particular, the quasi-single-scattering diffuse reflectance model developed by Gordon and coworkers is corroborated. The Gordon model applied to 33 points of MSS 5 data combined from three dates produced r = 0.98.

  6. Composite load spectra for select space propulsion structural components

    NASA Technical Reports Server (NTRS)

    Newell, J. F.; Kurth, R. E.; Ho, H.

    1991-01-01

    The objective of this program is to develop generic load models with multiple levels of progressive sophistication to simulate the composite (combined) load spectra that are induced in space propulsion system components, representative of Space Shuttle Main Engines (SSME), such as transfer ducts, turbine blades, and liquid oxygen posts and system ducting. The first approach will consist of using state of the art probabilistic methods to describe the individual loading conditions and combinations of these loading conditions to synthesize the composite load spectra simulation. The second approach will consist of developing coupled models for composite load spectra simulation which combine the deterministic models for composite load dynamic, acoustic, high pressure, and high rotational speed, etc., load simulation using statistically varying coefficients. These coefficients will then be determined using advanced probabilistic simulation methods with and without strategically selected experimental data.

  7. Accuracy Evaluation of the Unified P-Value from Combining Correlated P-Values

    PubMed Central

    Alves, Gelio; Yu, Yi-Kuo

    2014-01-01

    Meta-analysis methods that combine P-values into a single unified P-value are frequently employed to improve confidence in hypothesis testing. An assumption made by most meta-analysis methods is that the P-values to be combined are independent, which may not always be true. To investigate the accuracy of the unified P-value from combining correlated P-values, we have evaluated a family of statistical methods that combine: independent, weighted independent, correlated, and weighted correlated P-values. Statistical accuracy evaluation by combining simulated correlated P-values showed that correlation among P-values can have a significant effect on the accuracy of the combined P-value obtained. Among the statistical methods evaluated, those that weight P-values compute more accurate combined P-values than those that do not. Also, statistical methods that utilize the correlation information have the best performance, producing significantly more accurate combined P-values. In our study we have demonstrated that statistical methods that combine P-values based on the assumption of independence can produce inaccurate P-values when combining correlated P-values, even when the P-values are only weakly correlated. Therefore, to prevent drawing false conclusions during hypothesis testing, our study advises that caution be used when interpreting the P-value obtained from combining P-values of unknown correlation. However, when the correlation information is available, the weighting-capable statistical method, first introduced by Brown and recently modified by Hou, seems to perform the best amongst the methods investigated. PMID:24663491
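    A minimal sketch of a Brown-style correction is given below: Fisher's combination statistic is referred to a rescaled chi-square whose mean and variance account for the correlation between the P-values. The covariance polynomial used here is the commonly cited Kost-McDermott approximation rather than anything taken from this abstract, and the input P-values and correlation matrix are hypothetical.

```python
import numpy as np
from scipy import stats

def brown_combined_p(p_values, corr):
    """Combine correlated P-values with a scaled chi-square (Brown-style) approximation."""
    k = len(p_values)
    T = -2.0 * np.log(p_values).sum()            # Fisher combination statistic
    mean_T = 2.0 * k
    # Approximate cov(-2 ln p_i, -2 ln p_j) from the pairwise correlations
    cov_sum = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            r = corr[i, j]
            cov_sum += 3.263 * r + 0.710 * r**2 + 0.027 * r**3
    var_T = 4.0 * k + 2.0 * cov_sum
    f = 2.0 * mean_T**2 / var_T                  # effective degrees of freedom
    c = var_T / (2.0 * mean_T)                   # scale factor
    return stats.chi2.sf(T / c, df=f)

p = np.array([0.04, 0.03, 0.20])
R = np.array([[1.0, 0.6, 0.4], [0.6, 1.0, 0.5], [0.4, 0.5, 1.0]])
print(f"Brown-adjusted combined P-value: {brown_combined_p(p, R):.4f}")
```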

  8. Statistical core design methodology using the VIPRE thermal-hydraulics code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lloyd, M.W.; Feltus, M.A.

    1994-12-31

    This Penn State Statistical Core Design Methodology (PSSCDM) is unique because it not only includes the EPRI correlation/test data standard deviation but also the computational uncertainty for the VIPRE code model and the new composite box design correlation. The resultant PSSCDM equation mimics the EPRI DNBR correlation results well, with an uncertainty of 0.0389. The combined uncertainty yields a new DNBR limit of 1.18 that will provide more plant operational flexibility. This methodology and its associated correlation and unique coefficients are for a very particular VIPRE model; thus, the correlation will be specifically linked with the lumped channel and subchannel layout. The results of this research and methodology, however, can be applied to plant-specific VIPRE models.

  9. Extracting and Converting Quantitative Data into Human Error Probabilities

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tuan Q. Tran; Ronald L. Boring; Jeffrey C. Joe

    2007-08-01

    This paper discusses a proposed method using a combination of advanced statistical approaches (e.g., meta-analysis, regression, structural equation modeling) that will not only convert different empirical results into a common metric for scaling individual PSF effects, but will also examine the complex interrelationships among PSFs. Furthermore, the paper discusses how the derived statistical estimates (i.e., effect sizes) can be mapped onto an HRA method (e.g. SPAR-H) to generate HEPs that can then be used in probabilistic risk assessment (PRA). The paper concludes with a discussion of the benefits of using academic literature in assisting HRA analysts in generating sound HEPs and HRA developers in validating current HRA models and formulating new HRA models.

  10. Comparison of Histograms for Use in Cloud Observation and Modeling

    NASA Technical Reports Server (NTRS)

    Green, Lisa; Xu, Kuan-Man

    2005-01-01

    Cloud observation and cloud modeling data can be presented in histograms for each characteristic to be measured. Combining information from single-cloud histograms yields a summary histogram. Summary histograms can be compared to each other to reach conclusions about the behavior of an ensemble of clouds in different places at different times or about the accuracy of a particular cloud model. As in any scientific comparison, it is necessary to decide whether any apparent differences are statistically significant. The usual methods of deciding statistical significance when comparing histograms do not apply in this case because they assume independent data. Thus, a new method is necessary. The proposed method uses the Euclidean distance metric and bootstrapping to calculate the significance level.
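    A small sketch of the kind of procedure described is shown below, on synthetic data with assumed details: the Euclidean distance between two summary histograms is computed, and its significance is judged by resampling whole clouds rather than individual values, which is one way to respect the within-cloud dependence that rules out the usual independence-based tests.

```python
import numpy as np

rng = np.random.default_rng(3)
bins = np.linspace(0.0, 1.0, 11)

def summary_hist(clouds, bins):
    """Average the per-cloud normalised histograms into one summary histogram."""
    hists = [np.histogram(values, bins=bins, density=True)[0] for values in clouds]
    return np.mean(hists, axis=0)

# Hypothetical per-cloud samples of some normalised cloud property for two ensembles
ensemble_a = [rng.beta(2.0, 5.0, size=50) for _ in range(30)]
ensemble_b = [rng.beta(2.2, 5.0, size=50) for _ in range(30)]

d_obs = np.linalg.norm(summary_hist(ensemble_a, bins) - summary_hist(ensemble_b, bins))

# Resample whole clouds from the pooled set to build the no-difference distribution
pooled = ensemble_a + ensemble_b
d_null = []
for _ in range(1000):
    idx = rng.integers(0, len(pooled), size=len(pooled))
    half = len(pooled) // 2
    a = [pooled[i] for i in idx[:half]]
    b = [pooled[i] for i in idx[half:]]
    d_null.append(np.linalg.norm(summary_hist(a, bins) - summary_hist(b, bins)))

p_value = (np.array(d_null) >= d_obs).mean()
print(f"observed distance {d_obs:.3f}, resampling p-value {p_value:.3f}")
```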

  11. A Review of Statistical Failure Time Models with Application of a Discrete Hazard Based Model to 1Cr1Mo-0.25V Steel for Turbine Rotors and Shafts

    PubMed Central

    2017-01-01

    Producing predictions of the probabilistic risks of operating materials for given lengths of time at stated operating conditions requires the assimilation of existing deterministic creep life prediction models (that only predict the average failure time) with statistical models that capture the random component of creep. To date, these approaches have rarely been combined to achieve this objective. The first half of this paper therefore provides a summary review of some statistical models to help bridge the gap between these two approaches. The second half of the paper illustrates one possible assimilation using 1Cr1Mo-0.25V steel. The Wilshire equation for creep life prediction is integrated into a discrete hazard-based statistical model, the former being chosen because of its novelty and proven capability in accurately predicting average failure times and the latter being chosen because of its flexibility in modelling the failure time distribution. Using this model it was found that, for example, if this material had been in operation for around 15 years at 823 K and 130 MPa, the chance of failure in the next year is around 35%. However, if this material had been in operation for around 25 years, the chance of failure in the next year rises dramatically to around 80%. PMID:29039773

  12. A mathematical model for HIV and hepatitis C co-infection and its assessment from a statistical perspective.

    PubMed

    Castro Sanchez, Amparo Yovanna; Aerts, Marc; Shkedy, Ziv; Vickerman, Peter; Faggiano, Fabrizio; Salamina, Guiseppe; Hens, Niel

    2013-03-01

    The hepatitis C virus (HCV) and the human immunodeficiency virus (HIV) are a clear threat for public health, with high prevalences especially in high risk groups such as injecting drug users. People with HIV infection who are also infected by HCV suffer from a more rapid progression to HCV-related liver disease and have an increased risk for cirrhosis and liver cancer. Quantifying the impact of HIV and HCV co-infection is therefore of great importance. We propose a new joint mathematical model accounting for co-infection with the two viruses in the context of injecting drug users (IDUs). Statistical concepts and methods are used to assess the model from a statistical perspective, in order to get further insights in: (i) the comparison and selection of optional model components, (ii) the unknown values of the numerous model parameters, (iii) the parameters to which the model is most 'sensitive' and (iv) the combinations or patterns of values in the high-dimensional parameter space which are most supported by the data. Data from a longitudinal study of heroin users in Italy are used to illustrate the application of the proposed joint model and its statistical assessment. The parameters associated with contact rates (sharing syringes) and the transmission rates per syringe-sharing event are shown to play a major role. Copyright © 2013 Elsevier B.V. All rights reserved.

  13. An expert panel-based study on recognition of gastro-esophageal reflux in difficult esophageal pH-impedance tracings.

    PubMed

    Smits, M J; Loots, C M; van Wijk, M P; Bredenoord, A J; Benninga, M A; Smout, A J P M

    2015-05-01

    Despite existing criteria for scoring gastro-esophageal reflux (GER) in esophageal multichannel pH-impedance measurement (pH-I) tracings, inter- and intra-rater variability is large and agreement with automated analysis is poor. The aim was to identify parameters of difficult-to-analyze pH-I patterns and to combine these into a statistical model that can identify GER episodes, with an international consensus as the gold standard. Twenty-one experts from 10 countries were asked to mark GER presence for adult and pediatric pH-I patterns in an online pre-assessment. During a consensus meeting, experts voted on patterns not reaching majority consensus (>70% agreement). Agreement was calculated between raters, between consensus and individual raters, and between consensus and software-generated automated analysis. With eight selected parameters, multiple logistic regression analysis was performed to describe an algorithm sensitive and specific for detection of GER. Majority consensus was reached for 35/79 episodes in the online pre-assessment (interrater κ = 0.332). Mean agreement between pre-assessment scores and final consensus was moderate (κ = 0.466). Combining eight pH-I parameters did not result in a statistically significant model able to identify presence of GER. Recognizing a pattern as retrograde is the best indicator of GER, with 100% sensitivity and 81% specificity with expert consensus as gold standard. Agreement between experts scoring difficult impedance patterns for presence or absence of GER is poor. Combining several characteristics into a statistical model did not improve diagnostic accuracy. Only the parameter 'retrograde propagation pattern' is an indicator of GER in difficult pH-I patterns. © 2015 John Wiley & Sons Ltd.

  14. Cosmology Constraints from the Weak Lensing Peak Counts and the Power Spectrum in CFHTLenS

    DOE PAGES

    Liu, Jia; May, Morgan; Petri, Andrea; ...

    2015-03-04

    Lensing peaks have been proposed as a useful statistic, containing cosmological information from non-Gaussianities that is inaccessible from traditional two-point statistics such as the power spectrum or two-point correlation functions. Here we examine constraints on cosmological parameters from weak lensing peak counts, using the publicly available data from the 154 deg² CFHTLenS survey. We utilize a new suite of ray-tracing N-body simulations on a grid of 91 cosmological models, covering broad ranges of the three parameters Ω_m, σ_8, and w, and replicating the galaxy sky positions, redshifts, and shape noise in the CFHTLenS observations. We then build an emulator that interpolates the power spectrum and the peak counts to an accuracy of ≤ 5%, and compute the likelihood in the three-dimensional parameter space (Ω_m, σ_8, w) from both observables. We find that constraints from peak counts are comparable to those from the power spectrum, and somewhat tighter when different smoothing scales are combined. Neither observable can constrain w without external data. When the power spectrum and peak counts are combined, the area of the error "banana" in the (Ω_m, σ_8) plane reduces by a factor of ≈ two, compared to using the power spectrum alone. For a flat Λ cold dark matter model, combining both statistics, we obtain the constraint σ_8(Ω_m/0.27)^0.63 = 0.85 ± 0.03.

  16. A new in silico classification model for ready biodegradability, based on molecular fragments.

    PubMed

    Lombardo, Anna; Pizzo, Fabiola; Benfenati, Emilio; Manganaro, Alberto; Ferrari, Thomas; Gini, Giuseppina

    2014-08-01

    Regulations such as the European REACH (Registration, Evaluation, Authorization and restriction of Chemicals) often require chemicals to be evaluated for ready biodegradability, to assess the potential risk for environmental and human health. Because not all chemicals can be tested, there is an increasing demand for tools for quick and inexpensive biodegradability screening, such as computer-based (in silico) theoretical models. We developed an in silico model starting from a dataset of 728 chemicals with ready biodegradability data (MITI test, Ministry of International Trade and Industry). We used the novel software SARpy to automatically extract, through a structural fragmentation process, a set of substructures statistically related to ready biodegradability. Then, we analysed these substructures in order to build some general rules. The model consists of a rule-set made up of the combination of the statistically relevant fragments and of the expert-based rules. The model gives good statistical performance with 92%, 82% and 76% accuracy on the training, test and external set respectively. These results are comparable with other in silico models like BIOWIN developed by the United States Environmental Protection Agency (EPA); moreover, this new model includes an easily understandable explanation. Copyright © 2014 Elsevier Ltd. All rights reserved.

  17. Medial-based deformable models in nonconvex shape-spaces for medical image segmentation.

    PubMed

    McIntosh, Chris; Hamarneh, Ghassan

    2012-01-01

    We explore the application of genetic algorithms (GA) to deformable models through the proposition of a novel method for medical image segmentation that combines GA with nonconvex, localized, medial-based shape statistics. We replace the more typical gradient descent optimizer used in deformable models with GA, and the convex, implicit, global shape statistics with nonconvex, explicit, localized ones. Specifically, we propose GA to reduce typical deformable model weaknesses pertaining to model initialization, pose estimation and local minima, through the simultaneous evolution of a large number of models. Furthermore, we constrain the evolution, and thus reduce the size of the search-space, by using statistically-based deformable models whose deformations are intuitive (stretch, bulge, bend) and are driven in terms of localized principal modes of variation, instead of modes of variation across the entire shape that often fail to capture localized shape changes. Although GA are not guaranteed to achieve the global optima, our method compares favorably to the prevalent optimization techniques, convex/nonconvex gradient-based optimizers and to globally optimal graph-theoretic combinatorial optimization techniques, when applied to the task of corpus callosum segmentation in 50 mid-sagittal brain magnetic resonance images.

  18. Contrail Tracking and ARM Data Product Development

    NASA Technical Reports Server (NTRS)

    Duda, David P.; Russell, James, III

    2005-01-01

    A contrail tracking system was developed to help in the assessment of the effect of commercial jet contrails on the Earth's radiative budget. The tracking system was built by combining meteorological data from the Rapid Update Cycle (RUC) numerical weather prediction model with commercial air traffic flight track data and satellite imagery. A statistical contrail-forecasting model was created from a combination of surface-based contrail observations and numerical weather analyses and forecasts. This model allows predictions of widespread contrail occurrences for contrail research on either a real-time basis or over long-term time scales. Satellite-derived cirrus cloud properties in polluted and unpolluted regions were compared to determine the impact of air traffic on cirrus.

  19. Evaluation of respiratory and cardiac motion correction schemes in dual gated PET/CT cardiac imaging.

    PubMed

    Lamare, F; Le Maitre, A; Dawood, M; Schäfers, K P; Fernandez, P; Rimoldi, O E; Visvikis, D

    2014-07-01

    Cardiac imaging suffers from both respiratory and cardiac motion. One of the proposed solutions involves double gated acquisitions. Although such an approach may lead to both respiratory and cardiac motion compensation there are issues associated with (a) the combination of data from cardiac and respiratory motion bins, and (b) poor statistical quality images as a result of using only part of the acquired data. The main objective of this work was to evaluate different schemes of combining binned data in order to identify the best strategy to reconstruct motion free cardiac images from dual gated positron emission tomography (PET) acquisitions. A digital phantom study as well as seven human studies were used in this evaluation. PET data were acquired in list mode (LM). A real-time position management system and an electrocardiogram device were used to provide the respiratory and cardiac motion triggers registered within the LM file. Acquired data were subsequently binned considering four and six cardiac gates, or the diastole only in combination with eight respiratory amplitude gates. PET images were corrected for attenuation, but no randoms nor scatter corrections were included. Reconstructed images from each of the bins considered above were subsequently used in combination with an affine or an elastic registration algorithm to derive transformation parameters allowing the combination of all acquired data in a particular position in the cardiac and respiratory cycles. Images were assessed in terms of signal-to-noise ratio (SNR), contrast, image profile, coefficient-of-variation (COV), and relative difference of the recovered activity concentration. Regardless of the considered motion compensation strategy, the nonrigid motion model performed better than the affine model, leading to higher SNR and contrast combined with a lower COV. Nevertheless, when compensating for respiration only, no statistically significant differences were observed in the performance of the two motion models considered. Superior image SNR and contrast were seen using the affine respiratory motion model in combination with the diastole cardiac bin in comparison to the use of the whole cardiac cycle. In contrast, when simultaneously correcting for cardiac beating and respiration, the elastic respiratory motion model outperformed the affine model. In this context, four cardiac bins associated with eight respiratory amplitude bins seemed to be adequate. Considering the compensation of respiratory motion effects only, both affine and elastic based approaches led to an accurate resizing and positioning of the myocardium. The use of the diastolic phase combined with an affine model based respiratory motion correction may therefore be a simple approach leading to significant quality improvements in cardiac PET imaging. However, the best performance was obtained with the combined correction for both cardiac and respiratory movements considering all the dual-gated bins independently through the use of an elastic model based motion compensation.

  20. QCD Precision Measurements and Structure Function Extraction at a High Statistics, High Energy Neutrino Scattering Experiment:. NuSOnG

    NASA Astrophysics Data System (ADS)

    Adams, T.; Batra, P.; Bugel, L.; Camilleri, L.; Conrad, J. M.; de Gouvêa, A.; Fisher, P. H.; Formaggio, J. A.; Jenkins, J.; Karagiorgi, G.; Kobilarcik, T. R.; Kopp, S.; Kyle, G.; Loinaz, W. A.; Mason, D. A.; Milner, R.; Moore, R.; Morfín, J. G.; Nakamura, M.; Naples, D.; Nienaber, P.; Olness, F. I.; Owens, J. F.; Pate, S. F.; Pronin, A.; Seligman, W. G.; Shaevitz, M. H.; Schellman, H.; Schienbein, I.; Syphers, M. J.; Tait, T. M. P.; Takeuchi, T.; Tan, C. Y.; van de Water, R. G.; Yamamoto, R. K.; Yu, J. Y.

    We extend the physics case for a new high-energy, ultra-high statistics neutrino scattering experiment, NuSOnG (Neutrino Scattering On Glass) to address a variety of issues including precision QCD measurements, extraction of structure functions, and the derived Parton Distribution Functions (PDF's). This experiment uses a Tevatron-based neutrino beam to obtain a sample of Deep Inelastic Scattering (DIS) events which is over two orders of magnitude larger than past samples. We outline an innovative method for fitting the structure functions using a parametrized energy shift which yields reduced systematic uncertainties. High statistics measurements, in combination with improved systematics, will enable NuSOnG to perform discerning tests of fundamental Standard Model parameters as we search for deviations which may hint of "Beyond the Standard Model" physics.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Fangyan; Zhang, Song; Chung Wong, Pak

    Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.

  2. Inferring Causalities in Landscape Genetics: An Extension of Wright's Causal Modeling to Distance Matrices.

    PubMed

    Fourtune, Lisa; Prunier, Jérôme G; Paz-Vinas, Ivan; Loot, Géraldine; Veyssière, Charlotte; Blanchet, Simon

    2018-04-01

    Identifying landscape features that affect functional connectivity among populations is a major challenge in fundamental and applied sciences. Landscape genetics combines landscape and genetic data to address this issue, with the main objective of disentangling direct and indirect relationships among an intricate set of variables. Causal modeling has strong potential to address the complex nature of landscape genetic data sets. However, this statistical approach was not initially developed to address the pairwise distance matrices commonly used in landscape genetics. Here, we aimed to extend the applicability of two causal modeling methods, namely maximum-likelihood path analysis and the directional separation test, by developing statistical approaches aimed at handling distance matrices and improving functional connectivity inference. Using simulations, we showed that these approaches greatly improved the robustness of the absolute (using a frequentist approach) and relative (using an information-theoretic approach) fits of the tested models. We used an empirical data set combining genetic information on a freshwater fish species (Gobio occitaniae) and detailed landscape descriptors to demonstrate the usefulness of causal modeling to identify functional connectivity in wild populations. Specifically, we demonstrated how direct and indirect relationships involving altitude, temperature, and oxygen concentration influenced within- and between-population genetic diversity of G. occitaniae.

  3. Spectroscopic Diagnosis of Arsenic Contamination in Agricultural Soils

    PubMed Central

    Shi, Tiezhu; Liu, Huizeng; Chen, Yiyun; Fei, Teng; Wang, Junjie; Wu, Guofeng

    2017-01-01

    This study investigated the abilities of pre-processing, feature selection and machine-learning methods for the spectroscopic diagnosis of soil arsenic contamination. The spectral data were pre-processed by using Savitzky-Golay smoothing, first and second derivatives, multiplicative scatter correction, standard normal variate, and mean centering. Principal component analysis (PCA) and the RELIEF algorithm were used to extract spectral features. Machine-learning methods, including random forests (RF), artificial neural network (ANN), and radial basis function- and linear function-based support vector machines (RBF- and LF-SVM), were employed for establishing diagnosis models. The model accuracies were evaluated and compared by using overall accuracies (OAs). The statistical significance of the difference between models was evaluated by using McNemar’s test (Z value). The results showed that the OAs varied with the different combinations of pre-processing, feature selection, and classification methods. Feature selection methods could improve the modeling efficiencies and diagnosis accuracies, and RELIEF often outperformed PCA. The optimal models established by RF (OA = 86%), ANN (OA = 89%), RBF- (OA = 89%) and LF-SVM (OA = 87%) had no statistical difference in diagnosis accuracies (Z < 1.96, p < 0.05). These results indicated that it was feasible to diagnose soil arsenic contamination using reflectance spectroscopy. The appropriate combination of multivariate methods was important to improve diagnosis accuracies. PMID:28471412

  4. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study.

    PubMed

    van der Krieke, Lian; Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith Gm; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-08-07

    Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher's tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use.
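    A small sketch of the kind of automation described is given below, using the statsmodels VAR implementation on synthetic data; AutoVAR itself is a Web application and the variable names here are hypothetical. The loop fits models over a small search space of lag orders, selects one by information criterion, and runs a Granger causality test on the selected model.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(4)
n = 90                                                  # e.g. 90 daily EMA entries
activity = rng.normal(size=n)
mood = 0.6 * np.roll(activity, 1) + rng.normal(scale=0.5, size=n)
data = pd.DataFrame({"activity": activity, "mood": mood}).iloc[1:]   # drop wrap-around row

best = None
for lag in range(1, 4):                                 # small search space of candidate models
    res = VAR(data).fit(lag)
    if best is None or res.aic < best.aic:              # select by AIC (BIC is also available)
        best = res

print(f"selected lag order {best.k_ar}: AIC={best.aic:.2f}, BIC={best.bic:.2f}")
granger = best.test_causality("mood", ["activity"], kind="f")
print(granger.summary())
```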

  5. Ecological Momentary Assessments and Automated Time Series Analysis to Promote Tailored Health Care: A Proof-of-Principle Study

    PubMed Central

    Emerencia, Ando C; Bos, Elisabeth H; Rosmalen, Judith GM; Riese, Harriëtte; Aiello, Marco; Sytema, Sjoerd; de Jonge, Peter

    2015-01-01

    Background Health promotion can be tailored by combining ecological momentary assessments (EMA) with time series analysis. This combined method allows for studying the temporal order of dynamic relationships among variables, which may provide concrete indications for intervention. However, application of this method in health care practice is hampered because analyses are conducted manually and advanced statistical expertise is required. Objective This study aims to show how this limitation can be overcome by introducing automated vector autoregressive modeling (VAR) of EMA data and to evaluate its feasibility through comparisons with results of previously published manual analyses. Methods We developed a Web-based open source application, called AutoVAR, which automates time series analyses of EMA data and provides output that is intended to be interpretable by nonexperts. The statistical technique we used was VAR. AutoVAR tests and evaluates all possible VAR models within a given combinatorial search space and summarizes their results, thereby replacing the researcher’s tasks of conducting the analysis, making an informed selection of models, and choosing the best model. We compared the output of AutoVAR to the output of a previously published manual analysis (n=4). Results An illustrative example consisting of 4 analyses was provided. Compared to the manual output, the AutoVAR output presents similar model characteristics and statistical results in terms of the Akaike information criterion, the Bayesian information criterion, and the test statistic of the Granger causality test. Conclusions Results suggest that automated analysis and interpretation of time series is feasible. Compared to a manual procedure, the automated procedure is more robust and can save days of time. These findings may pave the way for using time series analysis for health promotion on a larger scale. AutoVAR was evaluated using the results of a previously conducted manual analysis. Analysis of additional datasets is needed in order to validate and refine the application for general use. PMID:26254160

  6. An Electrochemical Impedance Spectroscopy-Based Technique to Identify and Quantify Fermentable Sugars in Pineapple Waste Valorization for Bioethanol Production

    PubMed Central

    Conesa, Claudia; García-Breijo, Eduardo; Loeff, Edwin; Seguí, Lucía; Fito, Pedro; Laguarda-Miró, Nicolás

    2015-01-01

    Electrochemical Impedance Spectroscopy (EIS) has been used to develop a methodology able to identify and quantify fermentable sugars present in the enzymatic hydrolysis phase of second-generation bioethanol production from pineapple waste. Thus, a low-cost non-destructive system was developed, consisting of a stainless-steel double-needle electrode connected to electronic equipment that implements EIS. In order to validate the system, different concentrations of glucose, fructose and sucrose were added to the pineapple waste and analyzed both individually and in combination. Next, statistical data treatment enabled the design of specific Artificial Neural Networks-based mathematical models for each one of the studied sugars and their respective combinations. The obtained prediction models are robust and reliable and they are considered statistically valid (CCR% > 93.443%). These results allow us to introduce this EIS-based technique as an easy, fast, non-destructive, and in-situ alternative to the traditional laboratory methods for enzymatic hydrolysis monitoring. PMID:26378537

  7. NETWORK ASSISTED ANALYSIS TO REVEAL THE GENETIC BASIS OF AUTISM

    PubMed Central

    Liu, Li; Lei, Jing; Roeder, Kathryn

    2016-01-01

    While studies show that autism is highly heritable, the nature of the genetic basis of this disorder remains elusive. Based on the idea that highly correlated genes are functionally interrelated and more likely to affect risk, we develop a novel statistical tool to find more potential autism risk genes by combining the genetic association scores with gene co-expression in specific brain regions and periods of development. The gene dependence network is estimated using a novel partial neighborhood selection (PNS) algorithm, where node specific properties are incorporated into network estimation for improved statistical and computational efficiency. Then we adopt a hidden Markov random field (HMRF) model to combine the estimated network and the genetic association scores in a systematic manner. The proposed modeling framework can be naturally extended to incorporate additional structural information concerning the dependence between genes. Using currently available genetic association data from whole exome sequencing studies and brain gene expression levels, the proposed algorithm successfully identified 333 genes that plausibly affect autism risk. PMID:27134692

  8. Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox

    PubMed Central

    Tu, Yu-Kang; Gunnell, David; Gilthorpe, Mark S

    2008-01-01

    This article discusses three statistical paradoxes that pervade epidemiological research: Simpson's paradox, Lord's paradox, and suppression. These paradoxes have important implications for the interpretation of evidence from observational studies. This article uses hypothetical scenarios to illustrate how the three paradoxes are different manifestations of one phenomenon – the reversal paradox – depending on whether the outcome and explanatory variables are categorical, continuous or a combination of both; this renders the issues and remedies for any one to be similar for all three. Although the three statistical paradoxes occur in different types of variables, they share the same characteristic: the association between two variables can be reversed, diminished, or enhanced when another variable is statistically controlled for. Understanding the concepts and theory behind these paradoxes provides insights into some controversial or contradictory research findings. These paradoxes show that prior knowledge and underlying causal theory play an important role in the statistical modelling of epidemiological data, where incorrect use of statistical models might produce consistent, replicable, yet erroneous results. PMID:18211676

  9. Intratumoral delivery of docetaxel enhances antitumor activity of Ad-p53 in murine head and neck cancer xenograft model.

    PubMed

    Yoo, George H; Subramanian, Geetha; Ezzat, Waleed H; Tulunay, Ozlem E; Tran, Vivian R; Lonardo, Fulvio; Ensley, John F; Kim, Harold; Won, Joshua; Stevens, Timothy; Zumstein, Louis A; Lin, Ho-Sheng

    2010-01-01

    The aim of this study is to determine the ability of intratumorally delivered docetaxel to enhance the antitumor activity of adenovirus-mediated delivery of p53 (Ad-p53) in a murine head and neck cancer xenograft model. A xenograft head and neck squamous cell carcinoma mouse model was used. Mice were randomized into 4 groups of 6 mice receiving 6 weeks of biweekly intratumoral injection of (a) diluent, (b) Ad-p53 (1 × 10^10 viral particles per injection), (c) docetaxel (1 mg/kg per injection), and (d) combination of Ad-p53 (1 × 10^10 viral particles per injection) and docetaxel (1 mg/kg per injection). Tumor size, weight, toxicity, and overall and disease-free survival rates were determined. Intratumoral treatments with either docetaxel alone or Ad-p53 alone resulted in statistically significant antitumor activity and improved survival compared with the control group. Furthermore, combined delivery of Ad-p53 and docetaxel resulted in a statistically significant reduction in tumor weight when compared to treatment with either Ad-p53 or docetaxel alone. Intratumoral delivery of docetaxel enhanced the antitumor effect of Ad-p53 in a murine head and neck cancer xenograft model. The result of this preclinical in vivo study is promising and supports further clinical testing to evaluate efficacy of combined intratumoral docetaxel and Ad-p53 in treatment of head and neck cancer. Copyright (c) 2010 Elsevier Inc. All rights reserved.

  10. Reduced-Drift Virtual Gyro from an Array of Low-Cost Gyros.

    PubMed

    Vaccaro, Richard J; Zaki, Ahmed S

    2017-02-11

    A Kalman filter approach for combining the outputs of an array of high-drift gyros to obtain a virtual lower-drift gyro has been known in the literature for more than a decade. The success of this approach depends on the correlations of the random drift components of the individual gyros. However, no method of estimating these correlations has appeared in the literature. This paper presents an algorithm for obtaining the statistical model for an array of gyros, including the cross-correlations of the individual random drift components. In order to obtain this model, a new statistic, called the "Allan covariance" between two gyros, is introduced. The gyro array model can be used to obtain the Kalman filter-based (KFB) virtual gyro. Instead, we consider a virtual gyro obtained by taking a linear combination of individual gyro outputs. The gyro array model is used to calculate the optimal coefficients, as well as to derive a formula for the drift of the resulting virtual gyro. The drift formula for the optimal linear combination (OLC) virtual gyro is identical to that previously derived for the KFB virtual gyro. Thus, a Kalman filter is not necessary to obtain a minimum drift virtual gyro. The theoretical results of this paper are demonstrated using simulated as well as experimental data. In experimental results with a 28-gyro array, the OLC virtual gyro has a drift spectral density 40 times smaller than that obtained by taking the average of the gyro signals.
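    The abstract does not spell out the weights, but a natural reading of an optimal linear combination is the minimum-variance weighting under a sum-to-one constraint, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), where Σ is the covariance matrix of the individual gyros' random drift (estimated, per the paper, from Allan covariances). The sketch below uses a hypothetical Σ for three gyros and compares the resulting virtual-gyro drift variance with a simple average.

```python
import numpy as np

# Hypothetical drift covariance for three gyros (diagonal: each gyro's drift variance)
Sigma = np.array([[1.00, 0.30, 0.20],
                  [0.30, 1.20, 0.25],
                  [0.20, 0.25, 0.90]])

ones = np.ones(Sigma.shape[0])
w = np.linalg.solve(Sigma, ones)
w /= ones @ w                                  # minimum-variance weights, summing to one

virtual_var = 1.0 / (ones @ np.linalg.solve(Sigma, ones))
average_var = ones @ Sigma @ ones / Sigma.shape[0] ** 2
print("optimal weights:", np.round(w, 3))
print(f"virtual-gyro drift variance {virtual_var:.3f} vs simple average {average_var:.3f}")
```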

  11. 3D automatic anatomy recognition based on iterative graph-cut-ASM

    NASA Astrophysics Data System (ADS)

    Chen, Xinjian; Udupa, Jayaram K.; Bagci, Ulas; Alavi, Abass; Torigian, Drew A.

    2010-02-01

    We call the computerized assistive process of recognizing, delineating, and quantifying organs and tissue regions in medical imaging, occurring automatically during clinical image interpretation, automatic anatomy recognition (AAR). The AAR system we are developing includes five main parts: model building, object recognition, object delineation, pathology detection, and organ system quantification. In this paper, we focus on the delineation part. For the modeling part, we employ the active shape model (ASM) strategy. For recognition and delineation, we integrate several hybrid strategies of combining purely image based methods with ASM. In this paper, an iterative Graph-Cut ASM (IGCASM) method is proposed for object delineation. An algorithm called GC-ASM was presented at this symposium last year for object delineation in 2D images which attempted to combine synergistically ASM and GC. Here, we extend this method to 3D medical image delineation. The IGCASM method effectively combines the rich statistical shape information embodied in ASM with the globally optimal delineation capability of the GC method. We propose a new GC cost function, which effectively integrates the specific image information with the ASM shape model information. The proposed methods are tested on a clinical abdominal CT data set. The preliminary results show that: (a) it is feasible to explicitly bring prior 3D statistical shape information into the GC framework; (b) the 3D IGCASM delineation method improves on ASM and GC and can provide practical operational time on clinical images.

  12. Association analysis of multiple traits by an approach of combining P values.

    PubMed

    Chen, Lili; Wang, Yong; Zhou, Yajing

    2018-03-01

    Increasing evidence shows that one variant can affect multiple traits, which is a widespread phenomenon in complex diseases. Joint analysis of multiple traits can increase statistical power of association analysis and uncover the underlying genetic mechanism. Although there are many statistical methods to analyse multiple traits, most of these methods are usually suitable for detecting common variants associated with multiple traits. However, because of low minor allele frequency of rare variant, these methods are not optimal for rare variant association analysis. In this paper, we extend an adaptive combination of P values method (termed ADA) for single trait to test association between multiple traits and rare variants in the given region. For a given region, we use reverse regression model to test each rare variant associated with multiple traits and obtain the P value of single-variant test. Further, we take the weighted combination of these P values as the test statistic. Extensive simulation studies show that our approach is more powerful than several other comparison methods in most cases and is robust to the inclusion of a high proportion of neutral variants and the different directions of effects of causal variants.

  13. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.
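    The sketch below is a stand-in for the second test (the statistical group classification), assuming the classifier inputs are natural-frequency shifts relative to the reference state and that training examples for each of the ten location classes come from finite-element sensitivities; all numbers here are synthetic.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(5)
n_classes, n_per_class, n_freqs = 10, 40, 6          # 10 candidate mass-change locations

# Synthetic training data: each location class has its own mean frequency-shift pattern
class_means = rng.normal(scale=0.5, size=(n_classes, n_freqs))
X = np.vstack([m + rng.normal(scale=0.1, size=(n_per_class, n_freqs)) for m in class_means])
y = np.repeat(np.arange(n_classes), n_per_class)

lda = LinearDiscriminantAnalysis().fit(X, y)

# A new measured frequency-shift vector is assigned to the most probable location class
new_shift = class_means[3] + rng.normal(scale=0.1, size=n_freqs)
print("predicted location class:", lda.predict(new_shift[None, :])[0])
print("posterior probabilities:", np.round(lda.predict_proba(new_shift[None, :])[0], 2))
```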

  14. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.

  15. A comparison of resampling schemes for estimating model observer performance with small ensembles

    NASA Astrophysics Data System (ADS)

    Elshahaby, Fatma E. A.; Jha, Abhinav K.; Ghaly, Michael; Frey, Eric C.

    2017-09-01

    In objective assessment of image quality, an ensemble of images is used to compute the 1st and 2nd order statistics of the data. Often, only a finite number of images is available, leading to the issue of statistical variability in numerical observer performance. Resampling-based strategies can help overcome this issue. In this paper, we compared different combinations of resampling schemes (the leave-one-out (LOO) and the half-train/half-test (HT/HT)) and model observers (the conventional channelized Hotelling observer (CHO), channelized linear discriminant (CLD) and channelized quadratic discriminant). Observer performance was quantified by the area under the ROC curve (AUC). For a binary classification task and for each observer, the AUC value for an ensemble size of 2000 samples per class served as a gold standard for that observer. Results indicated that each observer yielded a different performance depending on the ensemble size and the resampling scheme. For a small ensemble size, the combination [CHO, HT/HT] had more accurate rankings than the combination [CHO, LOO]. Using the LOO scheme, the CLD and CHO had similar performance for large ensembles. However, the CLD outperformed the CHO and gave more accurate rankings for smaller ensembles. As the ensemble size decreased, the performance of the [CHO, LOO] combination seriously deteriorated as opposed to the [CLD, LOO] combination. Thus, it might be desirable to use the CLD with the LOO scheme when smaller ensemble size is available.

  16. Statistical appearance models based on probabilistic correspondences.

    PubMed

    Krüger, Julia; Ehrhardt, Jan; Handels, Heinz

    2017-04-01

    Model-based image analysis is indispensable in medical image processing. One key aspect of building statistical shape and appearance models is the determination of one-to-one correspondences in the training data set. At the same time, the identification of these correspondences is the most challenging part of such methods. In our earlier work, we developed an alternative method using correspondence probabilities instead of exact one-to-one correspondences for a statistical shape model (Hufnagel et al., 2008). In this work, a new approach for statistical appearance models without one-to-one correspondences is proposed. A sparse image representation is used to build a model that combines point position and appearance information at the same time. Probabilistic correspondences between the derived multi-dimensional feature vectors are used to omit the need for extensive preprocessing of finding landmarks and correspondences as well as to reduce the dependence of the generated model on the landmark positions. Model generation and model fitting can now be expressed by optimizing a single global criterion derived from a maximum a-posteriori (MAP) approach with respect to model parameters that directly affect both shape and appearance of the considered objects inside the images. The proposed approach describes statistical appearance modeling in a concise and flexible mathematical framework. Besides eliminating the demand for costly correspondence determination, the method allows for additional constraints as topological regularity in the modeling process. In the evaluation the model was applied for segmentation and landmark identification in hand X-ray images. The results demonstrate the feasibility of the model to detect hand contours as well as the positions of the joints between finger bones for unseen test images. Further, we evaluated the model on brain data of stroke patients to show the ability of the proposed model to handle partially corrupted data and to demonstrate a possible employment of the correspondence probabilities to indicate these corrupted/pathological areas. Copyright © 2017 Elsevier B.V. All rights reserved.

  17. Statistical Modeling of Single Target Cell Encapsulation

    PubMed Central

    Moon, SangJun; Ceyhan, Elvan; Gurkan, Umut Atakan; Demirci, Utkan

    2011-01-01

    High throughput drop-on-demand systems for separation and encapsulation of individual target cells from heterogeneous mixtures of multiple cell types is an emerging method in biotechnology that has broad applications in tissue engineering and regenerative medicine, genomics, and cryobiology. However, cell encapsulation in droplets is a random process that is hard to control. Statistical models can provide an understanding of the underlying processes and estimation of the relevant parameters, and enable reliable and repeatable control over the encapsulation of cells in droplets during the isolation process with high confidence level. We have modeled and experimentally verified a microdroplet-based cell encapsulation process for various combinations of cell loading and target cell concentrations. Here, we explain theoretically and validate experimentally a model to isolate and pattern single target cells from heterogeneous mixtures without using complex peripheral systems. PMID:21814548

  18. Nonlinear multi-analysis of agent-based financial market dynamics by epidemic system

    NASA Astrophysics Data System (ADS)

    Lu, Yunfan; Wang, Jun; Niu, Hongli

    2015-10-01

    Based on the epidemic dynamical system, we construct a new agent-based financial time series model. In order to check and verify its rationality, we compare the statistical properties of the time series model with the real stock market indices, Shanghai Stock Exchange Composite Index and Shenzhen Stock Exchange Component Index. For analyzing the statistical properties, we combine the multi-parameter analysis with the tail distribution analysis, the modified rescaled range analysis, and the multifractal detrended fluctuation analysis. For a better perspective, the three-dimensional diagrams are used to present the analysis results. The empirical research in this paper indicates that the long-range dependence property and the multifractal phenomenon exist in the real returns and the proposed model. Therefore, the new agent-based financial model can reproduce some important features of real stock markets.

  19. A canonical neural mechanism for behavioral variability

    PubMed Central

    Darshan, Ran; Wood, William E.; Peters, Susan; Leblois, Arthur; Hansel, David

    2017-01-01

    The ability to generate variable movements is essential for learning and adjusting complex behaviours. This variability has been linked to the temporal irregularity of neuronal activity in the central nervous system. However, how neuronal irregularity actually translates into behavioural variability is unclear. Here we combine modelling, electrophysiological and behavioural studies to address this issue. We demonstrate that a model circuit comprising topographically organized and strongly recurrent neural networks can autonomously generate irregular motor behaviours. Simultaneous recordings of neurons in singing finches reveal that neural correlations increase across the circuit driving song variability, in agreement with the model predictions. Analysing behavioural data, we find remarkable similarities in the babbling statistics of 5–6-month-old human infants and juveniles from three songbird species and show that our model naturally accounts for these ‘universal' statistics. PMID:28530225

  20. Modelling nitrate pollution pressure using a multivariate statistical approach: the case of Kinshasa groundwater body, Democratic Republic of Congo

    NASA Astrophysics Data System (ADS)

    Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik

    2016-03-01

    A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.

  1. Biopsychosocial influence on exercise-induced injury: genetic and psychological combinations are predictive of shoulder pain phenotypes.

    PubMed

    George, Steven Z; Parr, Jeffrey J; Wallace, Margaret R; Wu, Samuel S; Borsa, Paul A; Dai, Yunfeng; Fillingim, Roger B

    2014-01-01

    Chronic pain is influenced by biological, psychological, social, and cultural factors. The current study investigated potential roles for combinations of genetic and psychological factors in the development and/or maintenance of chronic musculoskeletal pain. An exercise-induced shoulder injury model was used, and a priori selected genetic (ADRB2, COMT, OPRM1, AVPR1A, GCH1, and KCNS1) and psychological (anxiety, depressive symptoms, pain catastrophizing, fear of pain, and kinesiophobia) factors were included as predictors. Pain phenotypes were shoulder pain intensity (5-day average and peak reported on numerical rating scale), upper extremity disability (5-day average and peak reported on the QuickDASH), and shoulder pain duration (in days). After controlling for age, sex, and race, the genetic and psychological predictors were entered as main effects and interaction terms in separate regression models for the different pain phenotypes. Results from the recruited cohort (N = 190) indicated strong statistical evidence for interactions between the COMT diplotype and 1) pain catastrophizing for 5-day average upper extremity disability and 2) depressive symptoms for pain duration. There was moderate statistical evidence for interactions for other shoulder pain phenotypes between additional genes (ADRB2, AVPR1A, and KCNS1) and depressive symptoms, pain catastrophizing, or kinesiophobia. These findings confirm the importance of the combined predictive ability of COMT with psychological distress and reveal other novel combinations of genetic and psychological factors that may merit additional investigation in other pain cohorts. Interactions between genetic and psychological factors were investigated as predictors of different exercise-induced shoulder pain phenotypes. The strongest statistical evidence was for interactions between the COMT diplotype and pain catastrophizing (for upper extremity disability) or depressive symptoms (for pain duration). Other novel genetic and psychological combinations were identified that may merit further investigation. Copyright © 2014 American Pain Society. Published by Elsevier Inc. All rights reserved.

  2. Season-ahead water quality forecasts for the Schuylkill River, Pennsylvania

    NASA Astrophysics Data System (ADS)

    Block, P. J.; Leung, K.

    2013-12-01

    Anticipating and preparing for elevated water quality parameter levels in critical water sources, using weather forecasts, is not uncommon. In this study, we explore the feasibility of extending this prediction scale to a season-ahead for the Schuylkill River in Philadelphia, utilizing both statistical and dynamical prediction models, to characterize the season. This advance information has relevance for recreational activities, ecosystem health, and water treatment, as the Schuylkill provides 40% of Philadelphia's water supply. The statistical model associates large-scale climate drivers with streamflow and water quality parameter levels; numerous variables from NOAA's CFSv2 model are evaluated for the dynamical approach. A multi-model combination is also assessed. Results indicate moderately skillful prediction of average summertime total coliform and wintertime turbidity, using season-ahead oceanic and atmospheric variables, predominantly from the North Atlantic Ocean. Models predicting the number of elevated turbidity events across the wintertime season are also explored.

  3. Estimating Preferential Flow in Karstic Aquifers Using Statistical Mixed Models

    PubMed Central

    Anaya, Angel A.; Padilla, Ingrid; Macchiavelli, Raul; Vesper, Dorothy J.; Meeker, John D.; Alshawabkeh, Akram N.

    2013-01-01

    Karst aquifers are highly productive groundwater systems often associated with conduit flow. These systems can be highly vulnerable to contamination, resulting in a high potential for contaminant exposure to humans and ecosystems. This work develops statistical models to spatially characterize flow and transport patterns in karstified limestone and determines the effect of aquifer flow rates on these patterns. A laboratory-scale Geo-HydroBed model is used to simulate flow and transport processes in a karstic limestone unit. The model consists of stainless-steel tanks containing a karstified limestone block collected from a karst aquifer formation in northern Puerto Rico. Experimental work involves making a series of flow and tracer injections, while monitoring hydraulic and tracer response spatially and temporally. Statistical mixed models are applied to hydraulic data to determine likely pathways of preferential flow in the limestone units. The models indicate a highly heterogeneous system with dominant, flow-dependent preferential flow regions. Results indicate that regions of preferential flow tend to expand at higher groundwater flow rates, suggesting a greater volume of the system being flushed by flowing water at higher rates. Spatial and temporal distribution of tracer concentrations indicates the presence of conduit-like and diffuse flow transport in the system, supporting the notion of combined transport mechanisms in the limestone unit. The temporal response of tracer concentrations at different locations in the model coincides with, and confirms, the preferential flow distribution generated with the statistical mixed models used in the study. PMID:23802921

  4. Uniting statistical and individual-based approaches for animal movement modelling.

    PubMed

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to the practical impossibility of accessing internal states through field work and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems.

  5. Uniting Statistical and Individual-Based Approaches for Animal Movement Modelling

    PubMed Central

    Latombe, Guillaume; Parrott, Lael; Basille, Mathieu; Fortin, Daniel

    2014-01-01

    The dynamic nature of their internal states and the environment directly shape animals' spatial behaviours and give rise to emergent properties at broader scales in natural systems. However, integrating these dynamic features into habitat selection studies remains challenging, due to the practical impossibility of accessing internal states through field work and the inability of current statistical models to produce dynamic outputs. To address these issues, we developed a robust method, which combines statistical and individual-based modelling (IBM). Using a statistical technique for forward modelling of the IBM has the advantage of being faster for parameterization than a pure inverse modelling technique and allows for robust selection of parameters. Using GPS locations from caribou monitored in Québec, caribou movements were modelled based on generative mechanisms accounting for dynamic variables at a low level of emergence. These variables were accessed by replicating real individuals' movements in parallel sub-models, and movement parameters were then empirically parameterized using Step Selection Functions. The final IBM model was validated using both k-fold cross-validation and emergent patterns validation and was tested for two different scenarios, with varying hardwood encroachment. Our results highlighted a functional response in habitat selection, which suggests that our method was able to capture the complexity of the natural system, and adequately provided projections on future possible states of the system in response to different management plans. This is especially relevant for testing the long-term impact of scenarios corresponding to environmental configurations that have yet to be observed in real systems. PMID:24979047

  6. Hierarchical modeling and inference in ecology: The analysis of data from populations, metapopulations and communities

    USGS Publications Warehouse

    Royle, J. Andrew; Dorazio, Robert M.

    2008-01-01

    A guide to data collection, modeling and inference strategies for biological survey data using Bayesian and classical statistical methods. This book describes a general and flexible framework for modeling and inference in ecological systems based on hierarchical models, with a strict focus on the use of probability models and parametric inference. Hierarchical models represent a paradigm shift in the application of statistics to ecological inference problems because they combine explicit models of ecological system structure or dynamics with models of how ecological systems are observed. The principles of hierarchical modeling are developed and applied to problems in population, metapopulation, community, and metacommunity systems. The book provides the first synthetic treatment of many recent methodological advances in ecological modeling and unifies disparate methods and procedures. The authors apply principles of hierarchical modeling to ecological problems, including occurrence or occupancy models for estimating species distribution; abundance models based on many sampling protocols, including distance sampling; capture-recapture models with individual effects; spatial capture-recapture models based on camera trapping and related methods; population and metapopulation dynamic models; and models of biodiversity, community structure and dynamics.

  7. Are Three Sheets Enough? Using Toilet Paper to Teach Science and Mathematics

    ERIC Educational Resources Information Center

    Woolverton, Christopher J.; Woolverton, Lyssa N.

    2006-01-01

    Toilet paper (TP) composition and physical characteristics were used to model scientific investigations that combined several "National Science Education Standards." Experiments with TP permitted the integration of TP history, societal change resulting from invention, mathematics (including geometry and statistics), germ theory, and personal…

  8. Research Analysis on MOOC Course Dropout and Retention Rates

    ERIC Educational Resources Information Center

    Gomez-Zermeno, Marcela Gerogina; Aleman de La Garza, Lorena

    2016-01-01

    This research's objective was to identify the terminal efficiency of the Massive Online Open Course "Educational Innovation with Open Resources" offered by a Mexican private university. A quantitative methodology was used, combining descriptive statistics and probabilistic models to analyze the levels of retention, completion, and…

  9. OvAge: a new methodology to quantify ovarian reserve combining clinical, biochemical and 3D-ultrasonographic parameters.

    PubMed

    Venturella, Roberta; Lico, Daniela; Sarica, Alessia; Falbo, Maria Pia; Gulletta, Elio; Morelli, Michele; Zupi, Errico; Cevenini, Gabriele; Cannataro, Mario; Zullo, Fulvio

    2015-04-08

    In the last decade, both endocrine and ultrasound data have been tested to verify their usefulness for assessing ovarian reserve, but the ideal marker does not yet exist. The purpose of this study was to find an advanced statistical model, if any, able to provide a simple, easy to understand and intuitive modality for defining ovarian age by combining clinical, biochemical and 3D-ultrasonographic data. This is a population-based observational study. From January 2012 to March 2014, we enrolled 652 healthy fertile women, 29 patients with clinical suspicion of premature ovarian insufficiency (POI) and 29 patients with Polycystic Ovary Syndrome (PCOS) at the Unit of Obstetrics & Gynecology of Magna Graecia University of Catanzaro (Italy). In all women we measured Anti-Müllerian Hormone (AMH), Follicle Stimulating Hormone (FSH), Estradiol (E2), 3D Antral Follicle Count (AFC), ovarian volume, Vascular Index (VI) and Flow Index (FI) between days 1 and 4 of the menstrual cycle. We applied Generalized Linear Models (GzLM) to produce an equation combining these data and providing ready-to-use information about a woman's ovarian reserve, here called OvAge. To introduce this new variable, an expression of ovarian reserve, we assumed that in healthy fertile women ovarian age is identical to chronological age. GzLM applied to the healthy fertile controls dataset produced the following equation: OvAge = 48.05 - 3.14*AMH + 0.07*FSH - 0.77*AFC - 0.11*FI + 0.25*VI + 0.1*AMH*AFC + 0.02*FSH*AFC. This model showed high statistical significance for each marker included in the equation. We applied the final equation to the POI and PCOS datasets to test its ability to detect significant deviation from normality, and we obtained a mean predicted ovarian age significantly different from the mean chronological age in both groups. OvAge is one of the first reliable attempts to create a new method able to identify a simple, easy to understand and intuitive modality for defining ovarian reserve by combining clinical, biochemical and 3D-ultrasonographic data. Although the design data prove a high statistical accuracy of the model, we plan a clinical validation of the model's reliability in predicting reproductive prognosis and distance to menopause.
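
    Because the fitted GzLM equation is reported explicitly in the abstract above, it can be evaluated directly. The sketch below simply codes that equation; input units and valid ranges are not restated here, so it should be read as an illustration rather than as clinical software.

      def ovage(amh, fsh, afc, fi, vi):
          """Ovarian age (OvAge) from the GzLM equation reported in the abstract.

          amh: Anti-Muellerian Hormone, fsh: FSH, afc: 3D antral follicle count,
          fi: Flow Index, vi: Vascular Index (measured on cycle days 1-4).
          """
          return (48.05
                  - 3.14 * amh
                  + 0.07 * fsh
                  - 0.77 * afc
                  - 0.11 * fi
                  + 0.25 * vi
                  + 0.10 * amh * afc
                  + 0.02 * fsh * afc)

      if __name__ == "__main__":
          # purely illustrative input values
          print(round(ovage(amh=2.1, fsh=7.5, afc=14, fi=35.0, vi=4.0), 1))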

  10. Eutrophication risk assessment in coastal embayments using simple statistical models.

    PubMed

    Arhonditsis, G; Eleftheriadou, M; Karydis, M; Tsirtsis, G

    2003-09-01

    A statistical methodology is proposed for assessing the risk of eutrophication in marine coastal embayments. The procedure followed was the development of regression models relating the levels of chlorophyll a (Chl) with the concentration of the limiting nutrient--usually nitrogen--and the renewal rate of the systems. The method was applied in the Gulf of Gera, Island of Lesvos, Aegean Sea and a surrogate for renewal rate was created using the Canberra metric as a measure of the resemblance between the Gulf and the oligotrophic waters of the open sea in terms of their physical, chemical and biological properties. The Chl-total dissolved nitrogen-renewal rate regression model was the most significant, accounting for 60% of the variation observed in Chl. Predicted distributions of Chl for various combinations of the independent variables, based on Bayesian analysis of the models, enabled comparison of the outcomes of specific scenarios of interest as well as further analysis of the system dynamics. The present statistical approach can be used as a methodological tool for testing the resilience of coastal ecosystems under alternative managerial schemes and levels of exogenous nutrient loading.

  11. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

    PubMed Central

    2012-01-01

    Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998
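
    The reported relationship can be checked quickly with a Monte Carlo sketch of the equal-variance binormal case, in which the predicted c-statistic is Phi(d/sqrt(2)) with d the standardized mean difference; since d equals the product of the common standard deviation and the log-odds ratio per unit of the covariate, this matches the product form described in the abstract. The parameter values below are illustrative.

      import numpy as np
      from scipy.stats import norm

      rng = np.random.default_rng(0)

      mu0, mu1, sigma = 0.0, 1.0, 1.0      # illustrative binormal parameters
      n0 = n1 = 2000

      x0 = rng.normal(mu0, sigma, n0)       # covariate in those without the condition
      x1 = rng.normal(mu1, sigma, n1)       # covariate in those with the condition

      # empirical c-statistic = P(X1 > X0), with ties counted as 1/2
      comparisons = x1[:, None] - x0[None, :]
      c_empirical = np.mean(comparisons > 0) + 0.5 * np.mean(comparisons == 0)

      # predicted c-statistic under equal-variance binormality
      d = (mu1 - mu0) / sigma               # standardized difference = sigma * log-odds ratio
      c_predicted = norm.cdf(d / np.sqrt(2))

      print(f"empirical c = {c_empirical:.3f}, predicted c = {c_predicted:.3f}")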

  12. Assimilating the Future for Better Forecasts and Earlier Warnings

    NASA Astrophysics Data System (ADS)

    Du, H.; Wheatcroft, E.; Smith, L. A.

    2016-12-01

    Multi-model ensembles have become popular tools to account for some of the uncertainty due to model inadequacy in weather and climate simulation-based predictions. Current multi-model forecasts focus on combining single-model ensemble forecasts by means of statistical post-processing. Assuming each model is developed independently or with different primary target variables, each is likely to contain different dynamical strengths and weaknesses. Using statistical post-processing, such information is only carried by the simulations under a single model ensemble: no advantage is taken to influence simulations under the other models. A novel methodology, named Multi-model Cross Pollination in Time, is proposed as a multi-model ensemble scheme with the aim of integrating the dynamical information regarding the future from each individual model operationally. The proposed approach generates model states in time by applying data assimilation scheme(s) to yield truly "multi-model trajectories". It is demonstrated to outperform traditional statistical post-processing in the 40-dimensional Lorenz96 flow. Data assimilation approaches are originally designed to improve state estimation from the past to the current time. The aim of this talk is to introduce a framework that uses data assimilation to improve model forecasts at future times (not to argue for any one particular data assimilation scheme). An illustration of applying data assimilation "in the future" to provide early warning of future high-impact events is also presented.
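
    The 40-variable Lorenz96 system used as the testbed above is fully specified by dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F. The sketch below integrates it with a fourth-order Runge-Kutta step, taking the conventional chaotic forcing F = 8; the data assimilation and cross-pollination machinery itself is not reproduced here.

      import numpy as np

      def lorenz96_rhs(x, forcing=8.0):
          """Right-hand side of the Lorenz96 model, using periodic indexing."""
          return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

      def rk4_step(x, dt, forcing=8.0):
          """One fourth-order Runge-Kutta step of length dt."""
          k1 = lorenz96_rhs(x, forcing)
          k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
          k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
          k4 = lorenz96_rhs(x + dt * k3, forcing)
          return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

      if __name__ == "__main__":
          n_vars, dt = 40, 0.05
          x = 8.0 * np.ones(n_vars)
          x[0] += 0.01                     # small perturbation to trigger chaos
          for _ in range(1000):            # spin up, then inspect part of the state
              x = rk4_step(x, dt)
          print(x[:5])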

  13. Measurement-based reliability prediction methodology. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Linn, Linda Shen

    1991-01-01

    In the past, analytical and measurement based models were developed to characterize computer system behavior. An open issue is how these models can be used, if at all, for system design improvement. The issue is addressed here. A combined statistical/analytical approach to use measurements from one environment to model the system failure behavior in a new environment is proposed. A comparison of the predicted results with the actual data from the new environment shows a close correspondence.

  14. Modeling Composite Assessment Data Using Item Response Theory

    PubMed Central

    Ueckert, Sebastian

    2018-01-01

    Composite assessments aim to combine different aspects of a disease in a single score and are utilized in a variety of therapeutic areas. The data arising from these evaluations are inherently discrete with distinct statistical properties. This tutorial presents the framework of the item response theory (IRT) for the analysis of this data type in a pharmacometric context. The article considers both conceptual (terms and assumptions) and practical questions (modeling software, data requirements, and model building). PMID:29493119
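
    As a concrete illustration of the item response framework described above, the sketch below evaluates a two-parameter logistic (2PL) item model, one common IRT form for binary item scores; the tutorial also covers other item types, and the parameter values here are purely illustrative.

      import numpy as np

      def irt_2pl_probability(theta, discrimination, difficulty):
          """P(item score = 1 | latent severity theta) under a 2PL item model."""
          return 1.0 / (1.0 + np.exp(-discrimination * (theta - difficulty)))

      if __name__ == "__main__":
          theta = np.linspace(-3, 3, 7)          # latent severity grid
          # illustrative item: moderately discriminating, average difficulty
          p = irt_2pl_probability(theta, discrimination=1.2, difficulty=0.0)
          for t, pi in zip(theta, p):
              print(f"theta={t:+.1f}  P(score=1)={pi:.3f}")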

  15. Combining super-ensembles and statistical emulation to improve a regional climate and vegetation model

    NASA Astrophysics Data System (ADS)

    Hawkins, L. R.; Rupp, D. E.; Li, S.; Sarah, S.; McNeall, D. J.; Mote, P.; Betts, R. A.; Wallom, D.

    2017-12-01

    Changing regional patterns of surface temperature, precipitation, and humidity may cause ecosystem-scale changes in vegetation, altering the distribution of trees, shrubs, and grasses. A changing vegetation distribution, in turn, alters the albedo, latent heat flux, and carbon exchanged with the atmosphere with resulting feedbacks onto the regional climate. However, a wide range of earth-system processes that affect the carbon, energy, and hydrologic cycles occur at subgrid scales in climate models and must be parameterized. The appropriate parameter values in such parameterizations are often poorly constrained, leading to uncertainty in predictions of how the ecosystem will respond to changes in forcing. To better understand the sensitivity of regional climate to parameter selection and to improve regional climate and vegetation simulations, we used a large perturbed physics ensemble and a suite of statistical emulators. We dynamically downscaled a super-ensemble (multiple parameter sets and multiple initial conditions) of global climate simulations using a 25-km resolution regional climate model HadRM3p with the land-surface scheme MOSES2 and dynamic vegetation module TRIFFID. We simultaneously perturbed land surface parameters relating to the exchange of carbon, water, and energy between the land surface and atmosphere in a large super-ensemble of regional climate simulations over the western US. Statistical emulation was used as a computationally cost-effective tool to explore uncertainties in interactions. Regions of parameter space that did not satisfy observational constraints were eliminated, and an ensemble of parameter sets that reduce regional biases and span a range of plausible interactions among earth system processes was selected. This study demonstrated that by combining super-ensemble simulations with statistical emulation, simulations of regional climate could be improved while simultaneously accounting for a range of plausible land-atmosphere feedback strengths.

  16. Linear and nonlinear methods in modeling the aqueous solubility of organic compounds.

    PubMed

    Catana, Cornel; Gao, Hua; Orrenius, Christian; Stouten, Pieter F W

    2005-01-01

    Solubility data for 930 diverse compounds have been analyzed using linear Partial Least Squares (PLS) and nonlinear PLS methods, Continuum Regression (CR), and Neural Networks (NN). 1D and 2D descriptors from the MOE package, in combination with E-state or ISIS keys, have been used. The best model was obtained using linear PLS for a combination of 22 MOE descriptors and 65 ISIS keys. It has a correlation coefficient (r2) of 0.935 and a root-mean-square error (RMSE) of 0.468 log molar solubility (log S(w)). The model, validated on a test set of 177 compounds not included in the training set, has r2 0.911 and RMSE 0.475 log S(w). The descriptors were ranked according to their importance, and the 22 MOE descriptors were found at the top of the list. The CR model produced results as good as PLS, and because of the way in which cross-validation was done it is expected to be a valuable prediction tool alongside the PLS model. The statistics obtained using nonlinear methods did not surpass those obtained with linear ones. The good statistics obtained for linear PLS and CR recommend these models for use in prediction when it is difficult or impossible to make experimental measurements, for virtual screening, combinatorial library design, and efficient lead optimization.

  17. Helicity statistics in homogeneous and isotropic turbulence and turbulence models

    NASA Astrophysics Data System (ADS)

    Sahoo, Ganapati; De Pietro, Massimo; Biferale, Luca

    2017-02-01

    We study the statistical properties of helicity in direct numerical simulations of fully developed homogeneous and isotropic turbulence and in a class of turbulence shell models. We consider correlation functions based on combinations of vorticity and velocity increments that are not invariant under mirror symmetry. We also study the scaling properties of high-order structure functions based on the moments of the velocity increments projected on a subset of modes with either positive or negative helicity (chirality). We show that mirror symmetry is recovered at small scales, i.e., chiral terms are subleading and they are well captured by a dimensional argument plus anomalous corrections. These findings are also supported by a high Reynolds numbers study of helical shell models with the same chiral symmetry of Navier-Stokes equations.

  18. Vehicle track segmentation using higher order random fields

    DOE PAGES

    Quach, Tu -Thach

    2017-01-09

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.

  19. Vehicle track segmentation using higher order random fields

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quach, Tu -Thach

    Here, we present an approach to segment vehicle tracks in coherent change detection images, a product of combining two synthetic aperture radar images taken at different times. The approach uses multiscale higher order random field models to capture track statistics, such as curvatures and their parallel nature, that are not currently utilized in existing methods. These statistics are encoded as 3-by-3 patterns at different scales. The model can complete disconnected tracks often caused by sensor noise and various environmental effects. Coupling the model with a simple classifier, our approach is effective at segmenting salient tracks. We improve the F-measure on a standard vehicle track data set to 0.963, up from 0.897 obtained by the current state-of-the-art method.

  20. Using exploratory regression to identify optimal driving factors for cellular automaton modeling of land use change.

    PubMed

    Feng, Yongjiu; Tong, Xiaohua

    2017-09-22

    Defining transition rules is an important issue in cellular automaton (CA)-based land use modeling because these models incorporate highly correlated driving factors. Multicollinearity among correlated driving factors may produce negative effects that must be eliminated from the modeling. Using exploratory regression under pre-defined criteria, we identified all possible combinations of factors from the candidate factors affecting land use change. Three combinations that incorporate five driving factors meeting the pre-defined criteria were assessed. With the selected combinations of factors, three logistic regression-based CA models were built to simulate dynamic land use change in Shanghai, China, from 2000 to 2015. For comparative purposes, a CA model with all candidate factors was also applied to simulate the land use change. Simulations using the three CA models with multicollinearity eliminated performed better (with accuracy improvements of about 3.6%) than the model incorporating all candidate factors. Our results showed that not all candidate factors are necessary for accurate CA modeling and that the simulations were not sensitive to changes in statistically non-significant driving factors. We conclude that exploratory regression is an effective method to search for the optimal combinations of driving factors, leading to better land use change models that are devoid of multicollinearity. We suggest identifying dominant factors and eliminating multicollinearity before building land change models, making it possible to simulate more realistic outcomes.
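
    A minimal sketch of the combination-search idea, not the Shanghai study's actual procedure: enumerate candidate factor subsets, fit an ordinary least-squares model to each, and keep only subsets whose variance inflation factors stay below a threshold, so the retained combinations are free of strong multicollinearity. The thresholds, factor names, and data below are illustrative assumptions.

      import itertools
      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      def exploratory_regression(df, response, candidates, n_factors=5, max_vif=7.5, min_r2=0.5):
          """Return factor combinations meeting illustrative collinearity and fit criteria."""
          kept = []
          for combo in itertools.combinations(candidates, n_factors):
              X = sm.add_constant(df[list(combo)])
              vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
              if max(vifs) > max_vif:
                  continue                       # reject collinear combinations
              fit = sm.OLS(df[response], X).fit()
              if fit.rsquared_adj >= min_r2:
                  kept.append((combo, fit.rsquared_adj, max(vifs)))
          return sorted(kept, key=lambda t: -t[1])

      if __name__ == "__main__":
          rng = np.random.default_rng(1)
          factors = [f"f{i}" for i in range(8)]          # hypothetical driving factors
          df = pd.DataFrame(rng.normal(size=(200, 8)), columns=factors)
          df["change"] = df["f0"] + 0.5 * df["f3"] + rng.normal(scale=0.5, size=200)
          for combo, r2, vif in exploratory_regression(df, "change", factors, n_factors=2)[:3]:
              print(combo, round(r2, 3), round(vif, 2))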

  1. A product of independent beta probabilities dose escalation design for dual-agent phase I trials.

    PubMed

    Mander, Adrian P; Sweeting, Michael J

    2015-04-15

    Dual-agent trials are now increasingly common in oncology research, and many proposed dose-escalation designs are available in the statistical literature. Despite this, the translation from statistical design to practical application is slow, as has been highlighted in single-agent phase I trials, where a 3 + 3 rule-based design is often still used. To expedite this process, new dose-escalation designs need to be not only scientifically beneficial but also easy to understand and implement by clinicians. In this paper, we propose a curve-free (nonparametric) design for a dual-agent trial in which the model parameters are the probabilities of toxicity at each of the dose combinations. We show that it is relatively trivial for a clinician's prior beliefs or historical information to be incorporated in the model, and updating is fast and computationally simple through the use of conjugate Bayesian inference. Monotonicity is ensured by considering only a set of monotonic contours for the distribution of the maximum tolerated contour, which defines the dose-escalation decision process. Varied experimentation around the contour is achievable, and multiple dose combinations can be recommended to take forward to phase II. Code for R, Stata and Excel is available for implementation. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
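
    The conjugate updating highlighted above can be sketched as independent Beta-Binomial updates on a grid of dose combinations: each combination's toxicity probability receives a Beta prior encoding the clinician's beliefs, and observed toxicities update it in closed form. The sketch omits the design's key ingredient, the selection of monotonic maximum-tolerated contours, and uses illustrative prior and data values.

      import numpy as np

      # illustrative 3 x 3 grid of dose combinations for agents A and B
      prior_a = np.full((3, 3), 1.0)      # Beta(a, b) prior pseudo-toxicities
      prior_b = np.full((3, 3), 2.0)      # weakly favours low toxicity at each combination

      # illustrative observed data: toxicities and patients treated at each combination
      toxicities = np.array([[0, 1, 0], [0, 0, 2], [0, 0, 0]])
      patients   = np.array([[3, 3, 1], [2, 3, 2], [0, 1, 0]])

      # conjugate Beta-Binomial update, done in closed form combination by combination
      post_a = prior_a + toxicities
      post_b = prior_b + (patients - toxicities)

      posterior_mean_tox = post_a / (post_a + post_b)
      print(np.round(posterior_mean_tox, 2))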

  2. Anatomy of the Higgs fits: A first guide to statistical treatments of the theoretical uncertainties

    NASA Astrophysics Data System (ADS)

    Fichet, Sylvain; Moreau, Grégory

    2016-04-01

    The studies of the Higgs boson couplings based on the recent and upcoming LHC data open up a new window on physics beyond the Standard Model. In this paper, we propose a statistical guide to the consistent treatment of the theoretical uncertainties entering the Higgs rate fits. Both the Bayesian and frequentist approaches are systematically analysed in a unified formalism. We present analytical expressions for the marginal likelihoods, useful to implement simultaneously the experimental and theoretical uncertainties. We review the various origins of the theoretical errors (QCD, EFT, PDF, production mode contamination…). All these individual uncertainties are thoroughly combined with the help of moment-based considerations. The theoretical correlations among Higgs detection channels appear to affect the location and size of the best-fit regions in the space of Higgs couplings. We discuss the recurrent question of the shape of the prior distributions for the individual theoretical errors and find that a nearly Gaussian prior arises from the error combinations. We also develop the bias approach, which is an alternative to marginalisation providing more conservative results. The statistical framework to apply the bias principle is introduced and two realisations of the bias are proposed. Finally, depending on the statistical treatment, the Standard Model prediction for the Higgs signal strengths is found to lie within either the 68% or 95% confidence level region obtained from the latest analyses of the 7 and 8 TeV LHC datasets.

  3. Discrete-element modeling of nacre-like materials: Effects of random microstructures on strain localization and mechanical performance

    NASA Astrophysics Data System (ADS)

    Abid, Najmul; Mirkhalaf, Mohammad; Barthelat, Francois

    2018-03-01

    Natural materials such as nacre, collagen, and spider silk are composed of staggered stiff and strong inclusions in a softer matrix. This type of hybrid microstructure results in remarkable combinations of stiffness, strength, and toughness and it now inspires novel classes of high-performance composites. However, the analytical and numerical approaches used to predict and optimize the mechanics of staggered composites often neglect statistical variations and inhomogeneities, which may have significant impacts on modulus, strength, and toughness. Here we present an analysis of localization using small representative volume elements (RVEs) and large scale statistical volume elements (SVEs) based on the discrete element method (DEM). DEM is an efficient numerical method which enabled the evaluation of more than 10,000 microstructures in this study, each including about 5,000 inclusions. The models explore the combined effects of statistics, inclusion arrangement, and interface properties. We find that statistical variations have a negative effect on all properties, in particular on the ductility and energy absorption because randomness precipitates the localization of deformations. However, the results also show that the negative effects of random microstructures can be offset by interfaces with large strain at failure accompanied by strain hardening. More specifically, this quantitative study reveals an optimal range of interface properties where the interfaces are the most effective at delaying localization. These findings show how carefully designed interfaces in bioinspired staggered composites can offset the negative effects of microstructural randomness, which is inherent to most current fabrication methods.

  4. Introductory comments on the USGS geographic applications program

    NASA Technical Reports Server (NTRS)

    Gerlach, A. C.

    1970-01-01

    The third phase of remote sensing technologies and potentials applied to the operations of the U.S. Geological Survey is introduced. Remote sensing data are combined with multidisciplinary spatial data from traditional sources, geographic theory, and techniques of environmental modeling. These combined inputs are subjected to four sequential activities that involve: (1) thematic mapping of land use and environmental factors; (2) the dynamics of change detection; (3) environmental surveillance to identify sudden changes and general trends; and (4) preparation of statistical models and analytical reports. Geography program functions, products, clients, and goals are presented in graphical form, along with aircraft photo missions, geography test sites, and FY-70.

  5. Sampling design considerations for demographic studies: a case of colonial seabirds

    USGS Publications Warehouse

    Kendall, William L.; Converse, Sarah J.; Doherty, Paul F.; Naughton, Maura B.; Anders, Angela; Hines, James E.; Flint, Elizabeth

    2009-01-01

    For the purposes of making many informed conservation decisions, the main goal for data collection is to assess population status and allow prediction of the consequences of candidate management actions. Reducing the bias and variance of estimates of population parameters reduces uncertainty in population status and projections, thereby reducing the overall uncertainty under which a population manager must make a decision. In capture-recapture studies, imperfect detection of individuals, unobservable life-history states, local movement outside study areas, and tag loss can cause bias or precision problems with estimates of population parameters. Furthermore, excessive disturbance to individuals during capture-recapture sampling may be of concern because disturbance may have demographic consequences. We address these problems using as an example a monitoring program for Black-footed Albatross (Phoebastria nigripes) and Laysan Albatross (Phoebastria immutabilis) nesting populations in the northwestern Hawaiian Islands. To mitigate these estimation problems, we describe a synergistic combination of sampling design and modeling approaches. Solutions include multiple capture periods per season and multistate, robust design statistical models, dead recoveries and incidental observations, telemetry and data loggers, buffer areas around study plots to neutralize the effect of local movements outside study plots, and double banding and statistical models that account for band loss. We also present a variation on the robust capture-recapture design and a corresponding statistical model that minimizes disturbance to individuals. For the albatross case study, this less invasive robust design was more time efficient and, when used in combination with a traditional robust design, reduced the standard error of detection probability by 14% with only two hours of additional effort in the field. These field techniques and associated modeling approaches are applicable to studies of most taxa being marked and in some cases have individually been applied to studies of birds, fish, herpetofauna, and mammals.

  6. Multi-level emulation of complex climate model responses to boundary forcing data

    NASA Astrophysics Data System (ADS)

    Tran, Giang T.; Oliver, Kevin I. C.; Holden, Philip B.; Edwards, Neil R.; Sóbester, András; Challenor, Peter

    2018-04-01

    Climate model components involve both high-dimensional input and output fields. It is desirable to efficiently generate spatio-temporal outputs of these models for applications in integrated assessment modelling or to assess the statistical relationship between such sets of inputs and outputs, for example, uncertainty analysis. However, the need for efficiency often compromises the fidelity of output through the use of low complexity models. Here, we develop a technique which combines statistical emulation with a dimensionality reduction technique to emulate a wide range of outputs from an atmospheric general circulation model, PLASIM, as functions of the boundary forcing prescribed by the ocean component of a lower complexity climate model, GENIE-1. Although accurate and detailed spatial information on atmospheric variables such as precipitation and wind speed is well beyond the capability of GENIE-1's energy-moisture balance model of the atmosphere, this study demonstrates that the output of this model is useful in predicting PLASIM's spatio-temporal fields through multi-level emulation. Meaningful information from the fast model, GENIE-1 was extracted by utilising the correlation between variables of the same type in the two models and between variables of different types in PLASIM. We present here the construction and validation of several PLASIM variable emulators and discuss their potential use in developing a hybrid model with statistical components.
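
    One generic way to combine dimensionality reduction with statistical emulation, in the spirit of the study above, is to project high-dimensional output fields onto a few principal components and emulate each component score with a Gaussian-process regressor. The sketch below uses scikit-learn and synthetic data; it is not the specific emulator construction used for PLASIM and GENIE-1.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      rng = np.random.default_rng(0)

      # synthetic stand-ins: 80 boundary-forcing inputs (5 parameters each)
      # mapped to 80 output "fields" of 500 grid points
      X = rng.uniform(-1, 1, size=(80, 5))
      fields = np.sin(X[:, [0]] * np.linspace(0, 3, 500)) + 0.3 * X[:, [1]]

      # step 1: reduce the output fields to a few principal-component scores
      pca = PCA(n_components=3)
      scores = pca.fit_transform(fields)

      # step 2: emulate each retained component score with a Gaussian process
      emulators = []
      for j in range(scores.shape[1]):
          gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
          emulators.append(gp.fit(X, scores[:, j]))

      # predict a full output field for a new input by inverting the PCA projection
      x_new = rng.uniform(-1, 1, size=(1, 5))
      scores_new = np.column_stack([gp.predict(x_new) for gp in emulators])
      field_new = pca.inverse_transform(scores_new)
      print(field_new.shape)   # (1, 500): an emulated spatial output field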

  7. Forecasting daily source air quality using multivariate statistical analysis and radial basis function networks.

    PubMed

    Sun, Gang; Hoff, Steven J; Zelle, Brian C; Nelson, Minda A

    2008-12-01

    It is vital to forecast gas and particle matter concentrations and emission rates (GPCER) from livestock production facilities to assess the impact of airborne pollutants on human health, ecological environment, and global warming. Modeling source air quality is a complex process because of abundant nonlinear interactions between GPCER and other factors. The objective of this study was to introduce statistical methods and a radial basis function (RBF) neural network to predict daily source air quality in Iowa swine deep-pit finishing buildings. The results show that four variables (outdoor and indoor temperature, animal units, and ventilation rates) were identified as relatively important model inputs using statistical methods. It can be further demonstrated that only two factors, the environment factor and the animal factor, were capable of explaining more than 94% of the total variability after performing principal component analysis. The introduction of fewer uncorrelated variables to the neural network would result in a reduction of model structure complexity, minimize computation cost, and eliminate model overfitting problems. The obtained results of the RBF network prediction were in good agreement with the actual measurements, with values of the correlation coefficient between 0.741 and 0.995 and very low values of systemic performance indexes for all the models. The good results indicated that the RBF network could be trained to model these highly nonlinear relationships. Thus, the RBF neural network technology combined with multivariate statistical methods is a promising tool for air pollutant emissions modeling.

  8. Dataset on predictive compressive strength model for self-compacting concrete.

    PubMed

    Ofuyatan, O M; Edeki, S O

    2018-04-01

    The determination of compressive strength is affected by many variables such as the water cement (WC) ratio, the superplasticizer (SP), the aggregate combination, and the binder combination. In this dataset article, 7, 28, and 90-day compressive strength models are derived using statistical analysis. The response surface methodology is used to investigate the effect of the parameters, namely varying percentages of ash, cement, WC, and SP, on the hardened property of compressive strength at 7, 28 and 90 days. The levels of the independent parameters are determined based on preliminary experiments. The experimental values for compressive strength at 7, 28 and 90 days and modulus of elasticity under different treatment conditions are also discussed and presented. These datasets can effectively be used for modelling and prediction in concrete production settings.

  9. Molecular vibrational energy flow

    NASA Astrophysics Data System (ADS)

    Gruebele, M.; Bigwood, R.

    This article reviews some recent work in molecular vibrational energy flow (IVR), with emphasis on our own computational and experimental studies. We consider the problem in various representations, and use these to develop a family of simple models which combine specific molecular properties (e.g. size, vibrational frequencies) with statistical properties of the potential energy surface and wavefunctions. This marriage of molecular detail and statistical simplification captures trends of IVR mechanisms and survival probabilities beyond the abilities of purely statistical models or the computational limitations of full ab initio approaches. Of particular interest is IVR in the intermediate time regime, where heavy-atom skeletal modes take over the IVR process from hydrogenic motions even upon X-H bond excitation. Experiments and calculations on prototype heavy-atom systems show that intermediate time IVR differs in many aspects from the early stages of hydrogenic mode IVR. As a result, IVR can be coherently frozen, with potential applications to selective chemistry.

  10. An Assessment of Land Surface and Lightning Characteristics Associated with Lightning-Initiated Wildfires

    NASA Technical Reports Server (NTRS)

    Coy, James; Schultz, Christopher J.; Case, Jonathan L.

    2017-01-01

    Can we use modeled information about the land surface, together with lightning characteristics beyond flash occurrence, to improve the identification and prediction of wildfires? The approach combines observed cloud-to-ground (CG) flashes with real-time land surface model output and compares them with data from areas where lightning did not start a wildfire, to determine which land surface conditions and lightning characteristics were responsible for causing wildfires. Statistical differences between suspected fire-starters and non-fire-starters were peak-current dependent. Comparisons of 0-10 cm volumetric and relative soil moisture were statistically dependent to at least the p = 0.05 independence level for both polarity flash types, and suspected fire-starters typically occurred in areas of lower soil moisture than non-fire-starters. GVF value comparisons were only found to be statistically dependent for -CG flashes. However, random sampling of the -CG non-fire-starter dataset revealed that this relationship may not always hold.

  11. A computational visual saliency model based on statistics and machine learning.

    PubMed

    Lin, Ru-Je; Lin, Wei-Song

    2014-08-01

    Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.

  12. [Application of a mathematical algorithm for the detection of electroneuromyographic results in the pathogenesis study of facial dyskinesia].

    PubMed

    Gribova, N P; Iudel'son, Ia B; Golubev, V L; Abramenkova, I V

    2003-01-01

    To carry out a differential diagnosis of two facial dyskinesia (FD) models--facial hemispasm (FH) and facial paraspasm (FP)--a combined program of electroneuromyographic (ENMG) examination was created, using statistical analyses that include object identification based on a hybrid neural network with an adaptive fuzzy logic method, as well as standard statistical tests (Wilcoxon, Student). In FH, a lesion of the peripheral facial neuromotor apparatus with augmentation of inter-neuron function at segmental and suprasegmental brainstem levels predominated. In FP, primary afferent strengthening in the mimic muscles was accompanied by increased motor neuron activity and reciprocal augmentation of the inter-neurons inhibiting the motor portion of the trigeminal (V) nerve. The mathematical algorithm for the recognition of ENMG results worked out in this study provides precise differentiation of the two FD models and opens possibilities for the differential diagnosis of other facial motor disorders.

  13. A phylogenetic transform enhances analysis of compositional microbiota data.

    PubMed

    Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

    2017-02-15

    Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
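
    For reference, an isometric log-ratio transform with a generic orthonormal (Helmert) basis takes only a few lines; PhILR's contribution is to replace this generic basis with one derived from the phylogenetic tree, which is not reproduced in this sketch, and the composition below is illustrative.

      import numpy as np
      from scipy.linalg import helmert

      def ilr(composition):
          """Isometric log-ratio transform of a strictly positive composition.

          Uses a generic Helmert basis; PhILR instead builds the basis from a
          phylogenetic tree so that coordinates correspond to internal nodes.
          """
          x = np.asarray(composition, dtype=float)
          x = x / x.sum()                              # close the composition
          clr = np.log(x) - np.log(x).mean()           # centred log-ratio
          basis = helmert(len(x))                      # (D-1, D) orthonormal contrasts
          return basis @ clr

      if __name__ == "__main__":
          relative_abundances = [0.50, 0.25, 0.15, 0.10]   # illustrative 4-taxon sample
          print(np.round(ilr(relative_abundances), 3))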

  14. The use of analysis of variance procedures in biological studies

    USGS Publications Warehouse

    Williams, B.K.

    1987-01-01

    The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.
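
    The point about computing procedures and their corresponding hypotheses can be made concrete with a small unbalanced two-factor example: with unequal cell counts, different sums-of-squares conventions (Type I versus Type III, shown below via statsmodels) test different main-effect hypotheses. The data are synthetic and purely illustrative.

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf
      from statsmodels.stats.anova import anova_lm

      rng = np.random.default_rng(0)

      # unbalanced two-factor fixed-effects layout: unequal cell counts, one effect present
      rows = []
      for a, b, n in [("a1", "b1", 12), ("a1", "b2", 3), ("a2", "b1", 4), ("a2", "b2", 10)]:
          effect = 1.0 if a == "a2" else 0.0
          for _ in range(n):
              rows.append({"a": a, "b": b, "y": effect + rng.normal()})
      df = pd.DataFrame(rows)

      # sum-to-zero contrasts so that Type III main-effect tests are interpretable
      model = smf.ols("y ~ C(a, Sum) * C(b, Sum)", data=df).fit()

      print(anova_lm(model, typ=1))   # sequential (order-dependent) hypotheses
      print(anova_lm(model, typ=3))   # marginal hypotheses for each term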

  15. Space station software reliability analysis based on failures observed during testing at the multisystem integration facility

    NASA Technical Reports Server (NTRS)

    Tamayo, Tak Chai

    1987-01-01

    The quality of software is not only vital to the successful operation of the space station, it is also an important factor in establishing testing requirements, the time needed for software verification and integration, and launching schedules for the space station. Defense of management decisions can be greatly strengthened by combining engineering judgments with statistical analysis. Unlike hardware, software has the characteristics of no wearout and costly redundancies, making traditional statistical analysis unsuitable for evaluating software reliability. A statistical model was developed to provide a representation of the number as well as the types of failures that occur during software testing and verification. From this model, quantitative measures of software reliability based on the failure history during testing are derived. Criteria to terminate testing based on reliability objectives and methods to estimate the expected number of fixes required are also presented.

  16. Prediction of Patient-Controlled Analgesic Consumption: A Multimodel Regression Tree Approach.

    PubMed

    Hu, Yuh-Jyh; Ku, Tien-Hsiung; Yang, Yu-Hung; Shen, Jia-Ying

    2018-01-01

    Several factors contribute to individual variability in postoperative pain, therefore, individuals consume postoperative analgesics at different rates. Although many statistical studies have analyzed postoperative pain and analgesic consumption, most have identified only the correlation and have not subjected the statistical model to further tests in order to evaluate its predictive accuracy. In this study involving 3052 patients, a multistrategy computational approach was developed for analgesic consumption prediction. This approach uses data on patient-controlled analgesia demand behavior over time and combines clustering, classification, and regression to mitigate the limitations of current statistical models. Cross-validation results indicated that the proposed approach significantly outperforms various existing regression methods. Moreover, a comparison between the predictions by anesthesiologists and medical specialists and those of the computational approach for an independent test data set of 60 patients further evidenced the superiority of the computational approach in predicting analgesic consumption because it produced markedly lower root mean squared errors.
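
    A stripped-down version of the cluster-then-regress idea can be sketched with scikit-learn: cluster patients on their early patient-controlled analgesia demand profiles, then fit a separate regression tree per cluster to predict total consumption. The feature construction, cluster count, and tree settings below are illustrative assumptions, not the authors' actual pipeline.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)

      # synthetic stand-in: hourly PCA demand counts over the first 12 h for 300 patients
      demand = rng.poisson(lam=rng.uniform(0.5, 3.0, size=(300, 1)), size=(300, 12))
      total_consumption = demand.sum(axis=1) * 1.5 + rng.normal(scale=2.0, size=300)

      # step 1: cluster patients by demand-behaviour profile
      kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(demand)
      labels = kmeans.labels_

      # step 2: fit one regression tree per cluster (the "multimodel" part)
      trees = {}
      for c in np.unique(labels):
          mask = labels == c
          trees[c] = DecisionTreeRegressor(max_depth=3).fit(demand[mask], total_consumption[mask])

      # predict for a new patient: assign to the nearest cluster, then use that cluster's tree
      new_patient = rng.poisson(lam=2.0, size=(1, 12))
      cluster = kmeans.predict(new_patient)[0]
      print(round(float(trees[cluster].predict(new_patient)[0]), 1))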

  17. Associative Algorithms for Computational Creativity

    ERIC Educational Resources Information Center

    Varshney, Lav R.; Wang, Jun; Varshney, Kush R.

    2016-01-01

    Computational creativity, the generation of new, unimagined ideas or artifacts by a machine that are deemed creative by people, can be applied in the culinary domain to create novel and flavorful dishes. In fact, we have done so successfully using a combinatorial algorithm for recipe generation combined with statistical models for recipe ranking…

  18. A Comparison of Statistical Techniques for Combining Modeled and Observed Concentrations to Create High-Resolution Ozone Air Quality Surfaces

    EPA Science Inventory

    Air quality surfaces representing pollutant concentrations across space and time are needed for many applications, including tracking trends and relating air quality to human and ecosystem health. The spatial and temporal characteristics of these surfaces may reveal new informat...

  19. Combining data visualization and statistical approaches for interpreting measurements and meta-data: Integrating heatmaps, variable clustering, and mixed regression models

    EPA Science Inventory

    The advent of new higher throughput analytical instrumentation has put a strain on interpreting and explaining the results from complex studies. Contemporary human, environmental, and biomonitoring data sets are comprised of tens or hundreds of analytes, multiple repeat measures...

  20. Re-Conceptualizing the Past: Historical Data in Vocational Interest Research

    ERIC Educational Resources Information Center

    Armstrong, Patrick Ian; Rounds, James; Hubert, Lawrence

    2008-01-01

    Noteworthy progress has been made in the development of statistical models for evaluating the structure of vocational interests over the past three decades. It is proposed that historically significant interest datasets, when combined with modern structural methods of data analysis, provide an opportunity to re-examine the underlying assumptions…

  1. COMBINING EVIDENCE ON AIR POLLUTION AND DAILY MORTALITY FROM 20 LARGEST U.S. CITIES: A HIERARCHICAL MODELING STRATEGY

    EPA Science Inventory

    Environmental science and management are fed by individual studies of pollution effects, often focused on single locations. The data are typically encountered data from multiple sources and on different time and spatial scales. Statistical issues including publication bias and m...

  2. No-reference image quality assessment based on natural scene statistics and gradient magnitude similarity

    NASA Astrophysics Data System (ADS)

    Jia, Huizhen; Sun, Quansen; Ji, Zexuan; Wang, Tonghan; Chen, Qiang

    2014-11-01

    The goal of no-reference/blind image quality assessment (NR-IQA) is to devise a perceptual model that can accurately predict the quality of a distorted image in agreement with human opinions, in which feature extraction is an important issue. However, the features used in the state-of-the-art "general purpose" NR-IQA algorithms are usually either natural scene statistics (NSS) based or perceptually relevant, but not both; therefore, the performance of these models is limited. To further improve the performance of NR-IQA, we propose a general purpose NR-IQA algorithm which combines NSS-based features with perceptually relevant features. The new method extracts features in both the spatial and gradient domains. In the spatial domain, we extract the point-wise statistics for single pixel values which are characterized by a generalized Gaussian distribution model to form the underlying features. In the gradient domain, statistical features based on neighboring gradient magnitude similarity are extracted. Then a mapping is learned to predict quality scores using a support vector regression. The experimental results on the benchmark image databases demonstrate that the proposed algorithm correlates highly with human judgments of quality and leads to significant performance improvements over state-of-the-art methods.
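
    The following is a hedged sketch of such a combined feature pipeline (generic choices, not the paper's exact algorithm): a moment-matching fit of a generalized Gaussian distribution to centered pixel values, gradient-magnitude similarity statistics between neighboring pixels, and a support vector regression mapping the pooled features to quality scores.

```python
import numpy as np
from scipy.ndimage import sobel
from scipy.special import gamma
from sklearn.svm import SVR

def ggd_shape(x):
    """Moment-matching estimate of the generalized Gaussian shape parameter."""
    x = x - x.mean()
    r = x.std() ** 2 / (np.mean(np.abs(x)) ** 2 + 1e-12)
    betas = np.linspace(0.2, 5.0, 500)
    ratios = gamma(1.0 / betas) * gamma(3.0 / betas) / gamma(2.0 / betas) ** 2
    return betas[np.argmin(np.abs(ratios - r))]

def features(img):
    gx, gy = sobel(img, axis=0), sobel(img, axis=1)
    gm = np.hypot(gx, gy)
    # similarity of each gradient magnitude to its right-hand neighbour
    sim = (2 * gm[:, :-1] * gm[:, 1:] + 1e-3) / (gm[:, :-1] ** 2 + gm[:, 1:] ** 2 + 1e-3)
    return [ggd_shape(img), img.std(), sim.mean(), sim.std()]

# Synthetic stand-in for a training set of distorted images with human scores.
rng = np.random.default_rng(1)
imgs = [rng.normal(size=(64, 64)) for _ in range(20)]
scores = rng.uniform(0, 100, size=20)
model = SVR(kernel="rbf").fit([features(im) for im in imgs], scores)
print(model.predict([features(imgs[0])]))
```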

  3. Systems Engineering Metrics: Organizational Complexity and Product Quality Modeling

    NASA Technical Reports Server (NTRS)

    Mog, Robert A.

    1997-01-01

    Innovative organizational complexity and product quality models applicable to performance metrics for NASA-MSFC's Systems Analysis and Integration Laboratory (SAIL) missions and objectives are presented. An intensive research effort focuses on the synergistic combination of stochastic process modeling, nodal and spatial decomposition techniques, organizational and computational complexity, systems science and metrics, chaos, and proprietary statistical tools for accelerated risk assessment. This is followed by the development of a preliminary model, which is uniquely applicable and robust for quantitative purposes. Exercise of the preliminary model using a generic system hierarchy and the AXAF-I architectural hierarchy is provided. The Kendall test for positive dependence provides an initial verification and validation of the model. Finally, the research and development of the innovation is revisited, prior to peer review. This research and development effort results in near-term, measurable SAIL organizational and product quality methodologies, enhanced organizational risk assessment and evolutionary modeling results, and improved statistical quantification of SAIL productivity interests.

  4. Absolute plate motions relative to deep mantle plumes

    NASA Astrophysics Data System (ADS)

    Wang, Shimin; Yu, Hongzheng; Zhang, Qiong; Zhao, Yonghong

    2018-05-01

    Advances in whole waveform seismic tomography have revealed the presence of broad mantle plumes rooted at the base of the Earth's mantle beneath major hotspots. Hotspot tracks associated with these deep mantle plumes provide ideal constraints for inverting absolute plate motions as well as testing the fixed hotspot hypothesis. In this paper, 27 observed hotspot trends associated with 24 deep mantle plumes are used together with the MORVEL model for relative plate motions to determine an absolute plate motion model, in terms of a maximum likelihood optimization for angular data fitting, combined with an outlier data detection procedure based on statistical tests. The obtained T25M model fits 25 observed trends of globally distributed hotspot tracks to the statistically required level, while the other two hotspot trend data (Comores on Somalia and Iceland on Eurasia) are identified as outliers, which are significantly incompatible with other data. For most hotspots with rate data available, T25M predicts plate velocities significantly lower than the observed rates of hotspot volcanic migration, which cannot be fully explained by biased errors in observed rate data. Instead, the apparent hotspot motions derived by subtracting the observed hotspot migration velocities from the T25M plate velocities exhibit a combined pattern of being opposite to plate velocities and moving towards mid-ocean ridges. The newly estimated net rotation of the lithosphere is statistically compatible with three recent estimates, but differs significantly from 30 of 33 prior estimates.

  5. Impact of Temperature and Non-Gaussian Statistics on Electron Transfer in Donor-Bridge-Acceptor Molecules.

    PubMed

    Waskasi, Morteza M; Newton, Marshall D; Matyushov, Dmitry V

    2017-03-30

    A combination of experimental data and theoretical analysis provides evidence of a bell-shaped kinetics of electron transfer in the Arrhenius coordinates ln k vs 1/T. This kinetic law is a temperature analogue of the familiar Marcus bell-shaped dependence based on ln k vs the reaction free energy. These results were obtained for reactions of intramolecular charge shift between the donor and acceptor separated by a rigid spacer studied experimentally by Miller and co-workers. The non-Arrhenius kinetic law is a direct consequence of the solvent reorganization energy and reaction driving force changing approximately as hyperbolic functions with temperature. The reorganization energy decreases and the driving force increases when temperature is increased. The point of equality between them marks the maximum of the activationless reaction rate. Reaching the consistency between the kinetic and thermodynamic experimental data requires the non-Gaussian statistics of the donor-acceptor energy gap described by the Q-model of electron transfer. The theoretical formalism combines the vibrational envelope of quantum vibronic transitions with the Q-model describing the classical component of the Franck-Condon factor and a microscopic solvation model of the solvent reorganization energy and the reaction free energy.

  6. Computer program to minimize prediction error in models from experiments with 16 hypercube points and 0 to 6 center points

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1982-01-01

    A previous report described a backward deletion procedure of model selection that was optimized for minimum prediction error and which used a multiparameter combination of the F-distribution and an order statistics distribution of Cochran's. A computer program is described that applies the previously optimized procedure to real data. The use of the program is illustrated by examples.

  7. Heuristic Identification of Biological Architectures for Simulating Complex Hierarchical Genetic Interactions

    PubMed Central

    Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C

    2015-01-01

    Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175

  8. Simulating statistics of lightning-induced and man made fires

    NASA Astrophysics Data System (ADS)

    Krenn, R.; Hergarten, S.

    2009-04-01

    The frequency-area distributions of forest fires show power-law behavior with scaling exponents α in a quite narrow range, relating wildfire research to the theoretical framework of self-organized criticality. Examples of self-organized critical behavior can be found in computer simulations of simple cellular automata. The established self-organized critical Drossel-Schwabl forest fire model (DS-FFM) is one of the most widespread models in this context. Despite its qualitative agreement with event-size statistics from nature, its applicability is still questioned. Apart from general concerns that the DS-FFM apparently oversimplifies the complex nature of forest dynamics, it significantly overestimates the frequency of large fires. We present a straightforward modification of the model rules that increases the scaling exponent α by approximately 1/3 and brings the simulated event-size statistics close to those observed in nature. In addition, combined simulations of both the original and the modified model predict a dependence of the overall distribution on the ratio of lightning-induced and man-made fires as well as a difference between their respective event-size statistics. The increase of the scaling exponent with decreasing lightning probability as well as the splitting of the partial distributions are confirmed by the analysis of the Canadian Large Fire Database. As a consequence, lightning-induced and man-made forest fires cannot be treated separately in wildfire modeling, hazard assessment and forest management.
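
    For readers unfamiliar with the DS-FFM, the sketch below implements a minimal Drossel-Schwabl-type cellular automaton (the unmodified baseline, with placeholder parameters, not the authors' modified rules); the burned-cluster sizes it records are the event-size statistics discussed above.

```python
import numpy as np
from scipy.ndimage import label

rng = np.random.default_rng(0)
L, p, f, steps = 128, 0.05, 0.0005, 2000
grid = np.zeros((L, L), dtype=bool)          # True = tree
fire_sizes = []

for _ in range(steps):
    grid |= rng.random((L, L)) < p           # tree growth on empty cells
    strikes = grid & (rng.random((L, L)) < f)  # lightning hitting trees
    if strikes.any():
        clusters, _ = label(grid)            # 4-connected tree clusters
        for cid in np.unique(clusters[strikes]):
            burned = clusters == cid
            fire_sizes.append(int(burned.sum()))
            grid[burned] = False             # whole cluster burns down

print(len(fire_sizes), "fires; largest =", max(fire_sizes) if fire_sizes else 0)
```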

  9. Synchronized Trajectories in a Climate "Supermodel"

    NASA Astrophysics Data System (ADS)

    Duane, Gregory; Schevenhoven, Francine; Selten, Frank

    2017-04-01

    Differences in climate projections among state-of-the-art models can be resolved by connecting the models in run-time, either through inter-model nudging or by directly combining the tendencies for corresponding variables. Since it is clearly established that averaging model outputs typically results in improvement as compared to any individual model output, averaged re-initializations at typical analysis time intervals also seem appropriate. The resulting "supermodel" is more like a single model than it is like an ensemble, because the constituent models tend to synchronize even with limited inter-model coupling. Thus one can examine the properties of specific trajectories, rather than averaging the statistical properties of the separate models. We apply this strategy to a study of the index cycle in a supermodel constructed from several imperfect copies of the SPEEDO model (a global primitive-equation atmosphere-ocean-land climate model). As with blocking frequency, typical weather statistics of interest, such as probabilities of heat waves or extreme precipitation events, are improved as compared to the standard multi-model ensemble approach. In contrast to the standard approach, the supermodel approach provides detailed descriptions of typical actual events.

  10. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 1: Methodology and applications

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.
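
    A toy illustration of the core PFA idea (not the documented software): uncertain parameters of an analytical failure model are sampled by Monte Carlo and propagated to a failure-probability estimate for a specified mission life. Here the analytical model is a simple Paris-law crack-growth life calculation, and all distributions and numbers are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
C = rng.lognormal(mean=np.log(1e-11), sigma=0.3, size=n)   # Paris-law coefficient (assumed)
m = rng.normal(3.0, 0.1, size=n)                            # Paris-law exponent (assumed)
d_sigma = rng.normal(200.0, 15.0, size=n)                   # stress range, MPa (assumed)
a0, ac, Y = 0.5e-3, 5e-3, 1.12                               # initial/critical crack size (m), geometry factor

# Closed-form cycles-to-failure for da/dN = C (Y*d_sigma*sqrt(pi*a))^m.
dK_fac = Y * d_sigma * np.sqrt(np.pi)
N_f = (ac ** (1 - m / 2) - a0 ** (1 - m / 2)) / (C * dK_fac ** m * (1 - m / 2))

mission_cycles = 5e4
print("estimated failure probability:", np.mean(N_f < mission_cycles))
```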

  11. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.

  12. The Balance-Scale Task Revisited: A Comparison of Statistical Models for Rule-Based and Information-Integration Theories of Proportional Reasoning

    PubMed Central

    Hofman, Abe D.; Visser, Ingmar; Jansen, Brenda R. J.; van der Maas, Han L. J.

    2015-01-01

    We propose and test three statistical models for the analysis of children’s responses to the balance scale task, a seminal task to study proportional reasoning. We use a latent class modelling approach to formulate a rule-based latent class model (RB LCM) following from a rule-based perspective on proportional reasoning and a new statistical model, the Weighted Sum Model, following from an information-integration approach. Moreover, a hybrid LCM using item covariates is proposed, combining aspects of both a rule-based and information-integration perspective. These models are applied to two different datasets, a standard paper-and-pencil test dataset (N = 779), and a dataset collected within an online learning environment that included direct feedback, time-pressure, and a reward system (N = 808). For the paper-and-pencil dataset the RB LCM resulted in the best fit, whereas for the online dataset the hybrid LCM provided the best fit. The standard paper-and-pencil dataset yielded more evidence for distinct solution rules than the online data set in which quantitative item characteristics are more prominent in determining responses. These results shed new light on the discussion on sequential rule-based and information-integration perspectives of cognitive development. PMID:26505905

  13. Spatial Ensemble Postprocessing of Precipitation Forecasts Using High Resolution Analyses

    NASA Astrophysics Data System (ADS)

    Lang, Moritz N.; Schicker, Irene; Kann, Alexander; Wang, Yong

    2017-04-01

    Ensemble prediction systems are designed to account for errors or uncertainties in the initial and boundary conditions, imperfect parameterizations, etc. However, due to sampling errors and underestimation of the model errors, these ensemble forecasts tend to be underdispersive, and to lack both reliability and sharpness. To overcome such limitations, statistical postprocessing methods are commonly applied to these forecasts. In this study, a full-distributional spatial post-processing method is applied to short-range precipitation forecasts over Austria using Standardized Anomaly Model Output Statistics (SAMOS). Following Stauffer et al. (2016), observation and forecast fields are transformed into standardized anomalies by subtracting a site-specific climatological mean and dividing by the climatological standard deviation. Since only a single regression model needs to be fitted for the whole domain, the SAMOS framework provides a computationally inexpensive method to create operationally calibrated probabilistic forecasts for any arbitrary location or for all grid points in the domain simultaneously. Taking advantage of the INCA system (Integrated Nowcasting through Comprehensive Analysis), high resolution analyses are used for the computation of the observed climatology and for model training. The INCA system operationally combines station measurements and remote sensing data into real-time objective analysis fields at 1 km horizontal resolution and 1 h temporal resolution. The precipitation forecast used in this study is obtained from a limited area model ensemble prediction system also operated by ZAMG. The so-called ALADIN-LAEF provides, by applying a multi-physics approach, a 17-member forecast at a horizontal resolution of 10.9 km and a temporal resolution of 1 hour. The SAMOS approach applied here statistically combines the in-house developed high resolution analysis and ensemble prediction system. The station-based validation of 6 hour precipitation sums shows a mean improvement of more than 40% in CRPS when compared to bilinearly interpolated uncalibrated ensemble forecasts. The validation on randomly selected grid points, representing the true height distribution over Austria, still indicates a mean improvement of 35%. The applied statistical model is currently set up for 6-hourly and daily accumulation periods, but will be extended to a temporal resolution of 1-3 hours within a new probabilistic nowcasting system operated by ZAMG.
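
    A greatly simplified, hypothetical sketch of the SAMOS idea (the operational method for precipitation uses a censored, heteroscedastic distributional model; plain least squares is used here only to illustrate pooling in standardized-anomaly space):

```python
import numpy as np

rng = np.random.default_rng(7)
n_sites, n_days = 50, 200
clim_mean = rng.uniform(1.0, 6.0, size=(n_sites, 1))        # site climatological mean (synthetic)
clim_sd = rng.uniform(0.5, 2.0, size=(n_sites, 1))           # site climatological sd (synthetic)
obs = clim_mean + clim_sd * rng.normal(size=(n_sites, n_days))
fcst = obs + rng.normal(scale=0.8, size=(n_sites, n_days))   # imperfect forecasts

obs_anom = (obs - clim_mean) / clim_sd                        # standardized anomalies
fcst_anom = (fcst - clim_mean) / clim_sd

# One pooled regression for all sites: obs_anom ~ a + b * fcst_anom
b, a = np.polyfit(fcst_anom.ravel(), obs_anom.ravel(), deg=1)
calibrated = clim_mean + clim_sd * (a + b * fcst_anom)        # back-transform to physical units
print(f"pooled coefficients: intercept={a:.2f}, slope={b:.2f}")
```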

  14. A resilient and efficient CFD framework: Statistical learning tools for multi-fidelity and heterogeneous information fusion

    NASA Astrophysics Data System (ADS)

    Lee, Seungjoon; Kevrekidis, Ioannis G.; Karniadakis, George Em

    2017-09-01

    Exascale-level simulations require fault-resilient algorithms that are robust against repeated and expected software and/or hardware failures during computations, which may render the simulation results unsatisfactory. If each processor can share some global information about the simulation from a coarse, limited accuracy but relatively costless auxiliary simulator, we can effectively fill in the missing spatial data at the required times by a statistical learning technique - multi-level Gaussian process regression, on the fly; this has been demonstrated in previous work [1]. Based on the previous work, we also employ another (nonlinear) statistical learning technique, Diffusion Maps, that detects computational redundancy in time and hence accelerates the simulation by projective time integration, giving the overall computation a "patch dynamics" flavor. Furthermore, we are now able to perform information fusion with multi-fidelity and heterogeneous data (including stochastic data). Finally, we set the foundations of a new framework in CFD, called patch simulation, that combines information fusion techniques from, in principle, multiple fidelity and resolution simulations (and even experiments) with a new adaptive timestep refinement technique. We present two benchmark problems (the heat equation and the Navier-Stokes equations) to demonstrate the new capability that statistical learning tools can bring to traditional scientific computing algorithms. For each problem, we rely on heterogeneous and multi-fidelity data, either from a coarse simulation of the same equation or from a stochastic, particle-based, more "microscopic" simulation. We consider, as such "auxiliary" models, a Monte Carlo random walk for the heat equation and a dissipative particle dynamics (DPD) model for the Navier-Stokes equations. More broadly, in this paper we demonstrate the symbiotic and synergistic combination of statistical learning, domain decomposition, and scientific computing in exascale simulations.
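
    A hedged, simplified sketch of the fill-in idea (not the paper's framework): where fine-simulation data are lost, a Gaussian process trained on the discrepancy between the cheap coarse solution and the surviving fine data predicts the missing values.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

x = np.linspace(0.0, 1.0, 60)[:, None]
fine = np.sin(6 * x).ravel() + 0.05 * np.sin(40 * x).ravel()   # "high-fidelity" field (synthetic)
coarse = np.sin(6 * x).ravel()                                  # cheap auxiliary field (synthetic)

lost = (x.ravel() > 0.4) & (x.ravel() < 0.7)                    # subdomain lost to a failure
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.1), alpha=1e-6)
gp.fit(x[~lost], (fine - coarse)[~lost])                        # learn the coarse-to-fine discrepancy

filled = coarse[lost] + gp.predict(x[lost])                     # coarse field + GP correction
print("max fill-in error:", np.abs(filled - fine[lost]).max())
```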

  15. Meta-markers for the differential diagnosis of lung cancer and lung disease.

    PubMed

    Kim, Yong-In; Ahn, Jung-Mo; Sung, Hye-Jin; Na, Sang-Su; Hwang, Jaesung; Kim, Yongdai; Cho, Je-Yoel

    2016-10-04

    Misdiagnosis of lung cancer remains a serious problem due to the difficulty of distinguishing lung cancer from other respiratory lung diseases. As a result, the development of serum-based differential diagnostic biomarkers is in high demand. In this study, 198 clinical serum samples from non-cancer lung disease and lung cancer patients were analyzed using nLC-MRM-MS for the levels of seven lung cancer biomarker candidates. When the candidates were assessed individually, only SERPINA4 showed statistically significant changes in the serum levels. The MRM results and clinical information were analyzed using a logistic regression analysis to select the best 'meta-marker' model, i.e., combination of biomarkers, for differential diagnosis. In addition, taking statistical interactions into account, variables with low significance as single factors but a statistically significant influence on the meta-marker model were selected. Using this probabilistic classification, the best meta-marker was determined to be made up of the two proteins SERPINA4 and PON1 together with age. This meta-marker showed an enhanced differential diagnostic capability (AUC=0.915) for distinguishing the two patient groups. Our results suggest that a statistical model can determine optimal meta-markers, which may have better specificity and sensitivity than a single biomarker and thus improve the differential diagnosis of lung cancer and lung disease patients. Diagnosing lung cancer commonly involves the use of radiographic methods. However, an imaging-based diagnosis may fail to differentiate lung cancer from non-cancerous lung disease. In this study, we examined several serum proteins in the sera of 198 lung cancer and non-cancerous lung disease patients by multiple-reaction monitoring. We then used a combination of variables to generate a meta-marker model that is useful as a differential diagnostic biomarker. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.
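
    A hedged sketch of the meta-marker construction on synthetic stand-in data (variable names follow the abstract, but all values are placeholders): a logistic regression combining two serum protein levels and age, evaluated by the area under the ROC curve.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n = 198
cancer = rng.integers(0, 2, size=n)                  # 1 = lung cancer, 0 = non-cancer lung disease
serpina4 = 10 - 2 * cancer + rng.normal(size=n)      # marker level shifted in cancer (synthetic)
pon1 = 5 - 1 * cancer + rng.normal(size=n)           # second marker (synthetic)
age = rng.normal(63, 9, size=n)

X = np.column_stack([serpina4, pon1, age])
clf = LogisticRegression().fit(X, cancer)
print("meta-marker AUC:", roc_auc_score(cancer, clf.predict_proba(X)[:, 1]))
```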

  16. Cosmological constraints with weak-lensing peak counts and second-order statistics in a large-field survey

    NASA Astrophysics Data System (ADS)

    Peel, Austin; Lin, Chieh-An; Lanusse, François; Leonard, Adrienne; Starck, Jean-Luc; Kilbinger, Martin

    2017-03-01

    Peak statistics in weak-lensing maps access the non-Gaussian information contained in the large-scale distribution of matter in the Universe. They are therefore a promising complementary probe to two-point and higher-order statistics to constrain our cosmological models. Next-generation galaxy surveys, with their advanced optics and large areas, will measure the cosmic weak-lensing signal with unprecedented precision. To prepare for these anticipated data sets, we assess the constraining power of peak counts in a simulated Euclid-like survey on the cosmological parameters Ωm, σ8, and w0^de. In particular, we study how Camelus, a fast stochastic model for predicting peaks, can be applied to such large surveys. The algorithm avoids the need for time-costly N-body simulations, and its stochastic approach provides full PDF information of observables. Considering peaks with a signal-to-noise ratio ≥ 1, we measure the abundance histogram in a mock shear catalogue of approximately 5000 deg² using a multiscale mass-map filtering technique. We constrain the parameters of the mock survey using Camelus combined with approximate Bayesian computation, a robust likelihood-free inference algorithm. Peak statistics yield a tight but significantly biased constraint in the σ8-Ωm plane, as measured by the width ΔΣ8 of the 1σ contour. We find Σ8 = σ8(Ωm/0.27)^α = 0.77 (+0.06, -0.05) with α = 0.75 for a flat ΛCDM model. The strong bias indicates the need to better understand and control the model systematics before applying it to a real survey of this size or larger. We perform a calibration of the model and compare results to those from the two-point correlation functions ξ± measured on the same field. We calibrate the ξ± result as well, since its contours are also biased, although not as severely as for peaks. In this case, we find for peaks Σ8 = 0.76 (+0.02, -0.03) with α = 0.65, while for the combined ξ+ and ξ- statistics the values are Σ8 = 0.76 (+0.02, -0.01) and α = 0.70. We conclude that the constraining power can therefore be comparable between the two weak-lensing observables in large-field surveys. Furthermore, the tilt in the σ8-Ωm degeneracy direction for peaks with respect to that of ξ± suggests that a combined analysis would yield tighter constraints than either measure alone. As expected, w0^de cannot be well constrained without a tomographic analysis, but its degeneracy directions with the other two varied parameters are still clear for both peaks and ξ±.

  17. Joint Seasonal ARMA Approach for Modeling of Load Forecast Errors in Planning Studies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hafen, Ryan P.; Samaan, Nader A.; Makarov, Yuri V.

    2014-04-14

    To make informed and robust decisions in the probabilistic power system operation and planning process, it is critical to conduct multiple simulations of the generated combinations of wind and load parameters and their forecast errors to handle the variability and uncertainty of these time series. In order for the simulation results to be trustworthy, the simulated series must preserve the salient statistical characteristics of the real series. In this paper, we analyze day-ahead load forecast error data from multiple balancing authority locations and characterize statistical properties such as mean, standard deviation, autocorrelation, correlation between series, time-of-day bias, and time-of-day autocorrelation. We then construct and validate a seasonal autoregressive moving average (ARMA) model to model these characteristics, and use the model to jointly simulate day-ahead load forecast error series for all BAs.
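
    A hedged sketch of the modeling step (synthetic single-series data; the paper's exact model orders and joint multi-BA structure are not reproduced): fit a seasonal ARMA model to an hourly forecast error series and simulate a synthetic trace that preserves its autocorrelation and time-of-day structure.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
hours = 24 * 60
tod = np.tile(np.arange(24), 60)
errors = 0.5 * np.sin(2 * np.pi * tod / 24) + rng.normal(scale=0.3, size=hours)
errors = np.convolve(errors, [0.6, 0.3, 0.1], mode="same")       # add short-lag autocorrelation

# ARMA(1,1) with a 24-hour seasonal AR term (illustrative orders only).
model = sm.tsa.SARIMAX(errors, order=(1, 0, 1), seasonal_order=(1, 0, 0, 24))
res = model.fit(disp=False)
simulated = res.simulate(nsimulations=hours)                      # one synthetic error trace
print(res.summary().tables[1])
```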

  18. Mixed models, linear dependency, and identification in age-period-cohort models.

    PubMed

    O'Brien, Robert M

    2017-07-20

    This paper examines the identification problem in age-period-cohort models that use either linear or categorically coded ages, periods, and cohorts or combinations of these parameterizations. These models are not identified using the traditional fixed effect regression model approach because of a linear dependency between the ages, periods, and cohorts. However, these models can be identified if the researcher introduces a single just identifying constraint on the model coefficients. The problem with such constraints is that the results can differ substantially depending on the constraint chosen. Somewhat surprisingly, age-period-cohort models that specify one or more of ages and/or periods and/or cohorts as random effects are identified. This is the case without introducing an additional constraint. I label this identification as statistical model identification and show how statistical model identification comes about in mixed models and why which effects are treated as fixed and which are treated as random can substantially change the estimates of the age, period, and cohort effects. Copyright © 2017 John Wiley & Sons, Ltd.

  19. Proceedings of the Conference on the Design of Experiments in Army Research Development and Testing (32nd)

    DTIC Science & Technology

    1987-06-01

    number of series among the 63 which were identified as a particular ARIMA form and were "best" modeled by a particular technique. Figure 1 illustrates a...th time from xe's. The integrated autoregressive-moving average model, denoted by ARIMA (p,d,q), is a result of combining a d-th differencing process...Experiments, (4) Data Analysis and Modeling, (5) Theory and Probabilistic Inference, (6) Fuzzy Statistics, (7) Forecasting and Prediction, (8) Small Sample

  20. Trends in modeling Biomedical Complex Systems

    PubMed Central

    Milanesi, Luciano; Romano, Paolo; Castellani, Gastone; Remondini, Daniel; Liò, Pietro

    2009-01-01

    In this paper we provide an introduction to the techniques for multi-scale complex biological systems, from the single bio-molecule to the cell, combining theoretical modeling, experiments, informatics tools and technologies suitable for biological and biomedical research, which are becoming increasingly multidisciplinary, multidimensional and information-driven. The most important concepts on mathematical modeling methodologies and statistical inference, bioinformatics and standards tools to investigate complex biomedical systems are discussed and the prominent literature useful to both the practitioner and the theoretician is presented. PMID:19828068

  1. Mathematical Modeling of Ultraporous Nonmetallic Reticulated Materials

    NASA Astrophysics Data System (ADS)

    Alifanov, O. M.; Cherepanov, V. V.; Morzhukhina, A. V.

    2015-01-01

    We have developed an imitation statistical mathematical model reflecting the structure and the thermal, electrophysical, and optical properties of nonmetallic ultraporous reticulated materials. This model, in combination with a nonstationary thermal experiment and methods of the theory of inverse heat transfer problems, permits determining the little-studied characteristics of the above materials such as the radiative and conductive heat conductivities, the spectral scattering and absorption coefficients, the scattering indicatrix, and the dielectric constants, which are of great practical interest but are difficult to investigate.

  2. Evaluation of respiratory and cardiac motion correction schemes in dual gated PET/CT cardiac imaging

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lamare, F., E-mail: frederic.lamare@chu-bordeaux.fr; Fernandez, P.; CNRS, INCIA, UMR 5287, F-33400 Talence

    Purpose: Cardiac imaging suffers from both respiratory and cardiac motion. One of the proposed solutions involves double gated acquisitions. Although such an approach may lead to both respiratory and cardiac motion compensation there are issues associated with (a) the combination of data from cardiac and respiratory motion bins, and (b) poor statistical quality images as a result of using only part of the acquired data. The main objective of this work was to evaluate different schemes of combining binned data in order to identify the best strategy to reconstruct motion free cardiac images from dual gated positron emission tomography (PET) acquisitions. Methods: A digital phantom study as well as seven human studies were used in this evaluation. PET data were acquired in list mode (LM). A real-time position management system and an electrocardiogram device were used to provide the respiratory and cardiac motion triggers registered within the LM file. Acquired data were subsequently binned considering four and six cardiac gates, or the diastole only in combination with eight respiratory amplitude gates. PET images were corrected for attenuation, but no randoms nor scatter corrections were included. Reconstructed images from each of the bins considered above were subsequently used in combination with an affine or an elastic registration algorithm to derive transformation parameters allowing the combination of all acquired data in a particular position in the cardiac and respiratory cycles. Images were assessed in terms of signal-to-noise ratio (SNR), contrast, image profile, coefficient-of-variation (COV), and relative difference of the recovered activity concentration. Results: Regardless of the considered motion compensation strategy, the nonrigid motion model performed better than the affine model, leading to higher SNR and contrast combined with a lower COV. Nevertheless, when compensating for respiration only, no statistically significant differences were observed in the performance of the two motion models considered. Superior image SNR and contrast were seen using the affine respiratory motion model in combination with the diastole cardiac bin in comparison to the use of the whole cardiac cycle. In contrast, when simultaneously correcting for cardiac beating and respiration, the elastic respiratory motion model outperformed the affine model. In this context, four cardiac bins associated with eight respiratory amplitude bins seemed to be adequate. Conclusions: Considering the compensation of respiratory motion effects only, both affine and elastic based approaches led to an accurate resizing and positioning of the myocardium. The use of the diastolic phase combined with an affine model based respiratory motion correction may therefore be a simple approach leading to significant quality improvements in cardiac PET imaging. However, the best performance was obtained with the combined correction for both cardiac and respiratory movements considering all the dual-gated bins independently through the use of an elastic model based motion compensation.

  3. Hunting statistics: what data for what use? An account of an international workshop

    USGS Publications Warehouse

    Nichols, J.D.; Lancia, R.A.; Lebreton, J.D.

    2001-01-01

    Hunting interacts with the underlying dynamics of game species in several different ways and is, at the same time, a source of valuable information not easily obtained from populations that are not subjected to hunting. Specific questions, including the sustainability of hunting activities, can be addressed using hunting statistics. Such investigations will frequently require that hunting statistics be combined with data from other sources of population-level information. Such reflections served as a basis for the meeting, "Hunting Statistics: What Data for What Use," held on January 15-18, 2001 in Saint-Benoist, France. We review here the 20 talks held during the workshop and the contribution of hunting statistics to our knowledge of the population dynamics of game species. Three specific topics (adaptive management, catch-effort models, and dynamics of exploited populations) were highlighted as important themes and are more extensively presented as boxes.

  4. Establishing the Conceptual Model to Connect Stress with Geoelectric Signals

    NASA Astrophysics Data System (ADS)

    Chen, H. J.; Chen, C. C.; Ouillon, G.; Sornette, D.

    2017-12-01

    In this study, we conceptualize a completely novel model combining the seismic microruptures occurring within a generalized Burridge-Knopoff spring-block model, with the nucleation and propagation of geoelectric pulses within a coupled electrokinetic system (modelled with a series of RLC circuits). In particular, it is able to reproduce the unipolar pulses that have often been reported before large seismic events, as well as the observed anomalies in the statistical moments of the ambient electric field. This model is thus likely to open a new era of modeling and analyses of geoelectric precursors to earthquakes.

  5. Modelling of peak temperature during friction stir processing of magnesium alloy AZ91

    NASA Astrophysics Data System (ADS)

    Vaira Vignesh, R.; Padmanaban, R.

    2018-02-01

    Friction stir processing (FSP) is a solid state processing technique with potential to modify the properties of the material through microstructural modification. The study of heat transfer in FSP aids in the identification of defects like flash, inadequate heat input, poor material flow and mixing etc. In this paper, transient temperature distribution during FSP of magnesium alloy AZ91 was simulated using finite element modelling. The numerical model results were validated using the experimental results from the published literature. The model was used to predict the peak temperature obtained during FSP for various process parameter combinations. The simulated peak temperature results were used to develop a statistical model. The effect of process parameters namely tool rotation speed, tool traverse speed and shoulder diameter of the tool on the peak temperature was investigated using the developed statistical model. It was found that peak temperature was directly proportional to tool rotation speed and shoulder diameter and inversely proportional to tool traverse speed.
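
    A hedged sketch of the kind of statistical model described above (placeholder responses standing in for the finite element results, not values from the paper): a linear regression of peak temperature on tool rotation speed, traverse speed, and shoulder diameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
rot = rng.uniform(800, 1600, 30)        # tool rotation speed, rpm (assumed range)
trav = rng.uniform(20, 60, 30)          # traverse speed, mm/min (assumed range)
shoulder = rng.uniform(12, 24, 30)      # shoulder diameter, mm (assumed range)
# Synthetic stand-in for FE-simulated peak temperatures (deg C): rises with rotation
# speed and shoulder diameter, falls with traverse speed.
peak_T = 250 + 0.15 * rot - 1.2 * trav + 6.0 * shoulder + rng.normal(scale=5, size=30)

X = np.column_stack([rot, trav, shoulder])
reg = LinearRegression().fit(X, peak_T)
print(dict(zip(["rotation", "traverse", "shoulder"], reg.coef_.round(2))))
```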

  6. Techniques for recognizing identity of several response functions from the data of visual inspection

    NASA Astrophysics Data System (ADS)

    Nechval, Nicholas A.

    1996-08-01

    The purpose of this paper is to present some efficient techniques for recognizing from the observed data whether several response functions are identical to each other. For example, in an industrial setting the problem may be to determine whether the production coefficients established in a small-scale pilot study apply to each of several large- scale production facilities. The techniques proposed here combine sensor information from automated visual inspection of manufactured products which is carried out by means of pixel-by-pixel comparison of the sensed image of the product to be inspected with some reference pattern (or image). Let (a1, . . . , am) be p-dimensional parameters associated with m response models of the same type. This study is concerned with the simultaneous comparison of a1, . . . , am. A generalized maximum likelihood ratio (GMLR) test is derived for testing equality of these parameters, where each of the parameters represents a corresponding vector of regression coefficients. The GMLR test reduces to an equivalent test based on a statistic that has an F distribution. The main advantage of the test lies in its relative simplicity and the ease with which it can be applied. Another interesting test for the same problem is an application of Fisher's method of combining independent test statistics which can be considered as a parallel procedure to the GMLR test. The combination of independent test statistics does not appear to have been used very much in applied statistics. There does, however, seem to be potential data analytic value in techniques for combining distributional assessments in relation to statistically independent samples which are of joint experimental relevance. In addition, a new iterated test for the problem defined above is presented. A rejection of the null hypothesis by this test provides some reason why all the parameters are not equal. A numerical example is discussed in the context of the proposed procedures for hypothesis testing.

  7. Non-parametric model selection for subject-specific topological organization of resting-state functional connectivity.

    PubMed

    Ferrarini, Luca; Veer, Ilya M; van Lew, Baldur; Oei, Nicole Y L; van Buchem, Mark A; Reiber, Johan H C; Rombouts, Serge A R B; Milles, J

    2011-06-01

    In recent years, graph theory has been successfully applied to study functional and anatomical connectivity networks in the human brain. Most of these networks have shown small-world topological characteristics: high efficiency in long distance communication between nodes, combined with highly interconnected local clusters of nodes. Moreover, functional studies performed at high resolutions have presented convincing evidence that resting-state functional connectivity networks exhibit (exponentially truncated) scale-free behavior. Such evidence, however, was mostly presented qualitatively, in terms of linear regressions of the degree distributions on log-log plots. Even when quantitative measures were given, these were usually limited to the r² correlation coefficient. However, the r² statistic is not an optimal estimator of explained variance when dealing with (truncated) power-law models. Recent developments in statistics have introduced new non-parametric approaches, based on the Kolmogorov-Smirnov test, for the problem of model selection. In this work, we have built on this idea to statistically tackle the issue of model selection for the degree distribution of functional connectivity at rest. The analysis, performed at voxel level and in a subject-specific fashion, confirmed the superiority of a truncated power-law model, showing high consistency across subjects. Moreover, the most highly connected voxels were found to be consistently part of the default mode network. Our results provide statistically sound support to the evidence previously presented in literature for a truncated power-law model of resting-state functional connectivity. Copyright © 2010 Elsevier Inc. All rights reserved.
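
    A hedged sketch of Kolmogorov-Smirnov-based model selection between a pure and an exponentially truncated power law (a continuous approximation on synthetic data, not the paper's voxel-level procedure):

```python
import numpy as np

rng = np.random.default_rng(9)
xmin = 1.0
# Synthetic "degrees": truncated-power-law-like data via rejection sampling.
x = rng.pareto(1.5, size=20000) + xmin
x = x[rng.random(x.size) < np.exp(-(x - xmin) / 50.0)][:3000]

grid = np.linspace(xmin, x.max(), 4000)

def ks_distance(pdf_vals):
    cdf = np.cumsum(pdf_vals)
    cdf /= cdf[-1]                                   # numeric normalization on the grid
    model_cdf = np.interp(np.sort(x), grid, cdf)
    emp_cdf = np.arange(1, x.size + 1) / x.size
    return np.abs(model_cdf - emp_cdf).max()

def best_ks(pdf_fn, params):
    """Smallest KS distance over a grid of candidate parameter values."""
    return min(ks_distance(pdf_fn(grid, *p)) for p in params)

alphas = np.linspace(1.1, 3.5, 40)
cutoffs = np.linspace(10, 200, 30)
ks_pl = best_ks(lambda g, a: g ** -a, [(a,) for a in alphas])
ks_tpl = best_ks(lambda g, a, k: g ** -a * np.exp(-g / k),
                 [(a, k) for a in alphas for k in cutoffs])
print(f"KS power law = {ks_pl:.3f}, KS truncated power law = {ks_tpl:.3f}")
```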

  8. Statistical-dynamical modeling of the cloud-to-ground lightning activity in Portugal

    NASA Astrophysics Data System (ADS)

    Sousa, J. F.; Fragoso, M.; Mendes, S.; Corte-Real, J.; Santos, J. A.

    2013-10-01

    The present study employs a dataset of cloud-to-ground discharges over Portugal, collected by the Portuguese lightning detection network in the period of 2003-2009, to identify dynamically coherent lightning regimes in Portugal and to implement a statistical-dynamical modeling of the daily discharges over the country. For this purpose, the high-resolution MERRA reanalysis is used. Three lightning regimes are then identified for Portugal: WREG, WREM and SREG. WREG is a typical cold-core cut-off low. WREM is connected to strong frontal systems driven by remote low pressure systems at higher latitudes over the North Atlantic. SREG is a combination of an inverted trough and a mid-tropospheric cold-core nearby Portugal. The statistical-dynamical modeling is based on logistic regressions (statistical component) developed for each regime separately (dynamical component). It is shown that the strength of the lightning activity (either strong or weak) for each regime is consistently modeled by a set of suitable dynamical predictors (65-70% of efficiency). The difference of the equivalent potential temperature in the 700-500 hPa layer is the best predictor for the three regimes, while the best 4-layer lifted index is still important for all regimes, but with much weaker significance. Six other predictors are more suitable for a specific regime. For the purpose of validating the modeling approach, a regional-scale climate model simulation is carried out under a very intense lightning episode.

  9. Damage modeling and statistical analysis of optics damage performance in MJ-class laser systems.

    PubMed

    Liao, Zhi M; Raymond, B; Gaylord, J; Fallejo, R; Bude, J; Wegner, P

    2014-11-17

    Modeling the lifetime of a fused silica optic is described for a multiple beam, MJ-class laser system. This entails combining optic processing data along with laser shot data to account for complete history of optic processing and shot exposure. Integrating with online inspection data allows for the construction of a performance metric to describe how an optic performs with respect to the model. This methodology helps to validate the damage model as well as allows strategic planning and identifying potential hidden parameters that are affecting the optic's performance.

  10. Statistical Modeling of Retinal Optical Coherence Tomography.

    PubMed

    Amini, Zahra; Rabbani, Hossein

    2016-06-01

    In this paper, a new model for retinal Optical Coherence Tomography (OCT) images is proposed. This statistical model is based on introducing a nonlinear Gaussianization transform to convert the probability distribution function (pdf) of each OCT intra-retinal layer to a Gaussian distribution. The retina is a layered structure and in OCT each of these layers has a specific pdf which is corrupted by speckle noise; therefore, a mixture model for statistical modeling of OCT images is proposed. A Normal-Laplace distribution, which is a convolution of a Laplace pdf and Gaussian noise, is proposed as the distribution of each component of this model. The reason for choosing the Laplace pdf is the monotonically decaying behavior of OCT intensities in each layer for healthy cases. After fitting a mixture model to the data, each component is gaussianized and all of them are combined by the Averaged Maximum A Posteriori (AMAP) method. To demonstrate the ability of this method, a new contrast enhancement method based on this statistical model is proposed and tested on thirteen healthy 3D OCTs taken by the Topcon 3D OCT and five 3D OCTs from Age-related Macular Degeneration (AMD) patients, taken by Zeiss Cirrus HD-OCT. Comparing the results with two contending techniques, the prominence of the proposed method is demonstrated both visually and numerically. Furthermore, to prove the efficacy of the proposed method for a more direct and specific purpose, an improvement in the segmentation of intra-retinal layers using the proposed contrast enhancement method as a preprocessing step is demonstrated.
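
    A hedged sketch of a generic Gaussianization transform of the sort mentioned above (empirical CDF followed by the inverse standard-normal CDF; this is not the paper's Normal-Laplace mixture machinery):

```python
import numpy as np
from scipy.stats import norm, kurtosis

rng = np.random.default_rng(2)
# Synthetic stand-in for one intra-retinal layer: Laplace-like intensities plus noise.
layer = rng.laplace(loc=100.0, scale=8.0, size=5000) + rng.normal(scale=2.0, size=5000)

ranks = layer.argsort().argsort() + 1            # ranks 1..n
u = ranks / (layer.size + 1.0)                   # empirical CDF values in (0, 1)
gaussianized = norm.ppf(u)                       # probit transform to ~N(0, 1)

print("excess kurtosis before:", round(kurtosis(layer), 2),
      "after:", round(kurtosis(gaussianized), 2))
```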

  11. Composite Load Spectra for Select Space Propulsion Structural Components

    NASA Technical Reports Server (NTRS)

    Ho, Hing W.; Newell, James F.

    1994-01-01

    Generic load models are described with multiple levels of progressive sophistication to simulate the composite (combined) load spectra (CLS) that are induced in space propulsion system components, representative of Space Shuttle Main Engines (SSME), such as transfer ducts, turbine blades and liquid oxygen (LOX) posts. These generic (coupled) models combine deterministic models for the dynamic, acoustic, high-pressure, high-rotational-speed, and other composite load components, simulated using statistically varying coefficients. These coefficients are then determined using advanced probabilistic simulation methods with and without strategically selected experimental data. The entire simulation process is included in a CLS computer code. Applications of the computer code to various components in conjunction with the PSAM (Probabilistic Structural Analysis Method) to perform probabilistic load evaluation and life prediction evaluations are also described to illustrate the effectiveness of the coupled model approach.

  12. Combined Uncertainty and A-Posteriori Error Bound Estimates for General CFD Calculations: Theory and Software Implementation

    NASA Technical Reports Server (NTRS)

    Barth, Timothy J.

    2014-01-01

    This workshop presentation discusses the design and implementation of numerical methods for the quantification of statistical uncertainty, including a-posteriori error bounds, for output quantities computed using CFD methods. Hydrodynamic realizations often contain numerical error arising from finite-dimensional approximation (e.g. numerical methods using grids, basis functions, particles) and statistical uncertainty arising from incomplete information and/or statistical characterization of model parameters and random fields. The first task at hand is to derive formal error bounds for statistics given realizations containing finite-dimensional numerical error [1]. The error in computed output statistics contains contributions from both realization error and the error resulting from the calculation of statistics integrals using a numerical method. A second task is to devise computable a-posteriori error bounds by numerically approximating all terms arising in the error bound estimates. For the same reason that CFD calculations including error bounds but omitting uncertainty modeling are only of limited value, CFD calculations including uncertainty modeling but omitting error bounds are only of limited value. To gain maximum value from CFD calculations, a general software package for uncertainty quantification with quantified error bounds has been developed at NASA. The package provides implementations for a suite of numerical methods used in uncertainty quantification: Dense tensorization basis methods [3] and a subscale recovery variant [1] for non-smooth data, Sparse tensorization methods[2] utilizing node-nested hierarchies, Sampling methods[4] for high-dimensional random variable spaces.

  13. Simplified estimation of age-specific reference intervals for skewed data.

    PubMed

    Wright, E M; Royston, P

    1997-12-30

    Age-specific reference intervals are commonly used in medical screening and clinical practice, where interest lies in the detection of extreme values. Many different statistical approaches have been published on this topic. The advantages of a parametric method are that it necessarily produces smooth centile curves, the entire density is estimated, and an explicit formula is available for the centiles. The method proposed here is a simplified version of a recent approach proposed by Royston and Wright. Basic transformations of the data and multiple regression techniques are combined to model the mean, standard deviation and skewness. Using these simple tools, which are implemented in almost all statistical computer packages, age-specific reference intervals may be obtained. The scope of the method is illustrated by fitting models to several real data sets and assessing each model using goodness-of-fit techniques.
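
    A hedged sketch of this kind of simplified approach (illustrative data, not the paper's): regress the mean and the standard deviation of a transformed measurement on age, then read off age-specific reference limits from the fitted Gaussian centiles.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
age = rng.uniform(16, 42, size=400)                        # e.g. gestational age in weeks (synthetic)
y = np.log(0.2 + 0.05 * age) + (0.05 + 0.002 * age) * rng.normal(size=400)

mean_coef = np.polyfit(age, y, deg=1)                       # mean(age) by least squares
resid = y - np.polyval(mean_coef, age)
# SD(age): regress scaled absolute residuals on age (E|Z| = sqrt(2/pi) for a normal).
sd_coef = np.polyfit(age, np.abs(resid) * np.sqrt(np.pi / 2), deg=1)

z = norm.ppf(0.975)
for a in (20, 30, 40):
    m, s = np.polyval(mean_coef, a), np.polyval(sd_coef, a)
    print(f"age {a}: 95% reference interval = ({m - z*s:.3f}, {m + z*s:.3f}) on the log scale")
```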

  14. Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays

    NASA Astrophysics Data System (ADS)

    Sibatov, R. T.

    2011-08-01

    A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.

  15. Prognostic value of cell cycle regulatory proteins in muscle-infiltrating bladder cancer.

    PubMed

    Galmozzi, Fabia; Rubagotti, Alessandra; Romagnoli, Andrea; Carmignani, Giorgio; Perdelli, Luisa; Gatteschi, Beatrice; Boccardo, Francesco

    2006-12-01

    The aims of this study were to investigate the expression levels of proteins involved in cell cycle regulation in specimens of bladder cancer and to correlate them with the clinicopathological characteristics, proliferative activity and survival. Eighty-two specimens obtained from patients affected by muscle-invasive bladder cancer were evaluated immunohistochemically for p53, p21 and cyclin D1 expression, as well as for the tumour proliferation index, Ki-67. The statistical analysis included Kaplan-Meier curves with log-rank test and Cox proportional hazards models. In univariate analyses, low Ki-67 proliferation index (P = 0.045) and negative p21 immunoreactivity (P = 0.04) were associated with patients' overall survival (OS), but in multivariate models p21 did not reach statistical significance. When the combinations of the variables were assessed in two separate multivariate models that included tumour stage, grading, lymph node status, vascular invasion and perineural invasion, the combined variables p21/Ki-67 or p21/cyclin D1 expression were independent predictors for OS; in particular, patients with positive p21/high Ki-67 (P = 0.015) or positive p21/negative cyclin D1 (P = 0.04) showed the worst survival outcome. Important alterations in the cell cycle regulatory pathways occur in muscle-invasive bladder cancer and the combined use of cell cycle regulators appears to provide significant prognostic information that could be used to select the patients most suitable for multimodal therapeutic approaches.

  16. Job stress and work-related musculoskeletal symptoms among intensive care unit nurses: a comparison between job demand-control and effort-reward imbalance models.

    PubMed

    Lee, Soo-Jeong; Lee, Joung Hee; Gillen, Marion; Krause, Niklas

    2014-02-01

    The aims of this study were to compare job demand-control (JDC) and effort-reward imbalance (ERI) models in examining the association of job stress with work-related musculoskeletal symptoms and to evaluate the utility of a combined model. This study analyzed cross-sectional survey data obtained from a nationwide random sample of 304 intensive-care unit (ICU) nurses. Demographic and job factors were controlled in the analyses using logistic regression. Both JDC and ERI variables had strong and statistically significant associations with work-related musculoskeletal symptoms. Effort-reward imbalance had stronger associations than job strain or iso-strain with musculoskeletal symptoms. Effort-reward imbalance alone showed similar or stronger associations with musculoskeletal symptoms compared to combined variables of the JDC and ERI models. The ERI model appears to capture the magnitude of the musculoskeletal health risk among nurses associated with job stress at least as well and possibly better than the JDC model. Our findings suggest that combining the two models provides little gain compared to using effort-reward imbalance only. © 2013 Wiley Periodicals, Inc.

  17. Statistical tools for transgene copy number estimation based on real-time PCR.

    PubMed

    Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal

    2007-11-01

    As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, the real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of transgene was compared with that of internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with reference gene without a standard curve, but rather, is based directly on fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow the real-time PCR-based transgene copy number estimation to be more reliable and precise with a proper statistical estimation. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied for other real-time PCR-based quantification assays including transfection efficiency analysis and pathogen quantification.
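
    A hedged sketch of the relative-quantification idea behind the second design (illustrative replicate Ct values, perfect amplification efficiency assumed, simplified degrees of freedom; not the paper's code): estimate copy number by 2^(-ddCt) against a single-copy control and attach a confidence interval propagated from the replicates.

```python
import numpy as np
from scipy import stats

ct_target_test = np.array([22.1, 22.0, 22.2, 22.1])   # transgene Ct, putative event (illustrative)
ct_ref_test = np.array([20.0, 20.1, 19.9, 20.0])      # reference gene Ct, same event
ct_target_cal = np.array([23.2, 23.1, 23.3, 23.2])    # transgene Ct, known single-copy control
ct_ref_cal = np.array([20.1, 20.0, 20.2, 20.1])       # reference gene Ct, control

d_test = ct_target_test.mean() - ct_ref_test.mean()
d_cal = ct_target_cal.mean() - ct_ref_cal.mean()
ddct = d_test - d_cal

# Standard error of ddCt from the four replicate groups (treated as independent means).
se = np.sqrt(sum(a.var(ddof=1) / a.size for a in
                 (ct_target_test, ct_ref_test, ct_target_cal, ct_ref_cal)))
t_crit = stats.t.ppf(0.975, df=4 * (ct_ref_test.size - 1))

copies = 2.0 ** (-ddct)                                 # copy number relative to single-copy control
ci = (2.0 ** (-(ddct + t_crit * se)), 2.0 ** (-(ddct - t_crit * se)))
print(f"estimated copy number = {copies:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```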

  18. Spectral likelihood expansions for Bayesian inference

    NASA Astrophysics Data System (ADS)

    Nagel, Joseph B.; Sudret, Bruno

    2016-03-01

    A spectral approach to Bayesian inference is presented. It pursues the emulation of the posterior probability density. The starting point is a series expansion of the likelihood function in terms of orthogonal polynomials. From this spectral likelihood expansion all statistical quantities of interest can be calculated semi-analytically. The posterior is formally represented as the product of a reference density and a linear combination of polynomial basis functions. Both the model evidence and the posterior moments are related to the expansion coefficients. This formulation avoids Markov chain Monte Carlo simulation and allows one to make use of linear least squares instead. The pros and cons of spectral Bayesian inference are discussed and demonstrated on the basis of simple applications from classical statistics and inverse modeling.
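
    A minimal one-dimensional sketch of the spectral likelihood expansion idea described above (not the authors' implementation): the likelihood is expanded in Legendre polynomials orthogonal with respect to a uniform reference density on [-1, 1], the coefficients are obtained by linear least squares, and the evidence and posterior mean follow semi-analytically from the coefficients. The toy inverse problem and all settings are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# Toy inverse problem: observe y = theta + noise, infer theta with uniform prior on [-1, 1]
y_obs, sigma = 0.3, 0.2
def likelihood(theta):
    return np.exp(-0.5 * ((y_obs - theta) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Regression points drawn from the uniform reference density
theta = rng.uniform(-1.0, 1.0, 2000)
degree = 15
# Design matrix of Legendre polynomials, normalised so that E[P_k^2] = 1 under the prior
V = legendre.legvander(theta, degree) * np.sqrt(2 * np.arange(degree + 1) + 1)
coeffs, *_ = np.linalg.lstsq(V, likelihood(theta), rcond=None)

# Evidence = coefficient of the constant basis function (prior expectation of the likelihood)
evidence = coeffs[0]
# Posterior mean = E[theta * L(theta)] / evidence; theta equals P_1 / sqrt(3) in this basis
posterior_mean = coeffs[1] / np.sqrt(3) / evidence
print(f"evidence ~ {evidence:.4f}, posterior mean ~ {posterior_mean:.4f}")
```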

  19. Governance and Regional Variation of Homicide Rates: Evidence From Cross-National Data.

    PubMed

    Cao, Liqun; Zhang, Yan

    2017-01-01

    Criminological theories of cross-national studies of homicide have underestimated the effects of quality governance of liberal democracy and region. Data sets from several sources are combined and a comprehensive model of homicide is proposed. Results of the spatial regression model, which controls for the effect of spatial autocorrelation, show that quality governance, human development, economic inequality, and ethnic heterogeneity are statistically significant in predicting homicide. In addition, regions of Latin America and non-Muslim Sub-Saharan Africa have significantly higher rates of homicides ceteris paribus while the effects of East Asian countries and Islamic societies are not statistically significant. These findings are consistent with the expectation of the new modernization and regional theories. © The Author(s) 2015.

  20. Coherent spectroscopic methods for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution

    NASA Astrophysics Data System (ADS)

    Moguilnaya, T.; Suminov, Y.; Botikov, A.; Ignatov, S.; Kononenko, A.; Agibalov, A.

    2017-01-01

    We developed a new automatic method that combines forced luminescence with stimulated Brillouin scattering. The method is used for monitoring pathogens, genetically modified products and nanostructured materials in colloidal solution. We carried out a statistical spectral analysis of pathogens, genetically modified soy and silver nanoparticles in water from different regions in order to determine the statistical errors of the method. We studied the spectral characteristics of these objects in water to perform initial identification with 95% probability. These results were used to create a model device for monitoring pathogenic organisms and a working model of a device for detecting genetically modified soy in meat.

  1. Participatory Equity and Student Outcomes in Living-Learning Programs of Differing Thematic Types

    ERIC Educational Resources Information Center

    Soldner, Matthew Edward

    2011-01-01

    This study evaluated participatory equity in varying thematic types of living-learning programs and, for a subset of student group x program type combinations found to be below equity, used latent mean modeling to determine whether statistically significant mean differences existed between the outcome scores of living-learning participants and…

  2. Changes of crop rotation in Iowa determined from the USDA-NASS cropland data layer product

    USDA-ARS?s Scientific Manuscript database

    Crop rotation is one of the important decisions made independently by numerous farm managers, and is a critical variable in models of crop growth and soil carbon. By combining multiple years (2001-2009) of the USDA National Agricultural Statistics Service (NASS) cropland data layer (CDL), it is pos...

  3. A Complex Approach to UXO Discrimination: Combining Advanced EMI Forward Models and Statistical Signal Processing

    DTIC Science & Technology

    2012-01-01

    discrimination at live-UXO sites. Namely, under this project we first developed and implemented advanced, physically complete forward EMI models such as the... Shubitidze of Sky Research and Dartmouth College conceived, implemented, and tested most of the approaches presented in this report. He developed

  4. Backward deletion to minimize prediction errors in models from factorial experiments with zero to six center points

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1980-01-01

    Population model coefficients were chosen to simulate a saturated 2^4 fixed-effects experiment having an unfavorable distribution of relative values. Using random number studies, deletion strategies were compared that were based on the F distribution, on an order statistics distribution of Cochran's, and on a combination of the two. Results of the comparisons and a recommended strategy are given.

  5. Predictions of the residue cross-sections for the elements Z = 113 and Z = 114

    NASA Astrophysics Data System (ADS)

    Bouriquet, B.; Abe, Y.; Kosenko, G.

    2004-10-01

    A good reproduction of experimental excitation functions is obtained for the 1n reactions producing the elements with Z = 108, 110, 111 and 112 through the combined use of the two-step model for fusion and the statistical decay code KEWPIE. Furthermore, the model provides reliable predictions of the production of the elements with Z = 113 and Z = 114, which will be a useful guide for the planning of experiments.

  6. Quality classification of Spanish olive oils by untargeted gas chromatography coupled to hybrid quadrupole-time of flight mass spectrometry with atmospheric pressure chemical ionization and metabolomics-based statistical approach.

    PubMed

    Sales, C; Cervera, M I; Gil, R; Portolés, T; Pitarch, E; Beltran, J

    2017-02-01

    The novel atmospheric pressure chemical ionization (APCI) source has been used in combination with gas chromatography (GC) coupled to hybrid quadrupole time-of-flight (QTOF) mass spectrometry (MS) for determination of volatile components of olive oil, enhancing its potential for classification of olive oil samples according to their quality using a metabolomics-based approach. The full-spectrum acquisition has allowed the detection of volatile organic compounds (VOCs) in olive oil samples, including Extra Virgin, Virgin and Lampante qualities. A dynamic headspace extraction with cartridge solvent elution was applied. The metabolomics strategy consisted of three different steps: a full mass spectral alignment of GC-MS data using MzMine 2.0, a multivariate analysis using Ez-Info and the creation of the statistical model with combinations of responses for molecular fragments. The model was finally validated using blind samples, obtaining an accuracy in oil classification of 70%, taking the official established method, "PANEL TEST", as reference. Copyright © 2016 Elsevier Ltd. All rights reserved.
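
    An illustrative sketch of the final classification step described above, using scikit-learn in place of the EZinfo multivariate analysis: an aligned GC-MS feature table is scaled, reduced by principal components and classified, with held-out "blind" samples used for validation. The feature table and quality labels here are synthetic placeholders, not the study's data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_samples, n_features = 60, 300          # samples x aligned spectral features
X = rng.lognormal(size=(n_samples, n_features))
y = rng.integers(0, 3, size=n_samples)   # 0 = extra virgin, 1 = virgin, 2 = lampante

X_train, X_blind, y_train, y_blind = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("blind-sample accuracy:", model.score(X_blind, y_blind))
```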

  7. Comparing and combining biomarkers as principal surrogates for time-to-event clinical endpoints.

    PubMed

    Gabriel, Erin E; Sachs, Michael C; Gilbert, Peter B

    2015-02-10

    Principal surrogate endpoints are useful as targets for phase I and II trials. In many recent trials, multiple post-randomization biomarkers are measured. However, few statistical methods exist for comparison of or combination of biomarkers as principal surrogates, and none of these methods to our knowledge utilize time-to-event clinical endpoint information. We propose a Weibull model extension of the semi-parametric estimated maximum likelihood method that allows for the inclusion of multiple biomarkers in the same risk model as multivariate candidate principal surrogates. We propose several methods for comparing candidate principal surrogates and evaluating multivariate principal surrogates. These include the time-dependent and surrogate-dependent true and false positive fraction, the time-dependent and the integrated standardized total gain, and the cumulative distribution function of the risk difference. We illustrate the operating characteristics of our proposed methods in simulations and outline how these statistics can be used to evaluate and compare candidate principal surrogates. We use these methods to investigate candidate surrogates in the Diabetes Control and Complications Trial. Copyright © 2014 John Wiley & Sons, Ltd.

  8. Learning Midlevel Auditory Codes from Natural Sound Statistics.

    PubMed

    Młynarski, Wiktor; McDermott, Josh H

    2018-03-01

    Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. To gain insight into such midlevel representations for sound, we designed a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer, the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first-layer coefficients. When trained on corpora of speech and environmental sounds, some second-layer units learned to group similar spectrotemporal features. Others instantiate opponency between distinct sets of features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for midlevel neuronal computation.

  9. Assessing Continuous Operator Workload With a Hybrid Scaffolded Neuroergonomic Modeling Approach.

    PubMed

    Borghetti, Brett J; Giametta, Joseph J; Rusnock, Christina F

    2017-02-01

    We aimed to predict operator workload from neurological data using statistical learning methods to fit neurological-to-state-assessment models. Adaptive systems require real-time mental workload assessment to perform dynamic task allocations or operator augmentation as workload issues arise. Neuroergonomic measures have great potential for informing adaptive systems, and we combine these measures with models of task demand as well as information about critical events and performance to clarify the inherent ambiguity of interpretation. We use machine learning algorithms on electroencephalogram (EEG) input to infer operator workload based upon Improved Performance Research Integration Tool workload model estimates. Cross-participant models predict workload of other participants, statistically distinguishing between 62% of the workload changes. Machine learning models trained from Monte Carlo resampled workload profiles can be used in place of deterministic workload profiles for cross-participant modeling without incurring a significant decrease in machine learning model performance, suggesting that stochastic models can be used when limited training data are available. We employed a novel temporary scaffold of simulation-generated workload profile truth data during the model-fitting process. A continuous workload profile serves as the target to train our statistical machine learning models. Once trained, the workload profile scaffolding is removed and the trained model is used directly on neurophysiological data in future operator state assessments. These modeling techniques demonstrate how to use neuroergonomic methods to develop operator state assessments, which can be employed in adaptive systems.
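
    A hedged sketch of the cross-participant modelling idea above: EEG-derived features are classified with leave-one-participant-out cross-validation, so each model is evaluated on a participant it never saw during training. The feature matrix, workload labels and participant IDs below are synthetic stand-ins, not the study's data or its specific workload-profile scaffolding.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(2)
n_epochs, n_features, n_participants = 600, 32, 8
X = rng.normal(size=(n_epochs, n_features))         # e.g. EEG band powers per epoch
y = rng.integers(0, 2, size=n_epochs)               # low vs high workload label
groups = rng.integers(0, n_participants, n_epochs)  # participant ID per epoch

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
print("per-participant accuracies:", np.round(scores, 2))
```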

  10. Evaluation of analgesic and anti-inflammatory activity of a combination of tramadol-ibuprofen in experimental animals.

    PubMed

    Suthakaran, Chidambarann; Kayalvizhi, Muniyagounder K; Nithya, Karnam; Raja, Thozhudalangudy Ar

    2017-01-01

    Pain is the major concern of patients attending dental clinics, and satisfactory pain relief has always been difficult to achieve. Since the pathophysiology of pain involves complex central and peripheral nervous system processes, combined analgesic regimens with different mechanisms of action, used as a multimodal approach, are becoming popular among clinicians and dentists. The aim of the present study was to evaluate the analgesic and anti-inflammatory activity of ibuprofen and tramadol when used alone or in combination in animal models of pain and inflammation. The animals were divided into six groups with six animals in each group. Analgesic activity was assessed by the hot plate method in rats and by the acetic acid-induced writhing test in mice. A paw edema model in rats, induced with 0.1 mL of 1% carrageenan, was used to assess anti-inflammatory activity. Analysis of variance followed by Tukey's honestly significant difference post hoc test was used for statistical analysis. Combined use of tramadol and ibuprofen provided enhanced analgesic and anti-inflammatory effects in these animal models of pain and inflammation.
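
    A minimal sketch of the analysis named above (one-way ANOVA followed by Tukey's HSD) using scipy and statsmodels; the response values, group means and group names are fabricated for illustration and do not come from the study.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
groups = ["control", "ibuprofen", "tramadol", "combination"]
# e.g. paw-oedema volume (mL) for six animals per group (made-up means)
data = {g: rng.normal(loc=m, scale=0.1, size=6)
        for g, m in zip(groups, [1.0, 0.7, 0.75, 0.5])}

f_stat, p_value = stats.f_oneway(*data.values())
print(f"one-way ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

values = np.concatenate(list(data.values()))
labels = np.repeat(groups, 6)
print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```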

  11. Some considerations concerning the theory of combined toxicity: a case study of subchronic experimental intoxication with cadmium and lead.

    PubMed

    Varaksin, Anatoly N; Katsnelson, Boris A; Panov, Vladimir G; Privalova, Larisa I; Kireyeva, Ekaterina P; Valamina, Irene E; Beresneva, Olga Yu

    2014-02-01

    Rats were exposed intraperitoneally (3 times a week, up to 20 injections) to cadmium and lead salts, in doses equivalent to 0.05 LD50, either separately or combined at the same or halved doses. Toxic effects were assessed by more than 40 functional, biochemical and morphometric indices. We analysed the results with the aim of determining the type of combined toxicity, using either common-sense considerations based on descriptive statistics or two mathematical models based (a) on ANOVA and (b) on the mathematical theory of experimental design, which correspond, respectively, to the widely recognised paradigms of effect additivity and dose additivity. These approaches led us unanimously to the following conclusions: (1) the above paradigms are virtually interchangeable and should be regarded as different methods of modelling combined toxicity rather than as reflecting fundamentally differing processes; (2) within both models there exist not merely the three traditionally used types of combined toxicity (additivity, subadditivity and superadditivity) but at least 10 variants, depending on exactly which effect is considered and at what level, as well as on the dose levels and their ratio. Copyright © 2013 Elsevier Ltd. All rights reserved.

  12. Blended particle filters for large-dimensional chaotic dynamical systems

    PubMed Central

    Majda, Andrew J.; Qi, Di; Sapsis, Themistoklis P.

    2014-01-01

    A major challenge in contemporary data science is the development of statistically accurate particle filters to capture non-Gaussian features in large-dimensional chaotic dynamical systems. Blended particle filters that capture non-Gaussian features in an adaptively evolving low-dimensional subspace through particles interacting with evolving Gaussian statistics on the remaining portion of phase space are introduced here. These blended particle filters are constructed in this paper through a mathematical formalism involving conditional Gaussian mixtures combined with statistically nonlinear forecast models compatible with this structure developed recently with high skill for uncertainty quantification. Stringent test cases for filtering involving the 40-dimensional Lorenz 96 model with a 5-dimensional adaptive subspace for nonlinear blended filtering in various turbulent regimes with at least nine positive Lyapunov exponents are used here. These cases demonstrate the high skill of the blended particle filter algorithms in capturing both highly non-Gaussian dynamical features as well as crucial nonlinear statistics for accurate filtering in extreme filtering regimes with sparse infrequent high-quality observations. The formalism developed here is also useful for multiscale filtering of turbulent systems and a simple application is sketched below. PMID:24825886
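
    For background to the record above, here is a compact bootstrap particle filter on the 40-dimensional Lorenz 96 model. This is the standard (unblended) filter, which degenerates in high dimensions with sparse observations; that degeneracy is what motivates the blended approach, which is not implemented here. All settings are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
D, F, dt = 40, 8.0, 0.05

def lorenz96_step(x, dt):
    # dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F, integrated with RK4
    def f(x):
        return (np.roll(x, -1, -1) - np.roll(x, 2, -1)) * np.roll(x, 1, -1) - x + F
    k1 = f(x); k2 = f(x + dt/2*k1); k3 = f(x + dt/2*k2); k4 = f(x + dt*k3)
    return x + dt/6*(k1 + 2*k2 + 2*k3 + k4)

# Truth run and noisy observations of every other variable
obs_idx, obs_sigma, n_steps, n_particles = np.arange(0, D, 2), 1.0, 50, 500
truth = rng.normal(F, 1.0, D)
particles = rng.normal(F, 2.0, (n_particles, D))

for t in range(n_steps):
    truth = lorenz96_step(truth, dt)
    particles = lorenz96_step(particles, dt) + rng.normal(0, 0.1, particles.shape)
    y = truth[obs_idx] + rng.normal(0, obs_sigma, obs_idx.size)
    # Importance weights from the Gaussian observation likelihood
    logw = -0.5 * np.sum((particles[:, obs_idx] - y) ** 2, axis=1) / obs_sigma ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    # Multinomial resampling
    particles = particles[rng.choice(n_particles, n_particles, p=w)]

rmse = np.sqrt(np.mean((particles.mean(axis=0) - truth) ** 2))
print(f"posterior-mean RMSE after {n_steps} cycles: {rmse:.2f}")
```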

  13. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
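
    A simple sketch related to the idea above: fit the same linear model at several quantiles with statsmodels and combine the slope estimates by inverse-variance weighting. The weighting here is a naive stand-in for illustration, not the optimal combination derived in the paper, and the data are simulated.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=n)   # heavy-tailed noise
X = sm.add_constant(x)

taus = [0.1, 0.25, 0.5, 0.75, 0.9]
slopes, variances = [], []
for tau in taus:
    res = QuantReg(y, X).fit(q=tau)
    slopes.append(res.params[1])
    variances.append(res.bse[1] ** 2)

# Naive inverse-variance combination of the per-quantile slope estimates
w = 1.0 / np.array(variances)
combined = np.sum(w * np.array(slopes)) / w.sum()
print("per-quantile slopes:", np.round(slopes, 3))
print("inverse-variance combined slope:", round(combined, 3))
```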

  14. Providing peak river flow statistics and forecasting in the Niger River basin

    NASA Astrophysics Data System (ADS)

    Andersson, Jafet C. M.; Ali, Abdou; Arheimer, Berit; Gustafsson, David; Minoungou, Bernard

    2017-08-01

    Flooding is a growing concern in West Africa. Improved quantification of discharge extremes and associated uncertainties is needed to improve infrastructure design, and operational forecasting is needed to provide timely warnings. In this study, we use discharge observations, a hydrological model (Niger-HYPE) and extreme value analysis to estimate peak river flow statistics (e.g. the discharge magnitude with a 100-year return period) across the Niger River basin. To test the model's capacity to predict peak flows, we compared 30-year maximum discharge and peak flow statistics derived from the model with those derived from nine observation stations. The results indicate that the model simulates peak discharge reasonably well (on average +20%). However, the peak flow statistics have a large uncertainty range, which ought to be considered in infrastructure design. We then applied the methodology to derive basin-wide maps of peak flow statistics and their associated uncertainty. The results indicate that the method is applicable across the hydrologically active part of the river basin, and that the uncertainty varies substantially depending on location. Subsequently, we used the most recent bias-corrected climate projections to analyze potential changes in peak flow statistics in a changed climate. The results are generally ambiguous, with consistent changes only in very few areas. To test the forecasting capacity, we ran Niger-HYPE with a combination of meteorological data sets for the 2008 high-flow season and compared the results with observations. These indicate reasonable forecasting capacity (on average 17% deviation), but additional years should also be evaluated. We finish by presenting a strategy and pilot project that will develop an operational flood monitoring and forecasting system based on in-situ data, earth observations, modelling and extreme value statistics. In this way we aim to build capacity to ultimately improve resilience toward floods, protecting lives and infrastructure in the region.
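
    A hedged sketch of the extreme value step mentioned above: fit a GEV distribution to a series of annual maximum discharges and read off the 100-year return level with scipy. The annual maxima below are synthetic, and the study's actual analysis also quantifies the uncertainty of such estimates, which this sketch omits.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# 30 years of synthetic annual maximum discharges (m3/s)
annual_max = stats.genextreme.rvs(c=-0.1, loc=2000, scale=500, size=30,
                                  random_state=rng)

shape, loc, scale = stats.genextreme.fit(annual_max)
# 100-year return level: exceeded on average once per 100 years
q100 = stats.genextreme.isf(1.0 / 100, shape, loc, scale)
print(f"fitted GEV shape={shape:.2f}, loc={loc:.0f}, scale={scale:.0f}")
print(f"estimated 100-year peak discharge: {q100:.0f} m3/s")
```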

  15. Predicting Power Outages Using Multi-Model Ensemble Forecasts

    NASA Astrophysics Data System (ADS)

    Cerrai, D.; Anagnostou, E. N.; Yang, J.; Astitha, M.

    2017-12-01

    Power outages affect millions of people in the United States every year, harming the economy and disrupting everyday life. An Outage Prediction Model (OPM) has been developed at the University of Connecticut to help utilities quickly restore service and limit the adverse consequences of outages on the population. The OPM, operational since 2015, combines several non-parametric machine learning (ML) models that use historical storm simulations and high-resolution weather forecasts, satellite remote sensing data, and infrastructure and land cover data to predict the number and spatial distribution of power outages. This study presents a new methodology developed to improve outage model performance by combining weather- and soil-related variables from three different weather models (WRF 3.7, WRF 3.8 and RAMS/ICLAMS). First, we evaluate the performance of each model variable by comparing historical weather analyses with station data or reanalysis over the entire storm data set. Each variable of the new outage model version is then taken from the weather model that performs best for that variable, and sensitivity tests are performed to identify the most efficient variable combination for outage prediction. Although the final variable combination is drawn from different weather models, this power outage prediction ensemble, based on multiple weather forcings and multiple statistical models, outperforms the currently operational OPM version based on a single weather forcing (WRF 3.7), because each model component is closest to the actual atmospheric state.

  16. Biopsychosocial influence on exercise-induced injury: genetic and psychological combinations are predictive of shoulder pain phenotypes

    PubMed Central

    George, Steven Z.; Parr, Jeffrey J.; Wallace, Margaret R.; Wu, Samuel S.; Borsa, Paul A.; Dai, Yunfeng; Fillingim, Roger B.

    2014-01-01

    Chronic pain is influenced by biological, psychological, social, and cultural factors. The current study investigated potential roles for combinations of genetic and psychological factors in the development and/or maintenance of chronic musculoskeletal pain. An exercise-induced shoulder injury model was used and a priori selected genetic (ADRB2, COMT, OPRM1, AVPR1A, GCH1, and KCNS1) and psychological (anxiety, depressive symptoms, pain catastrophizing, fear of pain, and kinesiophobia) factors were included as predictors. Pain phenotypes were shoulder pain intensity (5-day average and peak reported on numerical rating scale), upper-extremity disability (5-day average and peak reported on the QuickDASH), and shoulder pain duration (in days). After controlling for age, sex, and race the genetic and psychological predictors were entered as main effects and interaction terms in separate regression models for the different pain phenotypes. Results from the recruited cohort (n = 190) indicated strong statistical evidence for interactions between the COMT diplotype and 1) pain catastrophizing for 5-day average upper-extremity disability and 2) depressive symptoms for pain duration. There was moderate statistical evidence for interactions for other shoulder pain phenotypes between additional genes (ADRB2, AVPR1A, and KCNS1) and depressive symptoms, pain catastrophizing, or kinesiophobia. These findings confirm the importance of the combined predictive ability of COMT with psychological distress, and reveal other novel combinations of genetic and psychological factors that may merit additional investigation in other pain cohorts. PMID:24373571

  17. Statistical atmospheric inversion of local gas emissions by coupling the tracer release technique and local-scale transport modelling: a test case with controlled methane emissions

    NASA Astrophysics Data System (ADS)

    Ars, Sébastien; Broquet, Grégoire; Yver Kwok, Camille; Roustan, Yelva; Wu, Lin; Arzoumanian, Emmanuel; Bousquet, Philippe

    2017-12-01

    This study presents a new concept for estimating the pollutant emission rates of a site and its main facilities using a series of atmospheric measurements across the pollutant plumes. This concept combines the tracer release method, local-scale atmospheric transport modelling and a statistical atmospheric inversion approach. The conversion between the controlled emission and the measured atmospheric concentrations of the released tracer across the plume places valuable constraints on the atmospheric transport. This is used to optimise the configuration of the transport model parameters and the model uncertainty statistics in the inversion system. The emission rates of all sources are then inverted to optimise the match between the concentrations simulated with the transport model and the pollutants' measured atmospheric concentrations, accounting for the transport model uncertainty. In principle, by using atmospheric transport modelling, this concept does not strongly rely on the good colocation between the tracer and pollutant sources and can be used to monitor multiple sources within a single site, unlike the classical tracer release technique. The statistical inversion framework and the use of the tracer data for the configuration of the transport and inversion modelling systems should ensure that the transport modelling errors are correctly handled in the source estimation. The potential of this new concept is evaluated with a relatively simple practical implementation based on a Gaussian plume model and a series of inversions of controlled methane point sources using acetylene as a tracer gas. The experimental conditions are chosen so that they are suitable for the use of a Gaussian plume model to simulate the atmospheric transport. In these experiments, different configurations of methane and acetylene point source locations are tested to assess the efficiency of the method in comparison to the classic tracer release technique in coping with the distances between the different methane and acetylene sources. The results from these controlled experiments demonstrate that, when the targeted and tracer gases are not well collocated, this new approach provides a better estimate of the emission rates than the tracer release technique. As an example, the relative error between the estimated and actual emission rates is reduced from 32 % with the tracer release technique to 16 % with the combined approach in the case of a tracer located 60 m upwind of a single methane source. Further studies and more complex implementations with more advanced transport models and more advanced optimisations of their configuration will be required to generalise the applicability of the approach and strengthen its robustness.
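
    A very simplified sketch of the combined approach above: a Gaussian plume forward model predicts the concentration footprint per unit emission, and a source rate is then obtained by least squares against measured concentrations. The dispersion parameters, geometry and measurements are invented for illustration; the real system uses the tracer release to optimise the transport-model configuration and uncertainty statistics, which this sketch omits.

```python
import numpy as np

def plume_footprint(x, y, u=3.0, h=1.0):
    """Ground-level concentration per unit emission rate (s/m^3) at (x, y) downwind
    of a point source at the origin, with simple power-law dispersion sigmas."""
    sigma_y = 0.08 * x ** 0.9
    sigma_z = 0.06 * x ** 0.9
    return (1.0 / (np.pi * u * sigma_y * sigma_z)
            * np.exp(-0.5 * (y / sigma_y) ** 2)
            * np.exp(-0.5 * (h / sigma_z) ** 2))

# Transect of measurement points 200 m downwind of the source
y_transect = np.linspace(-100, 100, 21)
footprint = plume_footprint(200.0, y_transect)

true_rate = 0.5e-3  # kg/s, used only to synthesise the "measurements"
measured = true_rate * footprint + np.random.default_rng(7).normal(0, 1e-9, y_transect.size)

# Single-source linear inversion: solve measured ~= footprint * rate
rate_est, *_ = np.linalg.lstsq(footprint[:, None], measured, rcond=None)
print(f"estimated emission rate: {rate_est[0] * 1e3:.2f} g/s (true 0.50 g/s)")
```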

  18. Combined statistical and mechanistic modelling suggests food and temperature effects on survival of early life stages of Northeast Arctic cod (Gadus morhua)

    NASA Astrophysics Data System (ADS)

    Stige, Leif Chr.; Langangen, Øystein; Yaragina, Natalia A.; Vikebø, Frode B.; Bogstad, Bjarte; Ottersen, Geir; Stenseth, Nils Chr.; Hjermann, Dag Ø.

    2015-05-01

    Understanding the causes of the large interannual fluctuations in the recruitment to many marine fishes is a key challenge in fisheries ecology. We here propose that the combination of mechanistic and statistical modelling of the pelagic early life stages (ELS) prior to recruitment can be a powerful approach for improving our understanding of local-scale and population-scale dynamics. Specifically, this approach allows separating effects of ocean transport and survival, and thereby enhances the knowledge of the processes that regulate recruitment. We analyse data on the pelagic eggs, larvae and post-larvae of Northeast Arctic cod and on copepod nauplii, the main prey of the cod larvae. The data originate from two surveys, one in spring and one in summer, for 30 years. A coupled physical-biological model is used to simulate the transport, ambient temperature and development of cod ELS from spawning through spring and summer. The predictions from this model are used as input in a statistical analysis of the summer data, to investigate effects of covariates thought to be linked to growth and survival. We find significant associations between the local-scale ambient copepod nauplii concentration and temperature in spring and the local-scale occurrence of cod (post)larvae in summer, consistent with effects on survival. Moreover, years with low copepod nauplii concentrations and low temperature in spring are significantly associated with lower mean length of the cod (post)larvae in summer, likely caused in part by higher mortality leading to increased dominance of young and hence small individuals. Finally, we find that the recruitment at age 3 is strongly associated with the mean body length of the cod ELS, highlighting the biological significance of the findings.

  19. Stochastic Flow Cascades

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo I.; Shlesinger, Michael F.

    2012-01-01

    We introduce and explore a Stochastic Flow Cascade (SFC) model: A general statistical model for the unidirectional flow through a tandem array of heterogeneous filters. Examples include the flow of: (i) liquid through heterogeneous porous layers; (ii) shocks through tandem shot noise systems; (iii) signals through tandem communication filters. The SFC model combines together the Langevin equation, convolution filters and moving averages, and Poissonian randomizations. A comprehensive analysis of the SFC model is carried out, yielding closed-form results. Lévy laws are shown to universally emerge from the SFC model, and characterize both heavy tailed retention times (Noah effect) and long-ranged correlations (Joseph effect).

  20. Comprehensive analysis of correlation coefficients estimated from pooling heterogeneous microarray data

    PubMed Central

    2013-01-01

    Background The synthesis of information across microarray studies has been performed by combining statistical results of individual studies (as in a mosaic), or by combining data from multiple studies into a large pool to be analyzed as a single data set (as in a melting pot of data). Specific issues relating to data heterogeneity across microarray studies, such as differences within and between labs or differences among experimental conditions, could lead to equivocal results in a melting pot approach. Results We applied statistical theory to determine the specific effect of different means and heteroskedasticity across 19 groups of microarray data on the sign and magnitude of gene-to-gene Pearson correlation coefficients obtained from the pool of 19 groups. We quantified the biases of the pooled coefficients and compared them to the biases of correlations estimated by an effect-size model. Mean differences across the 19 groups were the main factor determining the magnitude and sign of the pooled coefficients, which showed largest values of bias as they approached ±1. Only heteroskedasticity across the pool of 19 groups resulted in less efficient estimations of correlations than did a classical meta-analysis approach of combining correlation coefficients. These results were corroborated by simulation studies involving either mean differences or heteroskedasticity across a pool of N > 2 groups. Conclusions The combination of statistical results is best suited for synthesizing the correlation between expression profiles of a gene pair across several microarray studies. PMID:23822712
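
    A small simulation in the spirit of the analysis above: two genes are uncorrelated within every group, but group-specific mean shifts make the naively pooled Pearson correlation large, whereas averaging within-group correlations (a simple stand-in for the effect-size approach) does not. The group structure and effect sizes are assumptions chosen only to show the bias mechanism.

```python
import numpy as np

rng = np.random.default_rng(8)
n_groups, n_per_group = 19, 30
pooled_x, pooled_y, within_r = [], [], []

for g in range(n_groups):
    shift = rng.normal(0, 3)                  # shared group effect (e.g. lab or batch)
    x = shift + rng.normal(size=n_per_group)  # gene 1: no within-group correlation
    y = shift + rng.normal(size=n_per_group)  # gene 2: no within-group correlation
    within_r.append(np.corrcoef(x, y)[0, 1])
    pooled_x.append(x); pooled_y.append(y)

pooled_r = np.corrcoef(np.concatenate(pooled_x), np.concatenate(pooled_y))[0, 1]
print(f"pooled correlation:            {pooled_r:.2f}")
print(f"mean within-group correlation: {np.mean(within_r):.2f}")
```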

  1. Hormone replacement therapy is associated with gastro-oesophageal reflux disease: a retrospective cohort study.

    PubMed

    Close, Helen; Mason, James M; Wilson, Douglas; Hungin, A Pali S

    2012-05-29

    Oestrogen and progestogen have the potential to influence gastro-intestinal motility; both are key components of hormone replacement therapy (HRT). Results of observational studies in women taking HRT rely on self-reporting of gastro-oesophageal symptoms and the aetiology of gastro-oesophageal reflux disease (GORD) remains unclear. This study investigated the association between HRT and GORD in menopausal women using validated general practice records. 51,182 menopausal women were identified using the UK General Practice Research Database between 1995-2004. Of these, 8,831 were matched with and without hormone use. Odds ratios (ORs) were calculated for GORD and proton-pump inhibitor (PPI) use in hormone and non-hormone users, adjusting for age, co-morbidities, and co-pharmacy. In unadjusted analysis, all forms of hormone use (oestrogen-only, tibolone, combined HRT and progestogen) were statistically significantly associated with GORD. In adjusted models, this association remained statistically significant for oestrogen-only treatment (OR 1.49; 1.18-1.89). Unadjusted analysis showed a statistically significant association between PPI use and oestrogen-only and combined HRT treatment. When adjusted for covariates, oestrogen-only treatment was significant (OR 1.34; 95% CI 1.03-1.74). Findings from the adjusted model demonstrated the greater use of PPI by progestogen users (OR 1.50; 1.01-2.22). This first large cohort study of the association between GORD and HRT found a statistically significant association between oestrogen-only hormone and GORD and PPI use. This should be further investigated using prospective follow-up to validate the strength of association and describe its clinical significance.

  2. Empirical-statistical downscaling of reanalysis data to high-resolution air temperature and specific humidity above a glacier surface (Cordillera Blanca, Peru)

    NASA Astrophysics Data System (ADS)

    Hofer, Marlis; MöLg, Thomas; Marzeion, Ben; Kaser, Georg

    2010-06-01

    Recently initiated observation networks in the Cordillera Blanca (Peru) provide temporally high-resolution, yet short-term, atmospheric data. The aim of this study is to extend the existing time series into the past. We present an empirical-statistical downscaling (ESD) model that links 6-hourly National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis data to air temperature and specific humidity, measured at the tropical glacier Artesonraju (northern Cordillera Blanca). The ESD modeling procedure includes combined empirical orthogonal function and multiple regression analyses and a double cross-validation scheme for model evaluation. Apart from the selection of predictor fields, the modeling procedure is automated and does not include subjective choices. We assess the ESD model sensitivity to the predictor choice using both single-field and mixed-field predictors. Statistical transfer functions are derived individually for different months and times of day. The forecast skill largely depends on month and time of day, ranging from 0 to 0.8. The mixed-field predictors perform better than the single-field predictors. The ESD model shows added value, at all time scales, against simpler reference models (e.g., the direct use of reanalysis grid point values). The ESD model forecast 1960-2008 clearly reflects interannual variability related to the El Niño/Southern Oscillation but is sensitive to the chosen predictor type.
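
    A schematic of the ESD chain described above: principal components (EOFs) of a reanalysis predictor field feed a multiple linear regression for a local variable, evaluated by cross-validation. The array shapes and data are placeholders, not the NCEP/NCAR fields or station records used in the study, and the study's month- and time-of-day-specific transfer functions are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
n_times, n_gridpoints = 400, 250
field = rng.normal(size=(n_times, n_gridpoints))  # reanalysis predictor field (time x grid)
# Synthetic station series that depends on part of the field plus noise
local_temp = field[:, :5].mean(axis=1) + rng.normal(0, 0.5, n_times)

# EOF truncation followed by multiple regression, scored by cross-validation
esd = make_pipeline(PCA(n_components=8), LinearRegression())
skill = cross_val_score(esd, field, local_temp, cv=5, scoring="r2")
print("cross-validated r2 per fold:", np.round(skill, 2))
```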

  3. Biophysical model for assessment of risk of acute exposures in combination with low level chronic irradiation

    NASA Astrophysics Data System (ADS)

    Smirnova, O. A.

    A biophysical model is developed which describes the mortality dynamics in mammalian populations unexposed and exposed to radiation. The model relates statistical biometric functions (mortality rate, life span probability density and life span probability) with the statistical characteristics and dynamics of a critical body system in the individuals composing the population. A model describing the dynamics of thrombocytopoiesis in nonirradiated and irradiated mammals is also developed, this hematopoietic line being considered the critical body system under the exposures in question. The mortality model constructed in the framework of the proposed approach was identified to reproduce the irradiation effects on populations of mice. Most parameters of the thrombocytopoiesis model were determined from data available in the literature on hematology and radiobiology; the remaining parameters were evaluated by fitting experimental data on the dynamics of this system in acutely irradiated mice. Successful verification of the thrombocytopoiesis model was carried out by quantitative juxtaposition of the modeling predictions and experimental data on the dynamics of this system in mice exposed to either acute or chronic irradiation over wide ranges of doses and dose rates. It is important that only experimental data on the mortality rate in a nonirradiated population and the relevant statistical characteristics of the thrombocytopoiesis system in mice, which are also available in the literature on radiobiology, are needed for the final identification of

  4. A simple rapid approach using coupled multivariate statistical methods, GIS and trajectory models to delineate areas of common oil spill risk

    NASA Astrophysics Data System (ADS)

    Guillen, George; Rainey, Gail; Morin, Michelle

    2004-04-01

    Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated from other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities to specific landfall or target segments (columns) from oil spills at specific points (rows). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, due to the potentially large matrix generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we utilized a multivariate statistical method called cluster analysis to group areas of similar risk based on the potential distribution of landfall target trajectory probabilities. We also utilized ArcView™ GIS to display spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers, statistical and GIS software programmers to closely collaborate to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
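
    A conceptual sketch of the post-processing step described above: rows of a launch-point by landfall-target contact-probability matrix are grouped with hierarchical cluster analysis, so launch points with similar risk profiles fall into the same cluster. The probability matrix here is random filler rather than OSRAM output, and the choice of five clusters is arbitrary.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(10)
n_launch_points, n_targets = 40, 15
# Rows are launch points, columns are landfall targets; each row sums to 1
contact_prob = rng.dirichlet(np.ones(n_targets), size=n_launch_points)

# Ward linkage on Euclidean distances between probability profiles
tree = linkage(contact_prob, method="ward")
clusters = fcluster(tree, t=5, criterion="maxclust")  # ask for 5 risk groups
for c in np.unique(clusters):
    print(f"cluster {c}: launch points {np.where(clusters == c)[0].tolist()}")
```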

  5. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra.

    PubMed

    Shin, S M; Kim, Y-I; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. The sample included 24 female and 19 male patients with hand-wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index.

  6. The skeletal maturation status estimated by statistical shape analysis: axial images of Japanese cervical vertebra

    PubMed Central

    Shin, S M; Choi, Y-S; Yamaguchi, T; Maki, K; Cho, B-H; Park, S-B

    2015-01-01

    Objectives: To evaluate axial cervical vertebral (ACV) shape quantitatively and to build a prediction model for skeletal maturation level using statistical shape analysis for Japanese individuals. Methods: The sample included 24 female and 19 male patients with hand–wrist radiographs and CBCT images. Through generalized Procrustes analysis and principal components (PCs) analysis, the meaningful PCs were extracted from each ACV shape and analysed for the estimation regression model. Results: Each ACV shape had meaningful PCs, except for the second axial cervical vertebra. Based on these models, the smallest prediction intervals (PIs) were from the combination of the shape space PCs, age and gender. Overall, the PIs of the male group were smaller than those of the female group. There was no significant correlation between centroid size as a size factor and skeletal maturation level. Conclusions: Our findings suggest that the ACV maturation method, which was applied by statistical shape analysis, could confirm information about skeletal maturation in Japanese individuals as an available quantifier of skeletal maturation and could be as useful a quantitative method as the skeletal maturation index. PMID:25411713
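
    A rough sketch of the shape-analysis pipeline in the two records above (entries 5 and 6): landmark configurations are aligned, here with pairwise Procrustes fits to a reference as a simplification of full generalized Procrustes analysis, and the aligned coordinates are summarised by principal components that could feed a regression model for skeletal maturation. The landmarks are simulated, not cervical vertebra data.

```python
import numpy as np
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

rng = np.random.default_rng(11)
n_subjects, n_landmarks = 43, 12
base = rng.normal(size=(n_landmarks, 2))                 # reference vertebral outline
shapes = base + rng.normal(0, 0.05, (n_subjects, n_landmarks, 2))

# Align each configuration to the reference (simplified alignment step)
aligned = np.array([procrustes(base, s)[1] for s in shapes])
flat = aligned.reshape(n_subjects, -1)

pca = PCA(n_components=4)
scores = pca.fit_transform(flat)   # shape-space PCs that a regression model could use
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
```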

  7. 2D Affine and Projective Shape Analysis.

    PubMed

    Bryner, Darshan; Klassen, Eric; Huiling Le; Srivastava, Anuj

    2014-05-01

    Current techniques for shape analysis tend to seek invariance to similarity transformations (rotation, translation, and scale), but certain imaging situations require invariance to larger groups, such as affine or projective groups. Here we present a general Riemannian framework for shape analysis of planar objects where metrics and related quantities are invariant to affine and projective groups. Highlighting two possibilities for representing object boundaries-ordered points (or landmarks) and parameterized curves-we study different combinations of these representations (points and curves) and transformations (affine and projective). Specifically, we provide solutions to three out of four situations and develop algorithms for computing geodesics and intrinsic sample statistics, leading up to Gaussian-type statistical models, and classifying test shapes using such models learned from training data. In the case of parameterized curves, we also achieve the desired goal of invariance to re-parameterizations. The geodesics are constructed by particularizing the path-straightening algorithm to geometries of current manifolds and are used, in turn, to compute shape statistics and Gaussian-type shape models. We demonstrate these ideas using a number of examples from shape and activity recognition.

  8. Statistical robustness of machine-learning estimates for characterizing a groundwater-surface water system, Southland, New Zealand

    NASA Astrophysics Data System (ADS)

    Friedel, M. J.; Daughney, C.

    2016-12-01

    The development of a successful surface-groundwater management strategy depends on the quality of data provided for analysis. This study evaluates the statistical robustness when using a modified self-organizing map (MSOM) technique to estimate missing values for three hypersurface models: synoptic groundwater-surface water hydrochemistry, time-series of groundwater-surface water hydrochemistry, and mixed-survey (combination of groundwater-surface water hydrochemistry and lithologies) hydrostratigraphic unit data. These models of increasing complexity are developed and validated based on observations from the Southland region of New Zealand. In each case, the estimation method is sufficiently robust to cope with groundwater-surface water hydrochemistry vagaries due to sample size and extreme data insufficiency, even when >80% of the data are missing. The estimation of surface water hydrochemistry time series values enabled the evaluation of seasonal variation, and the imputation of lithologies facilitated the evaluation of hydrostratigraphic controls on groundwater-surface water interaction. The robust statistical results for groundwater-surface water models of increasing data complexity provide justification to apply the MSOM technique in other regions of New Zealand and abroad.

  9. Effects of alcohol tax increases on alcohol-related disease mortality in Alaska: time-series analyses from 1976 to 2004.

    PubMed

    Wagenaar, Alexander C; Maldonado-Molina, Mildred M; Wagenaar, Bradley H

    2009-08-01

    We evaluated the effects of tax increases on alcoholic beverages in 1983 and 2002 on alcohol-related disease mortality in Alaska. We used a quasi-experimental design with quarterly measures of mortality from 1976 through 2004, and we included other states for comparison. Our statistical approach combined an autoregressive integrated moving average model with structural parameters in interrupted time-series models. We observed statistically significant reductions in the numbers and rates of deaths caused by alcohol-related disease beginning immediately after the 1983 and 2002 alcohol tax increases in Alaska. In terms of effect size, the reductions were -29% (Cohen's d = -0.57) and -11% (Cohen's d = -0.52) for the 2 tax increases. Statistical tests of temporary-effect models versus long-term-effect models showed little dissipation of the effect over time. Increases in alcohol excise tax rates were associated with immediate and sustained reductions in alcohol-related disease mortality in Alaska. Reductions in mortality occurred after 2 tax increases almost 20 years apart. Taxing alcoholic beverages is an effective public health strategy for reducing the burden of alcohol-related disease.
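
    A simplified sketch of the interrupted time-series idea: an ARIMA model with a step (intervention) regressor estimates the level change after a tax increase. The quarterly series is simulated and the model order is arbitrary; the study combined ARIMA with structural parameters and comparison states, which this sketch does not reproduce.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(12)
n_quarters, change_point = 116, 60
step = (np.arange(n_quarters) >= change_point).astype(float)  # 0 before, 1 after the tax rise
deaths = (50 + np.cumsum(rng.normal(0, 0.5, n_quarters))      # slow drift
          - 8 * step                                          # synthetic post-tax level drop
          + rng.normal(0, 2, n_quarters))                     # quarterly noise
series = pd.Series(deaths,
                   index=pd.period_range("1976Q1", periods=n_quarters, freq="Q"))

model = ARIMA(series, exog=step, order=(1, 1, 1)).fit()
print(model.summary().tables[1])  # the exog coefficient estimates the post-tax level shift
```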

  10. Current and future pluvial flood hazard analysis for the city of Antwerp

    NASA Astrophysics Data System (ADS)

    Willems, Patrick; Tabari, Hossein; De Niel, Jan; Van Uytven, Els; Lambrechts, Griet; Wellens, Geert

    2016-04-01

    For the city of Antwerp in Belgium, higher rainfall extremes were observed in comparison with surrounding areas. The differences were found to be statistically significant for some areas and may be the result of the heat island effect in combination with the higher concentrations of aerosols. A network of 19 rain gauges with varying record lengths (the longest since the 1960s) and continuous radar data for 10 years were combined to map the spatial variability of rainfall extremes over the city at various durations from 15 minutes to 1 day, together with the uncertainty. The improved spatial rainfall information was used as input in the sewer system model of the city to analyze the frequency of urban pluvial floods. Comparison with historical flood observations from various sources (fire brigade and media) confirmed that the improved spatial rainfall information also improved sewer impact results on both the magnitude and frequency of the sewer floods. In addition to these improved urban flood impact results for recent and current climatological conditions, the new insights into the local rainfall microclimate also helped to enhance future projections of rainfall extremes and pluvial floods in the city. This was done by improved statistical downscaling of all available CMIP5 global climate model runs (160 runs) for the 4 RCP scenarios, as well as the available EURO-CORDEX regional climate model runs. Two types of statistical downscaling methods were applied for that purpose (a weather typing based method, and a quantile perturbation approach), making use of the microclimate results and their dependency on specific weather types. Changes in extreme rainfall intensities were analyzed and mapped as a function of the RCP scenario, together with the uncertainty, decomposed into the uncertainties related to the climate models, the climate model initialization or limited length of the 30-year time series (natural climate variability) and the statistical downscaling (albeit limited to two types of methods). These were finally transferred into future pluvial flash flood hazard maps for the city together with the uncertainties, and are considered as a basis for spatial planning and adaptation.

  11. How different is the time-averaged field from that of a geocentric axial dipole? Making the best of paleomagnetic directional data using the statistical Giant Gaussian Process approach.

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.; Johnson, C. L.

    2012-12-01

    It is well known that the geometry of the recent time-averaged paleomagnetic field (TAF) is very close to that of a geocentric axial dipole (GAD). Yet, numerous numerical dynamo simulations show that some departures from such a simple geometry are to be expected, not least because of the heterogeneous thermal core-mantle boundary conditions that the convecting mantle imposes on the geodynamo. Indeed, many TAF models recovered from averaging lava flow paleomagnetic directional data (the most numerous and reliable of all data) would suggest this is the case. However, assessing the significance of such minor departures from the GAD is particularly challenging, because non-linear directional data are sensitive not only to the time-averaged component of the field, but also to its time fluctuating component, known as the paleosecular variation (PSV). This means that in addition to data errors, PSV also must be taken into account when assessing any claims of departures of the TAF from the GAD based on lava flow directional data. Furthermore, because of limited age information for these data, it is necessary to assess departures from the GAD by resorting to a statistical approach. We report recent progress using an approach we have suggested and further developed (Khokhlov et al., Geophysical Journal International, 2001, 2006) to test the compatibility of combined time-averaged (TAF) and paleosecular variation (PSV) field models against any lava flow paleomagnetic database, assuming that these TAF and PSV models are defined within the Giant Gaussian Process statistical framework. In particular we will show how sensitive statistical measures of the compatibility of a combined set of TAF and PSV models with a given directional database can be defined. These measures can be used to test published TAF and PSV models with updated 0-5 Ma lava flow paleomagnetic data sets. They also lay the groundwork for designing inverse methods better suited to seek the minimum required departure of the TAF from the GAD.

  12. Modeling the Risk of Radiation-Induced Acute Esophagitis for Combined Washington University and RTOG Trial 93-11 Lung Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Ellen X.; Bradley, Jeffrey D.; El Naqa, Issam

    2012-04-01

    Purpose: To construct a maximally predictive model of the risk of severe acute esophagitis (AE) for patients who receive definitive radiation therapy (RT) for non-small-cell lung cancer. Methods and Materials: The dataset includes Washington University and RTOG 93-11 clinical trial data (events/patients: 120/374, WUSTL = 101/237, RTOG9311 = 19/137). Statistical model building was performed based on dosimetric and clinical parameters (patient age, sex, weight loss, pretreatment chemotherapy, concurrent chemotherapy, fraction size). A wide range of dose-volume parameters were extracted from dearchived treatment plans, including Dx, Vx, MOHx (mean of hottest x% volume), MOCx (mean of coldest x% volume), and gEUD (generalized equivalent uniform dose) values. Results: The most significant single parameters for predicting acute esophagitis (RTOG Grade 2 or greater) were MOH85, mean esophagus dose (MED), and V30. A superior-inferior weighted dose-center position was derived but not found to be significant. Fraction size was found to be significant on univariate logistic analysis (Spearman R = 0.421, p < 0.00001) but not multivariate logistic modeling. Cross-validation model building was used to determine that an optimal model size needed only two parameters (MOH85 and concurrent chemotherapy, robustly selected on bootstrap model-rebuilding). Mean esophagus dose (MED) is preferred instead of MOH85, as it gives nearly the same statistical performance and is easier to compute. AE risk is given as a logistic function of (0.0688 × MED + 1.50 × ConChemo − 3.13), where MED is in Gy and ConChemo is either 1 (yes) if concurrent chemotherapy was given, or 0 (no). This model correlates to the observed risk of AE with a Spearman coefficient of 0.629 (p < 0.000001). Conclusions: Multivariate statistical model building with cross-validation suggests that a two-variable logistic model based on mean dose and the use of concurrent chemotherapy robustly predicts acute esophagitis risk in combined-data WUSTL and RTOG 93-11 trial datasets.
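
    A direct transcription of the two-variable risk model quoted above: the probability of Grade 2 or greater acute esophagitis as a logistic function of mean esophagus dose (MED, in Gy) and concurrent chemotherapy. The coefficients come from the abstract; the example doses are arbitrary and this sketch is for illustration, not clinical use.

```python
import math

def acute_esophagitis_risk(mean_esophagus_dose_gy: float,
                           concurrent_chemo: bool) -> float:
    # Logistic model from the abstract: 0.0688*MED + 1.50*ConChemo - 3.13
    z = 0.0688 * mean_esophagus_dose_gy + 1.50 * (1 if concurrent_chemo else 0) - 3.13
    return 1.0 / (1.0 + math.exp(-z))

for med in (10, 20, 30):  # arbitrary example mean doses in Gy
    print(med, "Gy:",
          round(acute_esophagitis_risk(med, False), 2), "without chemo,",
          round(acute_esophagitis_risk(med, True), 2), "with chemo")
```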

  13. Combined statistical analyses for long-term stability data with multiple storage conditions: a simulation study.

    PubMed

    Almalik, Osama; Nijhuis, Michiel B; van den Heuvel, Edwin R

    2014-01-01

    Shelf-life estimation usually requires that at least three registration batches are tested for stability at multiple storage conditions. The shelf-life estimates are often obtained by linear regression analysis per storage condition, an approach implicitly suggested by ICH guideline Q1E. A linear regression analysis combining all data from multiple storage conditions was recently proposed in the literature when variances are homogeneous across storage conditions. The combined analysis is expected to perform better than the separate analysis per storage condition, since pooling data would lead to an improved estimate of the variation and higher numbers of degrees of freedom, but this is not evident for shelf-life estimation. Indeed, the two approaches treat the observed initial batch results, the intercepts in the model, and poolability of batches differently, which may eliminate or reduce the expected advantage of the combined approach with respect to the separate approach. Therefore, a simulation study was performed to compare the distribution of simulated shelf-life estimates on several characteristics between the two approaches and to quantify the difference in shelf-life estimates. In general, the combined statistical analysis does estimate the true shelf life more consistently and precisely than the analysis per storage condition, but it did not outperform the separate analysis in all circumstances.

  14. Fine-mapping additive and dominant SNP effects using group-LASSO and Fractional Resample Model Averaging

    PubMed Central

    Sabourin, Jeremy; Nobel, Andrew B.; Valdar, William

    2014-01-01

    Genomewide association studies sometimes identify loci at which both the number and identities of the underlying causal variants are ambiguous. In such cases, statistical methods that model effects of multiple SNPs simultaneously can help disentangle the observed patterns of association and provide information about how those SNPs could be prioritized for follow-up studies. Current multi-SNP methods, however, tend to assume that SNP effects are well captured by additive genetics; yet when genetic dominance is present, this assumption translates to reduced power and faulty prioritizations. We describe a statistical procedure for prioritizing SNPs at GWAS loci that efficiently models both additive and dominance effects. Our method, LLARRMA-dawg, combines a group LASSO procedure for sparse modeling of multiple SNP effects with a resampling procedure based on fractional observation weights; it estimates for each SNP the robustness of association with the phenotype both to sampling variation and to competing explanations from other SNPs. In producing a SNP prioritization that best identifies underlying true signals, we show that: our method easily outperforms a single marker analysis; when additive-only signals are present, our joint model for additive and dominance is equivalent to or only slightly less powerful than modeling additive-only effects; and, when dominance signals are present, even in combination with substantial additive effects, our joint model is unequivocally more powerful than a model assuming additivity. We also describe how performance can be improved through calibrated randomized penalization, and discuss how dominance in ungenotyped SNPs can be incorporated through either heterozygote dosage or multiple imputation. PMID:25417853

  15. Editorial: Introduction to the Special Section on Causal Inference in Cross Sectional and Longitudinal Mediational Models

    PubMed Central

    West, Stephen G.

    2016-01-01

    Psychologists have long had interest in the processes through which antecedent variables produce their effects on the outcomes of ultimate interest (e.g., Woodworth's Stimulus-Organism-Response model). Models involving such mediational processes have characterized many of the important psychological theories of the 20th century and continue to the present day. However, it was not until Judd and Kenny (1981) and Baron and Kenny (1986) combined ideas from experimental design and structural equation modeling that statistical methods for directly testing such models, now known as mediation analysis, began to be developed. Methodologists have improved these statistical methods, developing new, more efficient estimators for mediated effects. They have also extended mediation analysis to multilevel data structures, models involving multiple mediators, models in which interactions occur, and an array of noncontinuous outcome measures (see MacKinnon, 2008). This work nicely maps onto key questions of applied researchers and has led to an outpouring of research testing mediational models (as of August 2011, Baron and Kenny's article had over 24,000 citations according to Google Scholar). PMID:26736046

  16. Publication Bias (The "File-Drawer Problem") in Scientific Inference

    NASA Technical Reports Server (NTRS)

    Scargle, Jeffrey D.; DeVincenzi, Donald (Technical Monitor)

    1999-01-01

    Publication bias arises whenever the probability that a study is published depends on the statistical significance of its results. This bias, often called the file-drawer effect since the unpublished results are imagined to be tucked away in researchers' file cabinets, is potentially a severe impediment to combining the statistical results of studies collected from the literature. With almost any reasonable quantitative model for publication bias, only a small number of studies lost in the file-drawer will produce a significant bias. This result contradicts the well-known Fail Safe File Drawer (FSFD) method for setting limits on the potential harm of publication bias, widely used in social, medical and psychic research. This method incorrectly treats the file drawer as unbiased, and almost always misestimates the seriousness of publication bias. A large body of not only psychic research, but medical and social science studies, has mistakenly relied on this method to validate claimed discoveries. Statistical combination can be trusted only if it is known with certainty that all studies that have been carried out are included. Such certainty is virtually impossible to achieve in literature surveys.
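
    The size of the effect is easy to reproduce with a small simulation. The sketch below (my own toy setup, not Scargle's analysis) generates studies with a true effect of zero, "publishes" only those reaching p < 0.05, and shows that the naively combined published effect is far from zero.

      # Toy file-drawer simulation: only significant studies reach the literature.
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(42)
      true_effect, n_per_study, n_studies = 0.0, 30, 200

      published = []
      for _ in range(n_studies):
          sample = rng.normal(true_effect, 1.0, n_per_study)
          t, p = stats.ttest_1samp(sample, 0.0)
          if p < 0.05:                       # the rest stay in the file drawer
              published.append(sample.mean())

      print("studies published:", len(published), "of", n_studies)
      print("mean published effect:", round(float(np.mean(published)), 3), "(true effect is 0)")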

  17. Recognizing stationary and locomotion activities using combinational of spectral analysis with statistical descriptors features

    NASA Astrophysics Data System (ADS)

    Zainudin, M. N. Shah; Sulaiman, Md Nasir; Mustapha, Norwati; Perumal, Thinagaran

    2017-10-01

    Prior knowledge in pervasive computing has recently garnered a lot of attention due to its high demand in various application domains. Human activity recognition (HAR) is among the most widely explored of these applications because it provides valuable information about people. Accelerometer-based sensing is commonly used in HAR research since accelerometers are small and already built into most types of smartphones. However, high inter-class similarity tends to degrade recognition performance. Hence, this work presents a method for activity recognition using our proposed features, which combine spectral analysis with statistical descriptors and are able to tackle the issue of differentiating stationary and locomotion activities. The noisy signal is first filtered using the Fourier transform and then characterized by two different groups of features, spectral frequency analysis and statistical descriptors. The extracted features are then classified using a random forest ensemble classifier. The recognition results show good accuracy for stationary and locomotion activities on the USC-HAD dataset.
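
    A feature-extraction and classification pipeline in this spirit can be sketched as follows. The window length, sampling rate, feature set, and the synthetic accelerometer windows are all assumptions made for illustration; the classifier is scikit-learn's random forest.

      # Per-window statistical descriptors + simple spectral features, random forest classifier.
      import numpy as np
      from scipy import stats
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      def window_features(acc_window, fs=50.0):
          """acc_window: (n_samples, 3) accelerometer window."""
          feats = []
          for axis in acc_window.T:
              feats += [axis.mean(), axis.std(), stats.skew(axis), stats.kurtosis(axis)]
              spec = np.abs(np.fft.rfft(axis - axis.mean()))
              freqs = np.fft.rfftfreq(len(axis), d=1.0 / fs)
              feats += [freqs[np.argmax(spec)], float(np.sum(spec ** 2))]
          return feats

      # Synthetic stand-in for labelled windows (0: stationary, 1: locomotion).
      rng = np.random.default_rng(0)
      X, y = [], []
      for label, freq in [(0, 0.0), (1, 2.0)]:
          for _ in range(100):
              t = np.arange(128) / 50.0
              sig = np.stack([np.sin(2 * np.pi * freq * t + rng.uniform(0, np.pi))
                              + rng.normal(0, 0.2, t.size) for _ in range(3)], axis=1)
              X.append(window_features(sig)); y.append(label)

      clf = RandomForestClassifier(n_estimators=200, random_state=0)
      print("5-fold CV accuracy:", cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())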

  18. Hurricane Loss Estimation Models: Opportunities for Improving the State of the Art.

    NASA Astrophysics Data System (ADS)

    Watson, Charles C., Jr.; Johnson, Mark E.

    2004-11-01

    The results of hurricane loss models are used regularly for multibillion dollar decisions in the insurance and financial services industries. These models are proprietary, and this “black box” nature hinders analysis. The proprietary models produce a wide range of results, often producing loss costs that differ by a ratio of three to one or more. In a study for the state of North Carolina, 324 combinations of loss models were analyzed, based on a combination of nine wind models, four surface friction models, and nine damage models drawn from the published literature in insurance, engineering, and meteorology. These combinations were tested against reported losses from Hurricanes Hugo and Andrew as reported by a major insurance company, as well as storm total losses for additional storms. Annual loss costs were then computed using these 324 combinations of models for both North Carolina and Florida, and compared with publicly available proprietary model results in Florida. The wide range of resulting loss costs for open, scientifically defensible models that perform well against observed losses mirrors the wide range of loss costs computed by the proprietary models currently in use. This outcome may be discouraging for governmental and corporate decision makers relying on this data for policy and investment guidance (due to the high variability across model results), but it also provides guidance for the efforts of future investigations to improve loss models. Although hurricane loss models are true multidisciplinary efforts, involving meteorology, engineering, statistics, and actuarial sciences, the field of meteorology offers the most promising opportunities for improvement of the state of the art.

  19. Interactions between different EEG frequency bands and their effect on alpha-fMRI correlations.

    PubMed

    de Munck, J C; Gonçalves, S I; Mammoliti, R; Heethaar, R M; Lopes da Silva, F H

    2009-08-01

    In EEG/fMRI correlation studies it is common to consider the fMRI BOLD signal as a filtered version of the EEG alpha power. Here the question is addressed whether other EEG frequency components may affect the correlation between alpha and BOLD. This was done by comparing statistical parametric maps (SPMs) obtained with three different filter models. EEG and fMRI were co-registered in a 30 min resting state condition in 15 healthy young subjects. Power variations in the delta, theta, alpha, beta and gamma bands were extracted from the EEG and used as regressors in a general linear model. Statistical parametric maps (SPMs) were computed using three different filter models, wherein either the free or the standard hemodynamic response functions (HRF) were used in combination with the full spectral bandwidth of the EEG. Results show that the SPMs of different EEG frequency bands, when significant, are very similar to that of the alpha rhythm. This is true in particular for the beta band, despite the fact that the alpha harmonics were discarded. It is shown that inclusion of EEG frequency bands as confounders in the fMRI-alpha correlation model has a large effect on the resulting SPM, in particular when the HRF for each frequency band is extracted from the data. We conclude that power fluctuations of different EEG frequency bands are mutually highly correlated, and that a multi-frequency model is required to extract the SPM of the frequency of interest from EEG/fMRI data. When no constraints are put on the shapes of the HRFs of the nuisance frequencies, the correlation model loses so much statistical power that no correlations can be detected.

  20. Assessing colour-dependent occupation statistics inferred from galaxy group catalogues

    NASA Astrophysics Data System (ADS)

    Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu

    2015-09-01

    We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.

  1. A Method of Relating General Circulation Model Simulated Climate to the Observed Local Climate. Part I: Seasonal Statistics.

    NASA Astrophysics Data System (ADS)

    Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David

    1990-10-01

    Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
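
    The three CPMS steps can be mimicked schematically in Python. The sketch below uses synthetic data and scikit-learn's PCA and CCA; the "inflation" step simply rescales the regression predictions to reproduce the observed variance, which is one common reading of inflated regression and is an assumption here rather than the paper's exact procedure.

      # Schematic CPMS chain: PCA -> canonical correlation -> inflated regression.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.cross_decomposition import CCA
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      n = 400
      predictors = rng.normal(size=(n, 12))                  # GCM free-atmosphere statistics
      surface = predictors[:, :2] @ rng.normal(size=(2, 3)) + rng.normal(0, 1.0, (n, 3))

      # 1) PCA reduces redundancy among the predictors
      pcs = PCA(n_components=5).fit_transform(predictors)

      # 2) Canonical correlation between principal components and surface observations
      U, V = CCA(n_components=2).fit_transform(pcs, surface)

      # 3) "Inflated" regression of one surface variable on the canonical variables
      target = surface[:, 0]
      pred = LinearRegression().fit(U, target).predict(U)
      inflated = pred.mean() + (target.std() / pred.std()) * (pred - pred.mean())
      print("raw vs inflated prediction std:", float(pred.std()), float(inflated.std()))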

  2. Combining quantitative and qualitative breast density measures to assess breast cancer risk.

    PubMed

    Kerlikowske, Karla; Ma, Lin; Scott, Christopher G; Mahmoudzadeh, Amir P; Jensen, Matthew R; Sprague, Brian L; Henderson, Louise M; Pankratz, V Shane; Cummings, Steven R; Miglioretti, Diana L; Vachon, Celine M; Shepherd, John A

    2017-08-22

    Accurately identifying women with dense breasts (Breast Imaging Reporting and Data System [BI-RADS] heterogeneously or extremely dense) who are at high breast cancer risk will facilitate discussions of supplemental imaging and primary prevention. We examined the independent contribution of dense breast volume and BI-RADS breast density to predict invasive breast cancer and whether dense breast volume combined with Breast Cancer Surveillance Consortium (BCSC) risk model factors (age, race/ethnicity, family history of breast cancer, history of breast biopsy, and BI-RADS breast density) improves identifying women with dense breasts at high breast cancer risk. We conducted a case-control study of 1720 women with invasive cancer and 3686 control subjects. We calculated ORs and 95% CIs for the effect of BI-RADS breast density and Volpara™ automated dense breast volume on invasive cancer risk, adjusting for other BCSC risk model factors plus body mass index (BMI), and we compared C-statistics between models. We calculated BCSC 5-year breast cancer risk, incorporating the adjusted ORs associated with dense breast volume. Compared with women with BI-RADS scattered fibroglandular densities and second-quartile dense breast volume, women with BI-RADS extremely dense breasts and third- or fourth-quartile dense breast volume (75% of women with extremely dense breasts) had high breast cancer risk (OR 2.87, 95% CI 1.84-4.47, and OR 2.56, 95% CI 1.87-3.52, respectively), whereas women with extremely dense breasts and first- or second-quartile dense breast volume were not at significantly increased breast cancer risk (OR 1.53, 95% CI 0.75-3.09, and OR 1.50, 95% CI 0.82-2.73, respectively). Adding continuous dense breast volume to a model with BCSC risk model factors and BMI increased discriminatory accuracy compared with a model with only BCSC risk model factors (C-statistic 0.639, 95% CI 0.623-0.654, vs. C-statistic 0.614, 95% CI 0.598-0.630, respectively; P < 0.001). Women with dense breasts and fourth-quartile dense breast volume had a BCSC 5-year risk of 2.5%, whereas women with dense breasts and first-quartile dense breast volume had a 5-year risk ≤ 1.8%. Risk models with automated dense breast volume combined with BI-RADS breast density may better identify women with dense breasts at high breast cancer risk than risk models with either measure alone.
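
    The model-comparison step (adding a continuous density measure to a baseline risk model and comparing C-statistics) can be sketched generically as below. The synthetic risk factors, coefficients, and cross-validated AUC comparison are illustrative assumptions, not the BCSC model itself.

      # Compare discrimination (C-statistic) of a risk model with and without dense volume.
      import numpy as np
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_predict
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(0)
      n = 5000
      age = rng.normal(60, 10, n)
      birads = rng.integers(1, 5, n)                    # BI-RADS density category 1-4
      dense_vol = birads * 20 + rng.normal(0, 15, n)    # correlated automated dense volume
      logit = -6 + 0.03 * age + 0.3 * birads + 0.01 * dense_vol
      cancer = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

      for name, X in [("base model", np.column_stack([age, birads])),
                      ("base + dense volume", np.column_stack([age, birads, dense_vol]))]:
          p = cross_val_predict(LogisticRegression(max_iter=1000), X, cancer,
                                cv=5, method="predict_proba")[:, 1]
          print(name, "C-statistic:", round(roc_auc_score(cancer, p), 3))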

  3. Does the organisational model of dementia case management make a difference in satisfaction with case management and caregiver burden? An evaluation study.

    PubMed

    Peeters, José M; Pot, Anne Margriet; de Lange, Jacomine; Spreeuwenberg, Peter M; Francke, Anneke L

    2016-03-09

    In the Netherlands, various organisational models of dementia case management exist. In this study the following four models are distinguished, based on differences in the availability of the service and in the case management function: Model 1: the case management service is available from first dementia symptoms + is always a separate specialist function; Model 2: the case management service is only available after a formal dementia diagnosis + is always a separate specialist function; Model 3: the case management service is available from first dementia symptoms + is often a combined function; Model 4: the case management service is only available after a formal dementia diagnosis + is often a combined function. The objectives of this study are to give insight into whether satisfaction with dementia case management and the development of caregiver burden depend on the organisational model. A survey was carried out in regional dementia care networks in the Netherlands among 554 informal carers for people with dementia at the start of case management (response of 85 %), and one year later. Descriptive statistics and multilevel models were used to analyse the data. The satisfaction with the case manager was high in general (an average of 8.0 within a possible range of 1 to 10), although the caregiver burden did not decrease in the first year after starting with case management. No differences were found between the four organisational models regarding the development of caregiver burden. However, statistically significant differences (p < 0.05) were found regarding satisfaction: informal carers in the organisational model where case management is only available after formal diagnosis of dementia and is often a combined function had on average the lowest satisfaction scores. Nevertheless, the satisfaction of informal carers within all organisational models was high (ranging from 7.51 to 8.40 within a range of 1 to 10). Organisational features of case management seem to make little or no difference to the development in caregiver burden and the satisfaction of informal carers. Future research is needed to explore whether the individual characteristics of the case managers themselves are associated with case management outcomes.

  4. Discrimination of soft tissues using laser-induced breakdown spectroscopy in combination with k nearest neighbors (kNN) and support vector machine (SVM) classifiers

    NASA Astrophysics Data System (ADS)

    Li, Xiaohui; Yang, Sibo; Fan, Rongwei; Yu, Xin; Chen, Deying

    2018-06-01

    In this paper, discrimination of soft tissues using laser-induced breakdown spectroscopy (LIBS) in combination with multivariate statistical methods is presented. Fresh pork fat, skin, ham, loin and tenderloin muscle tissues are manually cut into slices and ablated using a 1064 nm pulsed Nd:YAG laser. Discrimination analyses between fat, skin and muscle tissues, and further between highly similar ham, loin and tenderloin muscle tissues, are performed based on the LIBS spectra in combination with multivariate statistical methods, including principal component analysis (PCA), k nearest neighbors (kNN) classification, and support vector machine (SVM) classification. Performances of the discrimination models, including accuracy, sensitivity and specificity, are evaluated using 10-fold cross validation. The classification models are optimized to achieve best discrimination performances. The fat, skin and muscle tissues can be definitely discriminated using both kNN and SVM classifiers, with accuracy of over 99.83%, sensitivity of over 0.995 and specificity of over 0.998. The highly similar ham, loin and tenderloin muscle tissues can also be discriminated with acceptable performances. The best performances are achieved with SVM classifier using Gaussian kernel function, with accuracy of 76.84%, sensitivity of over 0.742 and specificity of over 0.869. The results show that the LIBS technique assisted with multivariate statistical methods could be a powerful tool for online discrimination of soft tissues, even for tissues of high similarity, such as muscles from different parts of the animal body. This technique could be used for discrimination of tissues suffering minor clinical changes, thus may advance the diagnosis of early lesions and abnormalities.
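
    The classification chain named above (PCA for dimension reduction, an RBF-kernel SVM, 10-fold cross-validation) translates directly into a scikit-learn pipeline. The spectra below are synthetic placeholders rather than LIBS data, and the hyperparameters are illustrative.

      # PCA + Gaussian-kernel SVM with 10-fold cross-validation on placeholder spectra.
      import numpy as np
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.decomposition import PCA
      from sklearn.svm import SVC
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_per_class, n_wavelengths = 60, 500
      X, y = [], []
      for label in range(3):                                  # e.g. fat / skin / muscle
          centre = rng.normal(0, 1, n_wavelengths)
          X.append(centre + rng.normal(0, 0.8, (n_per_class, n_wavelengths)))
          y += [label] * n_per_class
      X, y = np.vstack(X), np.array(y)

      pipe = make_pipeline(StandardScaler(), PCA(n_components=10),
                           SVC(kernel="rbf", C=10, gamma="scale"))
      scores = cross_val_score(pipe, X, y, cv=10)
      print("10-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))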

  5. A method for modeling co-occurrence propensity of clinical codes with application to ICD-10-PCS auto-coding.

    PubMed

    Subotin, Michael; Davis, Anthony R

    2016-09-01

    Natural language processing methods for medical auto-coding, or automatic generation of medical billing codes from electronic health records, generally assign each code independently of the others. They may thus assign codes for closely related procedures or diagnoses to the same document, even when they do not tend to occur together in practice, simply because the right choice can be difficult to infer from the clinical narrative. We propose a method that injects awareness of the propensities for code co-occurrence into this process. First, a model is trained to estimate the conditional probability that one code is assigned by a human coder, given that another code is known to have been assigned to the same document. Then, at runtime, an iterative algorithm is used to apply this model to the output of an existing statistical auto-coder to modify the confidence scores of the codes. We tested this method in combination with a primary auto-coder for International Statistical Classification of Diseases-10 procedure codes, achieving a 12% relative improvement in F-score over the primary auto-coder baseline. The proposed method can be used, with appropriate features, in combination with any auto-coder that generates codes with different levels of confidence. The promising results obtained for International Statistical Classification of Diseases-10 procedure codes suggest that the proposed method may have wider applications in auto-coding. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
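
    The runtime re-scoring step can be caricatured in a few lines. The loop below is only a toy interpretation of the idea (blend weights, codes, and the co-occurrence matrix are invented), not the authors' algorithm: each code's confidence is nudged toward the support it receives from the other currently confident codes.

      # Toy iterative re-scoring of auto-coder confidences using co-occurrence propensities.
      import numpy as np

      codes = ["code_A", "code_B", "code_C"]               # hypothetical procedure codes
      primary_score = np.array([0.70, 0.65, 0.20])         # from the primary auto-coder

      # cooc[i, j] ~ estimated P(code i assigned | code j assigned); values invented
      cooc = np.array([[1.00, 0.05, 0.40],
                       [0.05, 1.00, 0.35],
                       [0.60, 0.55, 1.00]])

      score = primary_score.copy()
      for _ in range(10):                                  # iterate toward a fixed point
          context = score / score.sum()
          support = cooc @ context                         # expected co-occurrence support
          score = 0.7 * primary_score + 0.3 * support      # blend weights are arbitrary
      print(dict(zip(codes, np.round(score, 3))))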

  6. Data mining and statistical inference in selective laser melting

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kamath, Chandrika

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.

  7. Data mining and statistical inference in selective laser melting

    DOE PAGES

    Kamath, Chandrika

    2016-01-11

    Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.

  8. The Complete Light-curve Sample of Spectroscopically Confirmed SNe Ia from Pan-STARRS1 and Cosmological Constraints from the Combined Pantheon Sample

    NASA Astrophysics Data System (ADS)

    Scolnic, D. M.; Jones, D. O.; Rest, A.; Pan, Y. C.; Chornock, R.; Foley, R. J.; Huber, M. E.; Kessler, R.; Narayan, G.; Riess, A. G.; Rodney, S.; Berger, E.; Brout, D. J.; Challis, P. J.; Drout, M.; Finkbeiner, D.; Lunnan, R.; Kirshner, R. P.; Sanders, N. E.; Schlafly, E.; Smartt, S.; Stubbs, C. W.; Tonry, J.; Wood-Vasey, W. M.; Foley, M.; Hand, J.; Johnson, E.; Burgett, W. S.; Chambers, K. C.; Draper, P. W.; Hodapp, K. W.; Kaiser, N.; Kudritzki, R. P.; Magnier, E. A.; Metcalfe, N.; Bresolin, F.; Gall, E.; Kotak, R.; McCrum, M.; Smith, K. W.

    2018-06-01

    We present optical light curves, redshifts, and classifications for 365 spectroscopically confirmed Type Ia supernovae (SNe Ia) discovered by the Pan-STARRS1 (PS1) Medium Deep Survey. We detail improvements to the PS1 SN photometry, astrometry, and calibration that reduce the systematic uncertainties in the PS1 SN Ia distances. We combine the subset of 279 PS1 SNe Ia (0.03 < z < 0.68) with useful distance estimates of SNe Ia from the Sloan Digital Sky Survey (SDSS), SNLS, and various low-z and Hubble Space Telescope samples to form the largest combined sample of SNe Ia, consisting of a total of 1048 SNe Ia in the range of 0.01 < z < 2.3, which we call the “Pantheon Sample.” When combining Planck 2015 cosmic microwave background (CMB) measurements with the Pantheon SN sample, we find Ω_m = 0.307 ± 0.012 and w = -1.026 ± 0.041 for the wCDM model. When the SN and CMB constraints are combined with constraints from BAO and local H0 measurements, the analysis yields the most precise measurement of dark energy to date: w_0 = -1.007 ± 0.089 and w_a = -0.222 ± 0.407 for the w_0w_aCDM model. Tension with a cosmological constant previously seen in an analysis of PS1 and low-z SNe has diminished after an increase of 2× in the statistics of the PS1 sample, improved calibration and photometry, and stricter light-curve quality cuts. We find that the systematic uncertainties in our measurements of dark energy are almost as large as the statistical uncertainties, primarily due to limitations of modeling the low-redshift sample. This must be addressed for future progress in using SNe Ia to measure dark energy.

  9. Using a coupled hydro-mechanical fault model to better understand the risk of induced seismicity in deep geothermal projects

    NASA Astrophysics Data System (ADS)

    Abe, Steffen; Krieger, Lars; Deckert, Hagen

    2017-04-01

    The changes of fluid pressures related to the injection of fluids into the deep underground, for example during geothermal energy production, can potentially reactivate faults and thus cause induced seismic events. Therefore, an important aspect in the planning and operation of such projects, in particular in densely populated regions such as the Upper Rhine Graben in Germany, is the estimation and mitigation of the induced seismic risk. The occurrence of induced seismicity depends on a combination of hydraulic properties of the underground, mechanical and geometric parameters of the fault, and the fluid injection regime. In this study we are therefore employing a numerical model to investigate the impact of fluid pressure changes on the dynamics of the faults and the resulting seismicity. The approach combines a model of the fluid flow around a geothermal well based on a 3D finite difference discretisation of the Darcy-equation with a 2D block-slider model of a fault. The models are coupled so that the evolving pore pressure at the relevant locations of the hydraulic model is taken into account in the calculation of the stick-slip dynamics of the fault model. Our modelling approach uses two subsequent modelling steps. Initially, the fault model is run by applying a fixed deformation rate for a given duration and without the influence of the hydraulic model in order to generate the background event statistics. Initial tests have shown that the response of the fault to hydraulic loading depends on the timing of the fluid injection relative to the seismic cycle of the fault. Therefore, multiple snapshots of the fault's stress- and displacement state are generated from the fault model. In a second step, these snapshots are then used as initial conditions in a set of coupled hydro-mechanical model runs including the effects of the fluid injection. This set of models is then compared with the background event statistics to evaluate the change in the probability of seismic events. The event data such as location, magnitude, and source characteristics can be used as input for numerical wave propagation models. This allows the translation of seismic event statistics generated by the model into ground shaking probabilities.

  10. Resolving the Antarctic contribution to sea-level rise: a hierarchical modelling framework.

    PubMed

    Zammit-Mangion, Andrew; Rougier, Jonathan; Bamber, Jonathan; Schön, Nana

    2014-06-01

    Determining the Antarctic contribution to sea-level rise from observational data is a complex problem. The number of physical processes involved (such as ice dynamics and surface climate) exceeds the number of observables, some of which have very poor spatial definition. This has led, in general, to solutions that utilise strong prior assumptions or physically based deterministic models to simplify the problem. Here, we present a new approach for estimating the Antarctic contribution, which only incorporates descriptive aspects of the physically based models in the analysis and in a statistical manner. By combining physical insights with modern spatial statistical modelling techniques, we are able to provide probability distributions on all processes deemed to play a role in both the observed data and the contribution to sea-level rise. Specifically, we use stochastic partial differential equations and their relation to geostatistical fields to capture our physical understanding and employ a Gaussian Markov random field approach for efficient computation. The method, an instantiation of Bayesian hierarchical modelling, naturally incorporates uncertainty in order to reveal credible intervals on all estimated quantities. The estimated sea-level rise contribution using this approach corroborates those found using a statistically independent method. © 2013 The Authors. Environmetrics Published by John Wiley & Sons, Ltd.

  11. Resolving the Antarctic contribution to sea-level rise: a hierarchical modelling framework†

    PubMed Central

    Zammit-Mangion, Andrew; Rougier, Jonathan; Bamber, Jonathan; Schön, Nana

    2014-01-01

    Determining the Antarctic contribution to sea-level rise from observational data is a complex problem. The number of physical processes involved (such as ice dynamics and surface climate) exceeds the number of observables, some of which have very poor spatial definition. This has led, in general, to solutions that utilise strong prior assumptions or physically based deterministic models to simplify the problem. Here, we present a new approach for estimating the Antarctic contribution, which only incorporates descriptive aspects of the physically based models in the analysis and in a statistical manner. By combining physical insights with modern spatial statistical modelling techniques, we are able to provide probability distributions on all processes deemed to play a role in both the observed data and the contribution to sea-level rise. Specifically, we use stochastic partial differential equations and their relation to geostatistical fields to capture our physical understanding and employ a Gaussian Markov random field approach for efficient computation. The method, an instantiation of Bayesian hierarchical modelling, naturally incorporates uncertainty in order to reveal credible intervals on all estimated quantities. The estimated sea-level rise contribution using this approach corroborates those found using a statistically independent method. © 2013 The Authors. Environmetrics Published by John Wiley & Sons, Ltd. PMID:25505370

  12. The Mantel-Haenszel procedure revisited: models and generalizations.

    PubMed

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented.

  13. The Mantel-Haenszel Procedure Revisited: Models and Generalizations

    PubMed Central

    Fidler, Vaclav; Nagelkerke, Nico

    2013-01-01

    Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
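
    A minimal sketch of the proposed extension, under a data-generating model of my own choosing, is given below: the joint outcome (X, Y) is coded as a four-category variable, a multinomial logistic model is fitted on Z, and the conditional log odds ratio is read off the fitted cell probabilities. Under homogeneity it reduces to a combination of three intercept parameters and should be roughly constant in Z; the statsmodels MNLogit call is standard, but the simulated data only approximately satisfy the model.

      # Symmetric Mantel-Haenszel-style adjustment via a multinomial logistic model.
      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 20000
      z = rng.normal(size=n)
      x = (rng.random(n) < 1.0 / (1.0 + np.exp(-0.5 * z))).astype(int)
      # conditional log odds ratio between X and Y given Z is 0.8 by construction
      y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-0.3 + 0.4 * z + 0.8 * x)))).astype(int)

      joint = 2 * x + y                                # 0:(0,0) 1:(0,1) 2:(1,0) 3:(1,1)
      fit = sm.MNLogit(joint, sm.add_constant(z)).fit(disp=False)

      # log OR(z) = log[P(1,1)P(0,0) / (P(1,0)P(0,1))]; under homogeneity it does not depend on z
      for z0 in (-1.0, 0.0, 1.0):
          p = fit.predict(np.array([[1.0, z0]]))[0]    # probabilities of the four (X,Y) cells
          print("z = %+.1f  estimated log OR = %.2f" % (z0, np.log(p[3] * p[0] / (p[2] * p[1]))))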

  14. Combining test statistics and models in bootstrapped model rejection: it is a balancing act

    PubMed Central

    2014-01-01

    Background: Model rejections lie at the heart of systems biology, since they provide conclusive statements: that the corresponding mechanistic assumptions do not serve as valid explanations for the experimental data. Rejections are usually done using e.g. the chi-square test (χ2) or the Durbin-Watson test (DW). Analytical formulas for the corresponding distributions rely on assumptions that typically are not fulfilled. This problem is partly alleviated by the usage of bootstrapping, a computationally heavy approach to calculate an empirical distribution. Bootstrapping also allows for a natural extension to estimation of joint distributions, but this feature has so far been little exploited. Results: We herein show that simplistic combinations of bootstrapped tests, like the max or min of the individual p-values, give inconsistent, i.e. overly conservative or liberal, results. A new two-dimensional (2D) approach based on parametric bootstrapping, on the other hand, is found both consistent and with a higher power than the individual tests, when tested on static and dynamic examples where the truth is known. In the same examples, the most superior test is a 2D χ2 vs χ2, where the second χ2-value comes from an additional help model, and its ability to describe bootstraps from the tested model. This superiority is lost if the help model is too simple, or too flexible. If a useful help model is found, the most powerful approach is the bootstrapped log-likelihood ratio (LHR). We show that this is because the LHR is one-dimensional, because the second dimension comes at a cost, and because LHR has retained most of the crucial information in the 2D distribution. These approaches statistically resolve a previously published rejection example for the first time. Conclusions: We have shown how to, and how not to, combine tests in a bootstrap setting, when the combination is advantageous, and when it is advantageous to include a second model. These results also provide a deeper insight into the original motivation for formulating the LHR, for the more general setting of nonlinear and non-nested models. These insights are valuable in cases when accuracy and power, rather than computational speed, are prioritized. PMID:24742065
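
    The basic parametric-bootstrap ingredient used throughout the paper can be sketched as follows (an illustrative toy, not the authors' pipeline): a model is fitted, a chi-square statistic is computed, and the same fitted model is then used to simulate many data sets whose refitted statistics form the empirical reference distribution.

      # Parametric bootstrap of a chi-square rejection test for a simple decay model.
      import numpy as np
      from scipy.optimize import curve_fit

      def model(t, a, k):
          return a * np.exp(-k * t)

      rng = np.random.default_rng(0)
      t = np.linspace(0, 10, 20)
      sigma = 0.05
      y_obs = 1.0 * np.exp(-0.3 * t) + 0.1 * t * np.exp(-t) + rng.normal(0, sigma, t.size)

      def chi2_of_fit(y):
          popt, _ = curve_fit(model, t, y, p0=[1.0, 0.3])
          return np.sum(((y - model(t, *popt)) / sigma) ** 2), popt

      chi2_obs, popt_obs = chi2_of_fit(y_obs)

      # Simulate from the *fitted* model and refit each draw to build the empirical distribution.
      boot = np.array([chi2_of_fit(model(t, *popt_obs) + rng.normal(0, sigma, t.size))[0]
                       for _ in range(500)])
      print("observed chi2 = %.1f, empirical p = %.3f" % (chi2_obs, np.mean(boot >= chi2_obs)))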

  15. Using Generalized Equivalent Uniform Dose Atlases to Combine and Analyze Prospective Dosimetric and Radiation Pneumonitis Data From 2 Non-Small Cell Lung Cancer Dose Escalation Protocols

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu Fan; Yorke, Ellen D.; Belderbos, Jose S.A.

    2013-01-01

    Purpose: To demonstrate the use of generalized equivalent uniform dose (gEUD) atlas for data pooling in radiation pneumonitis (RP) modeling, to determine the dependence of RP on gEUD, to study the consistency between data sets, and to verify the increased statistical power of the combination. Methods and Materials: Patients enrolled in prospective phase I/II dose escalation studies of radiation therapy of non-small cell lung cancer at Memorial Sloan-Kettering Cancer Center (MSKCC) (78 pts) and the Netherlands Cancer Institute (NKI) (86 pts) were included; 10 (13%) and 14 (17%) experienced RP requiring steroids (RPS) within 6 months after treatment. gEUD was calculated from dose-volume histograms. Atlases for each data set were created using 1-Gy steps from exact gEUDs and RPS data. The Lyman-Kutcher-Burman model was fit to the atlas and exact gEUD data. Heterogeneity and inconsistency statistics for the fitted parameters were computed. gEUD maps of the probability of RPS rate ≥20% were plotted. Results: The 2 data sets were homogeneous and consistent. The best fit values of the volume effect parameter a were small, with upper 95% confidence limit around 1.0 in the joint data. The likelihood profiles around the best fit a values were flat in all cases, making determination of the best fit a weak. All confidence intervals (CIs) were narrower in the joint than in the individual data sets. The minimum P value for correlations of gEUD with RPS in the joint data was .002, compared with P=.01 and .05 for MSKCC and NKI data sets, respectively. gEUD maps showed that at small a, RPS risk increases with gEUD. Conclusions: The atlas can be used to combine gEUD and RPS information from different institutions and model gEUD dependence of RPS. RPS has a large volume effect with the mean dose model barely included in the 95% CI. Data pooling increased statistical power.
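
    The two dose-response quantities used above are compact enough to write out. The sketch below computes gEUD from a differential dose-volume histogram, gEUD = (sum_i v_i * D_i^a)^(1/a), and evaluates a Lyman-Kutcher-Burman complication probability at that gEUD; the DVH and the parameter values (a, TD50, m) are purely illustrative.

      # gEUD from a differential DVH and a Lyman-Kutcher-Burman response curve.
      import numpy as np
      from scipy.stats import norm

      def gEUD(dose_bins, rel_volumes, a):
          """Generalized equivalent uniform dose from a differential DVH."""
          v = np.asarray(rel_volumes, float) / np.sum(rel_volumes)
          return np.sum(v * np.asarray(dose_bins, float) ** a) ** (1.0 / a)

      def lkb_ntcp(geud, td50, m):
          """Lyman-Kutcher-Burman complication probability."""
          return norm.cdf((geud - td50) / (m * td50))

      dose_bins = np.array([2, 6, 10, 14, 18, 22, 26])       # Gy, bin centres (illustrative)
      rel_volumes = np.array([0.30, 0.25, 0.18, 0.12, 0.08, 0.05, 0.02])

      for a in [0.5, 1.0, 2.0]:                              # small a -> strong volume effect
          g = gEUD(dose_bins, rel_volumes, a)
          print("a=%.1f  gEUD=%.1f Gy  NTCP=%.2f" % (a, g, lkb_ntcp(g, td50=20.0, m=0.35)))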

  16. Simplified methods for real-time prediction of storm surge uncertainty: The city of Venice case study

    NASA Astrophysics Data System (ADS)

    Mel, Riccardo; Viero, Daniele Pietro; Carniello, Luca; Defina, Andrea; D'Alpaos, Luigi

    2014-09-01

    Providing reliable and accurate storm surge forecasts is important for a wide range of problems related to coastal environments. In order to adequately support decision-making processes, it has also become increasingly important to be able to estimate the uncertainty associated with the storm surge forecast. The procedure commonly adopted to do this uses the results of a hydrodynamic model forced by a set of different meteorological forecasts; however, this approach requires a considerable, if not prohibitive, computational cost for real-time application. In the present paper we present two simplified methods for estimating the uncertainty affecting storm surge prediction with moderate computational effort. In the first approach we use a computationally fast, statistical tidal model instead of a hydrodynamic numerical model to estimate storm surge uncertainty. The second approach is based on the observation that the uncertainty in the sea level forecast mainly stems from the uncertainty affecting the meteorological fields; this has led to the idea of estimating forecast uncertainty via a linear combination of suitable meteorological variances, directly extracted from the meteorological fields. The proposed methods were applied to estimate the uncertainty in the storm surge forecast in the Venice Lagoon. The results clearly show that the uncertainty estimated through a linear combination of suitable meteorological variances nicely matches the one obtained using the deterministic approach and overcomes some intrinsic limitations in the use of a statistical tidal model.
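
    The second approach lends itself to a very small sketch: regress surge-forecast variance (which would normally come from a handful of full hydrodynamic ensemble runs) on candidate meteorological variances, then reuse the fitted linear combination as a cheap uncertainty estimate. All numbers below are synthetic placeholders, and the choice of wind-speed and pressure variances as predictors is an assumption.

      # Fit a linear combination of meteorological variances as a surrogate for surge variance.
      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      n_events = 60
      var_wind = rng.gamma(2.0, 1.0, n_events)       # ensemble variance of wind speed
      var_press = rng.gamma(2.0, 0.5, n_events)      # ensemble variance of sea-level pressure
      # stand-in for the surge variance a full hydrodynamic ensemble would give
      var_surge = 0.04 * var_wind + 0.10 * var_press + rng.normal(0, 0.01, n_events)

      reg = LinearRegression().fit(np.column_stack([var_wind, var_press]), var_surge)
      print("fitted weights:", np.round(reg.coef_, 3))
      print("cheap uncertainty estimate for a new event:",
            round(float(reg.predict([[3.0, 1.5]])[0]), 3))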

  17. Accurate prediction of vaccine stability under real storage conditions and during temperature excursions.

    PubMed

    Clénet, Didier

    2018-04-01

    Due to their thermosensitivity, most vaccines must be kept refrigerated from production to use. To successfully carry out global immunization programs, ensuring the stability of vaccines is crucial. In this context, two important issues are critical, namely: (i) predicting vaccine stability and (ii) preventing product damage due to excessive temperature excursions outside of the recommended storage conditions (cold chain break). We applied a combination of advanced kinetics and statistical analyses on vaccine forced degradation data to accurately describe the loss of antigenicity for a multivalent freeze-dried inactivated virus vaccine containing three variants. The screening of large amounts of kinetic models combined with a statistical model selection approach resulted in the identification of two-step kinetic models. Predictions based on kinetic analysis and experimental stability data were in agreement, with approximately five percentage points difference from real values for long-term stability storage conditions, after excursions of temperature and during experimental shipments of freeze-dried products. Results showed that modeling a few months of forced degradation can be used to predict various time and temperature profiles endured by vaccines, i.e. long-term stability, short time excursions outside the labeled storage conditions or shipments at ambient temperature, with high accuracy. Pharmaceutical applications of the presented kinetics-based approach are discussed. Copyright © 2018 The Author. Published by Elsevier B.V. All rights reserved.
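
    The extrapolation idea can be illustrated with a deliberately simplified model: a one-step first-order loss with an Arrhenius temperature dependence, fitted to short forced-degradation data and then evaluated at refrigerated storage and at an excursion temperature. This one-step model stands in for the two-step kinetic models identified in the paper, and every number below is invented.

      # Fit Arrhenius first-order kinetics to forced-degradation data, then extrapolate.
      import numpy as np
      from scipy.optimize import curve_fit

      R = 8.314  # J/mol/K

      def antigenicity(X, log_k_ref, Ea):
          t, T = X                                     # time (days), temperature (K)
          T_ref = 298.15
          k = np.exp(log_k_ref) * np.exp(-Ea / R * (1.0 / T - 1.0 / T_ref))
          return 100.0 * np.exp(-k * t)

      # Forced-degradation design: a few weeks at 25, 37 and 45 degC
      t = np.tile([0.0, 7.0, 14.0, 28.0, 56.0], 3)
      T = np.repeat([298.15, 310.15, 318.15], 5)
      rng = np.random.default_rng(1)
      y = antigenicity((t, T), np.log(0.002), 80e3) + rng.normal(0, 1.0, t.size)

      popt, _ = curve_fit(antigenicity, (t, T), y, p0=[np.log(0.001), 60e3])

      print("predicted potency after 3 y at 5 degC: %.1f%%"
            % antigenicity((np.array([3 * 365.0]), np.array([278.15])), *popt)[0])
      print("predicted potency after a 14-day excursion at 25 degC: %.1f%%"
            % antigenicity((np.array([14.0]), np.array([298.15])), *popt)[0])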

  18. Product placement of computer games in cyberspace.

    PubMed

    Yang, Heng-Li; Wang, Cheng-Shu

    2008-08-01

    Computer games are considered an emerging media and are even regarded as an advertising channel. By a three-phase experiment, this study investigated the advertising effectiveness of computer games for different product placement forms, product types, and their combinations. As the statistical results revealed, computer games are appropriate for placement advertising. Additionally, different product types and placement forms produced different advertising effectiveness. Optimum combinations of product types and placement forms existed. An advertisement design model is proposed for use in game design environments. Some suggestions are given for advertisers and game companies respectively.

  19. Combined monitoring, decision and control model for the human operator in a command and control desk

    NASA Technical Reports Server (NTRS)

    Muralidharan, R.; Baron, S.

    1978-01-01

    A report is given on the ongoing efforts to model the human operator in the context of the task during the enroute/return phases in the ground-based control of multiple flights of remotely piloted vehicles (RPV). The approach employed here uses models that have their analytical bases in control theory and in statistical estimation and decision theory. In particular, it draws heavily on the models and concepts of the optimal control model (OCM) of the human operator. The OCM is being extended into a combined monitoring, decision, and control model (DEMON) of the human operator by infusing decision theoretic notions that make it suitable for application to problems in which human control actions are infrequent and in which monitoring and decision-making are the operator's main activities. Some results obtained with a specialized version of DEMON for the RPV control problem are included.

  20. Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Volden, Thomas R.

    2010-01-01

    The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
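
    Two of the recommended checks, flagging near-linear dependencies between candidate terms and dropping statistically insignificant ones, can be sketched with standard tools. The candidate terms, the deliberately near-duplicate regressor, and the 0.05 threshold below are illustrative assumptions, not the balance-calibration procedure itself.

      # Variance inflation factors for dependency screening, then significance-based pruning.
      import numpy as np
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      rng = np.random.default_rng(0)
      n = 200
      F1, F2 = rng.normal(size=n), rng.normal(size=n)
      candidates = np.column_stack([F1, F2, F1 * F2, F1 ** 2,
                                    0.98 * F1 + 0.02 * rng.normal(size=n)])  # near-duplicate
      names = ["F1", "F2", "F1*F2", "F1^2", "F1_dup"]
      response = 2.0 * F1 + 0.5 * F1 * F2 + rng.normal(0, 0.1, n)

      # (i) flag near-linear dependencies among candidate terms
      exog = sm.add_constant(candidates)
      for j, name in enumerate(names, start=1):
          print(name, "VIF = %.1f" % variance_inflation_factor(exog, j))

      # (ii) refit without the flagged term and keep only significant terms
      keep = [0, 1, 2, 3]
      fit = sm.OLS(response, sm.add_constant(candidates[:, keep])).fit()
      print("significant terms:", [names[k] for k, p in zip(keep, fit.pvalues[1:]) if p < 0.05])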

  1. Modelling of electronic excitation and radiation in the Direct Simulation Monte Carlo Macroscopic Chemistry Method

    NASA Astrophysics Data System (ADS)

    Goldsworthy, M. J.

    2012-10-01

    One of the most useful tools for modelling rarefied hypersonic flows is the Direct Simulation Monte Carlo (DSMC) method. Simulator particle movement and collision calculations are combined with statistical procedures to model thermal non-equilibrium flow-fields described by the Boltzmann equation. The Macroscopic Chemistry Method for DSMC simulations was developed to simplify the inclusion of complex thermal non-equilibrium chemistry. The macroscopic approach uses statistical information which is calculated during the DSMC solution process in the modelling procedures. Here it is shown how inclusion of macroscopic information in models of chemical kinetics, electronic excitation, ionization, and radiation can enhance the capabilities of DSMC to model flow-fields where a range of physical processes occur. The approach is applied to the modelling of a 6.4 km/s nitrogen shock wave and results are compared with those from existing shock-tube experiments and continuum calculations. Reasonable agreement between the methods is obtained. The quality of the comparison is highly dependent on the set of vibrational relaxation and chemical kinetic parameters employed.

  2. Combined effects of projected sea level rise, storm surge, and peak river flows on water levels in the Skagit Floodplain

    USGS Publications Warehouse

    Hamman, Josheph J; Hamlet, Alan F.; Fuller, Roger; Grossman, Eric E.

    2016-01-01

    Current understanding of the combined effects of sea level rise (SLR), storm surge, and changes in river flooding on near-coastal environments is very limited. This project uses a suite of numerical models to examine the combined effects of projected future climate change on flooding in the Skagit floodplain and estuary. Statistically and dynamically downscaled global climate model scenarios from the ECHAM-5 GCM were used as the climate forcings. Unregulated daily river flows were simulated using the VIC hydrology model, and regulated river flows were simulated using the SkagitSim reservoir operations model. Daily tidal anomalies (TA) were calculated using a regression approach based on ENSO and atmospheric pressure forcing simulated by the WRF regional climate model. A 2-D hydrodynamic model was used to estimate water surface elevations in the Skagit floodplain using resampled hourly hydrographs keyed to regulated daily flood flows produced by the reservoir simulation model, and tide predictions adjusted for SLR and TA. Combining peak annual TA with projected sea level rise, the historical (1970–1999) 100-yr peak high water level is exceeded essentially every year by the 2050s. The combination of projected sea level rise and larger floods by the 2080s yields both increased flood inundation area (+ 74%), and increased average water depth (+ 25 cm) in the Skagit floodplain during a 100-year flood. Adding sea level rise to the historical FEMA 100-year flood resulted in a 35% increase in inundation area by the 2040's, compared to a 57% increase when both SLR and projected changes in river flow were combined.

  3. Impact of Temperature and Non-Gaussian Statistics on Electron Transfer in Donor–Bridge–Acceptor Molecules

    DOE PAGES

    Waskasi, Morteza M.; Newton, Marshall D.; Matyushov, Dmitry V.

    2017-03-16

    A combination of experimental data and theoretical analysis provides evidence of a bell-shaped kinetics of electron transfer in the Arrhenius coordinates ln k vs 1/T. This kinetic law is a temperature analog of the familiar Marcus bell-shaped dependence based on ln k vs the reaction free energy. These results were obtained for reactions of intramolecular charge shift between the donor and acceptor separated by a rigid spacer studied experimentally by Miller and co-workers. The non-Arrhenius kinetic law is a direct consequence of the solvent reorganization energy and reaction driving force changing approximately as hyperbolic functions with temperature. The reorganization energy decreases and the driving force increases when temperature is increased. The point of equality between them marks the maximum of the activationless reaction rate. Reaching the consistency between the kinetic and thermodynamic experimental data requires the non-Gaussian statistics of the donor-acceptor energy gap described by the Q-model of electron transfer. Furthermore, the theoretical formalism combines the vibrational envelope of quantum vibronic transitions with the Q-model describing the classical component of the Franck-Condon factor and a microscopic solvation model of the solvent reorganization energy and the reaction free energy.

  4. Fault Diagnosis Strategies for SOFC-Based Power Generation Plants

    PubMed Central

    Costamagna, Paola; De Giorgi, Andrea; Gotelli, Alberto; Magistri, Loredana; Moser, Gabriele; Sciaccaluga, Emanuele; Trucco, Andrea

    2016-01-01

    The success of distributed power generation by plants based on solid oxide fuel cells (SOFCs) is hindered by reliability problems that can be mitigated through an effective fault detection and isolation (FDI) system. However, the numerous operating conditions under which such plants can operate and the random size of the possible faults make identifying damaged plant components starting from the physical variables measured in the plant very difficult. In this context, we assess two classical FDI strategies (model-based with fault signature matrix and data-driven with statistical classification) and the combination of them. For this assessment, a quantitative model of the SOFC-based plant, which is able to simulate regular and faulty conditions, is used. Moreover, a hybrid approach based on the random forest (RF) classification method is introduced to address the discrimination of regular and faulty situations due to its practical advantages. Working with a common dataset, the FDI performances obtained using the aforementioned strategies, with different sets of monitored variables, are observed and compared. We conclude that the hybrid FDI strategy, realized by combining a model-based scheme with a statistical classifier, outperforms the other strategies. In addition, the inclusion of two physical variables that should be measured inside the SOFCs can significantly improve the FDI performance, despite the actual difficulty in performing such measurements. PMID:27556472

  5. Markov Logic Networks in the Analysis of Genetic Data

    PubMed Central

    Sakhanenko, Nikita A.

    2010-01-01

    Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249

  6. Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.

    PubMed

    Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga

    2015-10-01

    The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called the disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.

  7. Properties of J^P = 1/2^+ baryon octets at low energy

    NASA Astrophysics Data System (ADS)

    Kaur, Amanpreet; Gupta, Pallavi; Upadhyay, Alka

    2017-06-01

    The statistical model in combination with the detailed balance principle is able to phenomenologically calculate and analyze spin- and flavor-dependent properties like magnetic moments (with effective masses, with effective charge, or with both effective mass and effective charge), quark spin polarization and distribution, the strangeness suppression factor, and the \overline{d}-\overline{u} asymmetry incorporating the strange sea. The s\overline{s} in the sea is said to be generated via the basic quark mechanism but suppressed by the strange quark mass factor m_s > m_{u,d}. The magnetic moments of the octet baryons are analyzed within the statistical model, by putting emphasis on the SU(3) symmetry-breaking effects generated by the mass difference between the strange and non-strange quarks. The work presented here assumes hadrons with a sea having an admixture of quark-gluon Fock states. The results obtained have been compared with theoretical models and experimental data.

  8. Fault detection and diagnosis using neural network approaches

    NASA Technical Reports Server (NTRS)

    Kramer, Mark A.

    1992-01-01

    Neural networks can be used to detect and identify abnormalities in real-time process data. Two basic approaches can be used, the first based on training networks using data representing both normal and abnormal modes of process behavior, and the second based on statistical characterization of the normal mode only. Given data representative of process faults, radial basis function networks can effectively identify failures. This approach is often limited by the lack of fault data, but can be facilitated by process simulation. The second approach employs elliptical and radial basis function neural networks and other models to learn the statistical distributions of process observables under normal conditions. Analytical models of failure modes can then be applied in combination with the neural network models to identify faults. Special methods can be applied to compensate for sensor failures, to produce real-time estimation of missing or failed sensors based on the correlations codified in the neural network.
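
    The second approach (statistical characterization of the normal mode only) can be illustrated with a simpler stand-in for the elliptical-unit networks described here: fit the mean and covariance of normal-operation observables, then flag any observation whose Mahalanobis distance exceeds a chi-square bound. The three-variable data and the threshold below are assumptions for illustration only.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(2)

      # Process observables collected under normal operation only (no fault data needed).
      X_normal = rng.multivariate_normal([0, 0, 0], np.diag([1.0, 0.5, 2.0]), size=500)
      mu, cov = X_normal.mean(axis=0), np.cov(X_normal, rowvar=False)
      cov_inv = np.linalg.inv(cov)

      def anomaly_flag(x, alpha=0.001):
          """Flag x as abnormal if its Mahalanobis distance exceeds a chi-square bound."""
          d2 = (x - mu) @ cov_inv @ (x - mu)
          return d2 > stats.chi2.ppf(1 - alpha, df=len(mu))

      print(anomaly_flag(np.array([0.2, -0.1, 0.5])))   # normal-looking -> False
      print(anomaly_flag(np.array([4.0,  3.0, -6.0])))  # far outside    -> True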

  9. An Intelligent Model for Pairs Trading Using Genetic Algorithms.

    PubMed

    Huang, Chien-Feng; Hsu, Chi-Jen; Chen, Chi-Chung; Chang, Bao Rong; Li, Chen-An

    2015-01-01

    Pairs trading is an important and challenging research area in computational finance, in which pairs of stocks are bought and sold in pair combinations for arbitrage opportunities. Traditional methods that solve this set of problems mostly rely on statistical methods such as regression. In contrast to the statistical approaches, recent advances in computational intelligence (CI) are leading to promising opportunities for solving problems in the financial applications more effectively. In this paper, we present a novel methodology for pairs trading using genetic algorithms (GA). Our results showed that the GA-based models are able to significantly outperform the benchmark and our proposed method is capable of generating robust models to tackle the dynamic characteristics in the financial application studied. Based upon the promising results obtained, we expect this GA-based method to advance the research in computational intelligence for finance and provide an effective solution to pairs trading for investment in practice.
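
    As a stripped-down illustration of the general idea (not the encoding, fitness function or trading rules used in the paper), the sketch below evolves two z-score thresholds of a mean-reversion rule on a synthetic spread with a minimal genetic algorithm: tournament selection, blend crossover and Gaussian mutation. All parameters and the spread model are invented for illustration.

      import numpy as np

      rng = np.random.default_rng(0)

      # Synthetic mean-reverting spread standing in for a real stock-pair spread.
      n = 2000
      spread = np.zeros(n)
      for t in range(1, n):
          spread[t] = 0.95 * spread[t - 1] + rng.normal(scale=0.1)
      z = (spread - spread.mean()) / spread.std()

      def fitness(params):
          """Cumulative P&L of a rule: open at |z| > entry, close at |z| < exit."""
          entry, exit_ = params
          if exit_ >= entry:                               # invalid chromosome
              return -np.inf
          pos, pnl = 0, 0.0
          for t in range(1, n):
              pnl += pos * (spread[t] - spread[t - 1])     # P&L of position held over (t-1, t]
              if pos == 0 and abs(z[t]) > entry:
                  pos = -np.sign(z[t])                     # bet on reversion towards the mean
              elif pos != 0 and abs(z[t]) < exit_:
                  pos = 0
          return pnl

      # Minimal GA: tournament selection, blend crossover, Gaussian mutation.
      pop = rng.uniform([0.5, 0.0], [3.0, 1.0], size=(40, 2))
      for generation in range(30):
          fit = np.array([fitness(p) for p in pop])
          winners = [max(rng.choice(len(pop), 3), key=lambda i: fit[i]) for _ in range(40)]
          parents = pop[winners]
          children = 0.5 * (parents + parents[rng.permutation(40)])      # crossover
          pop = children + rng.normal(scale=0.05, size=children.shape)   # mutation
      best = max(pop, key=fitness)
      print("best (entry, exit) thresholds:", np.round(best, 2), " P&L:", round(fitness(best), 3))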

  11. Automatic stage identification of Drosophila egg chamber based on DAPI images

    PubMed Central

    Jia, Dongyu; Xu, Qiuping; Xie, Qian; Mio, Washington; Deng, Wu-Min

    2016-01-01

    The Drosophila egg chamber, whose development is divided into 14 stages, is a well-established model for developmental biology. However, visual stage determination can be a tedious, subjective and time-consuming task prone to errors. Our study presents an objective, reliable and repeatable automated method for quantifying cell features and classifying egg chamber stages based on DAPI images. The proposed approach is composed of two steps: 1) a feature extraction step and 2) a statistical modeling step. The egg chamber features used are egg chamber size, oocyte size, egg chamber ratio and distribution of follicle cells. Methods for determining the onset of the polytene stage and centripetal migration are also discussed. The statistical model uses linear and ordinal regression to explore the stage-feature relationships and classify egg chamber stages. Combined with machine learning, our method has great potential to enable discovery of hidden developmental mechanisms. PMID:26732176

  12. Role of weakest links and system-size scaling in multiscale modeling of stochastic plasticity

    NASA Astrophysics Data System (ADS)

    Ispánovity, Péter Dusán; Tüzes, Dániel; Szabó, Péter; Zaiser, Michael; Groma, István

    2017-02-01

    Plastic deformation of crystalline and amorphous matter often involves intermittent local strain burst events. To understand the physical background of the phenomenon, a minimal stochastic mesoscopic model was introduced in which details of the microstructure evolution are statistically represented in terms of a fluctuating local yield threshold. In the present paper we propose a method for determining the corresponding yield stress distribution for the case of crystal plasticity from lower-scale discrete dislocation dynamics simulations, which we combine with weakest-link arguments. The success of the scale linking is demonstrated by comparing stress-strain curves obtained from the resulting mesoscopic model and the underlying discrete dislocation model in the microplastic regime. As shown by various scaling relations, they are statistically equivalent and behave identically in the thermodynamic limit. The proposed technique is expected to be applicable to different microstructures and also to amorphous materials.
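
    The weakest-link idea behind such models can be illustrated in a few lines: if local yield thresholds are drawn from a Weibull distribution, the stress of the first plastic event is the minimum threshold in the system, and its mean decreases with system size roughly as N^(-1/k). The distribution and its parameters below are assumptions chosen for illustration, not the distribution extracted in the paper.

      import numpy as np

      rng = np.random.default_rng(3)
      k = 2.0                              # assumed Weibull modulus of local yield thresholds

      for n_elements in (10, 100, 1000):
          # Each row holds the local thresholds of one sample; the weakest link
          # sets the stress at which the first strain burst occurs.
          thresholds = rng.weibull(k, size=(5000, n_elements))
          first_yield = thresholds.min(axis=1)
          print(n_elements, round(first_yield.mean(), 4))   # drops roughly as n_elements**(-1/k)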

  13. The Effect of Automobile Safety on Vehicle Type Choice: An Empirical Study.

    ERIC Educational Resources Information Center

    McCarthy, Patrick S.

    An analysis was made of the extent to which the safety characteristics of new vehicles affect consumer purchase decisions. Using an extensive data set that combines vehicle data collected by the Automobile Club of Southern California Target Car Program with the responses from a national household survey of new car buyers, a statistical model of…

  14. Synthesis of Single-Case Experimental Data: A Comparison of Alternative Multilevel Approaches

    ERIC Educational Resources Information Center

    Ferron, John; Van den Noortgate, Wim; Beretvas, Tasha; Moeyaert, Mariola; Ugille, Maaike; Petit-Bois, Merlande; Baek, Eun Kyeng

    2013-01-01

    Single-case or single-subject experimental designs (SSED) are used to evaluate the effect of one or more treatments on a single case. Although SSED studies are growing in popularity, the results are in theory case-specific. One systematic and statistical approach for combining single-case data within and across studies is multilevel modeling. The…

  15. Varsity Sport Participation and Consumption as Two of Several Factors Associated with College Achievement and Attrition.

    ERIC Educational Resources Information Center

    Henriksen, Larry; And Others

    The study reported in this paper used path analysis statistical methodology and an attrition model to examine the separate and combined relationships of several different types of background, personal, and college experience variables with the attained college grade-point average (GPA) and graduation rate for 3,125 Ball State (Indiana) University…

  16. Factors Influencing Student Prerequisite Preparation for and Subsequent Performance in College Chemistry Two: A Statistical Investigation

    ERIC Educational Resources Information Center

    Easter, David C.

    2010-01-01

    For students entering Chemistry Two following a Chemistry One course, an assessment exam was given and the results were evaluated in combination with other variables to develop a predictive model that forecasts student achievement in the course. Variables considered in the analysis included student major, GPA, classification (student standing:…

  17. Using Statistical Techniques and Web Search to Correct ESL Errors

    ERIC Educational Resources Information Center

    Gamon, Michael; Leacock, Claudia; Brockett, Chris; Dolan, William B.; Gao, Jianfeng; Belenko, Dmitriy; Klementiev, Alexandre

    2009-01-01

    In this paper we present a system for automatic correction of errors made by learners of English. The system has two novel aspects. First, machine-learned classifiers trained on large amounts of native data and a very large language model are combined to optimize the precision of suggested corrections. Second, the user can access real-life web…

  18. Modeling of the spectral evolution in a narrow-linewidth fiber amplifier

    NASA Astrophysics Data System (ADS)

    Liu, Wei; Kuang, Wenjun; Jiang, Man; Xu, Jiangming; Zhou, Pu; Liu, Zejin

    2016-03-01

    Efficient numerical modeling of the spectral evolution in a narrow-linewidth fiber amplifier is presented. By describing the seeds with a statistical model and simulating the amplification process through power-balance equations combined with the nonlinear Schrödinger equations, the spectral evolution of different seeds in the fiber amplifier can be evaluated accurately. The simulation results show that the output spectra are affected by the temporal stability of the seeds and that seeds with constant amplitude in time help maintain the seed linewidth in the fiber amplifier.

  19. Gaussian Mixture Model of Heart Rate Variability

    PubMed Central

    Costa, Tommaso; Boccignone, Giuseppe; Ferraro, Mario

    2012-01-01

    Heart rate variability (HRV) is an important measure of sympathetic and parasympathetic functions of the autonomic nervous system and a key indicator of cardiovascular condition. This paper proposes a novel method to investigate HRV, namely by modelling it as a linear combination of Gaussians. Results show that three Gaussians are enough to describe the stationary statistics of heart variability and to provide a straightforward interpretation of the HRV power spectrum. Comparisons have also been made with synthetic data generated from different physiologically based models, showing the plausibility of the Gaussian mixture parameters. PMID:22666386
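
    A minimal sketch of this kind of analysis, assuming a synthetic RR-interval series in place of a real recording, fits a three-component Gaussian mixture with scikit-learn and reads off the component weights, means and spreads; the component values below are invented for illustration.

      import numpy as np
      from sklearn.mixture import GaussianMixture

      rng = np.random.default_rng(4)

      # Synthetic RR-interval series (seconds) standing in for a real HRV recording.
      rr = np.concatenate([rng.normal(0.80, 0.02, 2000),
                           rng.normal(0.90, 0.04, 1500),
                           rng.normal(1.05, 0.03, 500)])

      gmm = GaussianMixture(n_components=3, random_state=0).fit(rr.reshape(-1, 1))
      for w, m, v in zip(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel()):
          print(f"weight={w:.2f}  mean={m:.3f} s  sd={np.sqrt(v):.3f} s")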

  20. QSAR modeling of β-lactam binding to human serum proteins

    NASA Astrophysics Data System (ADS)

    Hall, L. Mark; Hall, Lowell H.; Kier, Lemont B.

    2003-02-01

    The binding of β-lactams to human serum proteins was modeled with topological descriptors of molecular structure. The experimental data were the concentration of protein-bound drug expressed as a percent of the total plasma concentration (percent fraction bound, PFB) for 87 penicillins and for 115 β-lactams. The electrotopological state indices (E-State) and the molecular connectivity chi indices were found to be the basis of two satisfactory models. A data set of 74 penicillins from a drug design series was successfully modeled with statistics: r^2 = 0.80, s = 12.1, q^2 = 0.76, s_press = 13.4. This model was then used to predict protein binding (PFB) for 13 commercial penicillins, resulting in a very good mean absolute error, MAE = 12.7, and correlation coefficient, q^2 = 0.84. A group of 28 cephalosporins were combined with the penicillin data to create a dataset of 115 β-lactams that was successfully modeled: r^2 = 0.82, s = 12.7, q^2 = 0.78, s_press = 13.7. A ten-fold 10% leave-group-out (LGO) cross-validation procedure was implemented, leading to very good statistics: MAE = 10.9, s_press = 14.0, q^2 (or r^2_press) = 0.78. The models indicate a combination of general and specific structure features that are important for estimating protein binding in this class of antibiotics. For the β-lactams, significant factors that increase binding are the presence and electron accessibility of aromatic rings, halogens, methylene groups, and =N- atoms. A significant negative influence on binding comes from amine groups and carbonyl oxygen atoms.
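
    Once descriptors and responses are in hand, the quoted cross-validation statistics can be computed mechanically. The sketch below shows a ten-fold leave-group-out computation of q^2 and s_press with a plain linear model on synthetic stand-ins for the E-State and chi descriptors; the real model form and data are not reproduced here.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import KFold

      rng = np.random.default_rng(5)

      # Hypothetical descriptor matrix X and percent-fraction-bound response y.
      X = rng.normal(size=(115, 6))
      y = 60 + X @ rng.normal(size=6) * 8 + rng.normal(scale=10, size=115)

      press, ss = 0.0, np.sum((y - y.mean()) ** 2)
      for train, test in KFold(n_splits=10, shuffle=True, random_state=0).split(X):
          model = LinearRegression().fit(X[train], y[train])
          press += np.sum((y[test] - model.predict(X[test])) ** 2)
      print("q^2 =", round(1 - press / ss, 3), "  s_press =", round(np.sqrt(press / len(y)), 2))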

  1. Bayesian inference for joint modelling of longitudinal continuous, binary and ordinal events.

    PubMed

    Li, Qiuju; Pan, Jianxin; Belcher, John

    2016-12-01

    In medical studies, repeated measurements of continuous, binary and ordinal outcomes are routinely collected from the same patient. Instead of modelling each outcome separately, in this study we propose to jointly model the trivariate longitudinal responses, so as to take account of the inherent association between the different outcomes and thus improve statistical inferences. This work is motivated by a large cohort study in the North West of England, involving trivariate responses from each patient: Body Mass Index, Depression (Yes/No) ascertained with a cut-off score of at least 8 on the Hospital Anxiety and Depression Scale, and Pain Interference generated from the Medical Outcomes Study 36-item short-form health survey with values returned on an ordinal scale of 1-5. There are some well-established methods for combined continuous and binary, or even continuous and ordinal responses, but little work has been done on the joint analysis of continuous, binary and ordinal responses. We propose conditional joint random-effects models, which take into account the inherent association between the continuous, binary and ordinal outcomes. Bayesian analysis methods are used to make statistical inferences. Simulation studies show that, by jointly modelling the trivariate outcomes, standard deviations of the estimates of parameters in the models are smaller and much more stable, leading to more efficient parameter estimates and reliable statistical inferences. In the real data analysis, the proposed joint analysis yields a much smaller deviance information criterion value than the separate analysis, and shows other good statistical properties too. © The Author(s) 2014.

  2. Flies and humans share a motion estimation strategy that exploits natural scene statistics

    PubMed Central

    Clark, Damon A.; Fitzgerald, James E.; Ales, Justin M.; Gohl, Daryl M.; Silies, Marion A.; Norcia, Anthony M.; Clandinin, Thomas R.

    2014-01-01

    Sighted animals extract motion information from visual scenes by processing spatiotemporal patterns of light falling on the retina. The dominant models for motion estimation exploit intensity correlations only between pairs of points in space and time. Moving natural scenes, however, contain more complex correlations. Here we show that fly and human visual systems encode the combined direction and contrast polarity of moving edges using triple correlations that enhance motion estimation in natural environments. Both species extract triple correlations with neural substrates tuned for light or dark edges, and sensitivity to specific triple correlations is retained even as light and dark edge motion signals are combined. Thus, both species separately process light and dark image contrasts to capture motion signatures that can improve estimation accuracy. This striking convergence argues that statistical structures in natural scenes have profoundly affected visual processing, driving a common computational strategy over 500 million years of evolution. PMID:24390225
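
    The distinction between pairwise and triple correlations can be made concrete with a toy moving-edge stimulus: after mean subtraction, contrast inversion leaves any pairwise spatiotemporal correlator unchanged but flips the sign of any odd-order (triple) correlator, which is the kind of polarity-sensitive signature discussed above. The particular correlator and stimulus below are assumptions chosen for illustration.

      import numpy as np

      # A drifting edge movie: a bright region expanding rightward one pixel per frame.
      n_x, n_t = 200, 100
      x = np.arange(n_x)
      movie = np.array([(x < 20 + t).astype(float) for t in range(n_t)])   # shape (t, x)
      movie -= movie.mean()

      def pair_corr(m, dx=1, dt=1):
          return np.mean(m[:-dt, :-dx] * m[dt:, dx:])

      def triple_corr(m, dx=1, dt=1):
          # one example triple correlator: I(x, t) * I(x+dx, t) * I(x+dx, t+dt)
          return np.mean(m[:-dt, :-dx] * m[:-dt, dx:] * m[dt:, dx:])

      for label, m in [("light edge", movie), ("dark edge ", -movie)]:
          print(label, " pairwise:", round(pair_corr(m), 4), " triple:", round(triple_corr(m), 4))
      # The pairwise correlator is blind to contrast polarity; the triple correlator flips sign.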

  3. Inferring action structure and causal relationships in continuous sequences of human action.

    PubMed

    Buchsbaum, Daphna; Griffiths, Thomas L; Plunkett, Dillon; Gopnik, Alison; Baldwin, Dare

    2015-02-01

    In the real world, causal variables do not come pre-identified or occur in isolation, but instead are embedded within a continuous temporal stream of events. A challenge faced by both human learners and machine learning algorithms is identifying subsequences that correspond to the appropriate variables for causal inference. A specific instance of this problem is action segmentation: dividing a sequence of observed behavior into meaningful actions, and determining which of those actions lead to effects in the world. Here we present a Bayesian analysis of how statistical and causal cues to segmentation should optimally be combined, as well as four experiments investigating human action segmentation and causal inference. We find that both people and our model are sensitive to statistical regularities and causal structure in continuous action, and are able to combine these sources of information in order to correctly infer both causal relationships and segmentation boundaries. Copyright © 2014. Published by Elsevier Inc.

  4. The Shape of a Ponytail and the Statistical Physics of Hair Fiber Bundles

    NASA Astrophysics Data System (ADS)

    Goldstein, Raymond E.; Warren, Patrick B.; Ball, Robin C.

    2012-02-01

    From Leonardo to the Brothers Grimm our fascination with hair has endured in art and science. Yet, a quantitative understanding of the shape of hair bundles has been lacking. Here we combine experiment and theory to propose an answer to the most basic question: What is the shape of a ponytail? A model for the shape of hair bundles is developed from the perspective of statistical physics, treating individual fibers as elastic filaments with random intrinsic curvatures. The combined effects of bending elasticity, gravity, and bundle compressibility are recast as a differential equation for the envelope of a bundle, in which the compressibility enters through an "equation of state." From this, we identify the balance of forces in various regions of the ponytail, extract the equation of state from analysis of ponytail shapes, and relate the observed pressure to the measured random curvatures of individual hairs.

  5. A phylogenetic transform enhances analysis of compositional microbiota data

    PubMed Central

    Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A

    2017-01-01

    Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities. DOI: http://dx.doi.org/10.7554/eLife.21887.001 PMID:28198697
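
    PhILR's key ingredient, beyond the phylogeny-driven choice of basis, is the isometric log-ratio (ILR) transform itself. The sketch below applies a generic ILR with a default (Helmert) orthonormal basis rather than a phylogenetic one, and uses an assumed pseudocount to handle zero counts; it is not the PhILR implementation.

      import numpy as np
      from scipy.linalg import helmert

      def ilr(counts, pseudocount=0.5):
          """Isometric log-ratio transform of a count table (samples x taxa)."""
          comp = counts + pseudocount
          comp = comp / comp.sum(axis=1, keepdims=True)          # close to proportions
          clr = np.log(comp) - np.log(comp).mean(axis=1, keepdims=True)
          V = helmert(comp.shape[1])                             # (D-1, D) orthonormal contrasts
          return clr @ V.T                                       # Euclidean coordinates, D-1 dims

      counts = np.array([[120, 30, 0, 50],
                         [ 10, 90, 5, 20]])
      print(ilr(counts))    # safe input for off-the-shelf multivariate statistics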

  6. A bilayer Double Semion Model with Symmetry-Enriched Topological Order

    NASA Astrophysics Data System (ADS)

    Ortiz, Laura; Martin-Delgado, Miguel Angel

    We construct a new model of two-dimensional quantum spin systems that combines intrinsic topological orders and a global symmetry called flavour symmetry. It is referred to as the bilayer Double Semion model (bDS) and is an instance of symmetry-enriched topological order. A honeycomb bilayer lattice is introduced to combine a Double Semion Topological Order with a global spin-flavour symmetry to get the fractionalization of its quasiparticles. The bDS model exhibits non-trivial braiding self-statistics of excitations and its dual model constitutes a Symmetry-Protected Topological Order with novel edge states. This dual model gives rise to a bilayer Non-Trivial Paramagnet that is invariant under the flavour symmetry and the well-known spin flip symmetry. We acknowledge financial support from the Spanish MINECO Grants FIS2012-33152, FIS2015-67411, and the CAM research consortium QUITEMAD+, Grant No. S2013/ICE-2801. The research of M.A.M.-D. has been supported in part by the U.S. Army Research Office throu.

  7. Stochastic modeling of Lagrangian accelerations

    NASA Astrophysics Data System (ADS)

    Reynolds, Andy

    2002-11-01

    It is shown how Sawford's second-order Lagrangian stochastic model (Phys. Fluids A 3, 1577-1586, 1991) for fluid-particle accelerations can be combined with a model for the evolution of the dissipation rate (Pope and Chen, Phys. Fluids A 2, 1437-1449, 1990) to produce a Lagrangian stochastic model that is consistent with both the measured distribution of Lagrangian accelerations (La Porta et al., Nature 409, 1017-1019, 2001) and Kolmogorov's similarity theory. The latter condition is found not to be satisfied when a constant dissipation rate is employed and consistency with prescribed acceleration statistics is enforced through fulfilment of a well-mixed condition.

  8. A review of statistical updating methods for clinical prediction models.

    PubMed

    Su, Ting-Li; Jaki, Thomas; Hickey, Graeme L; Buchan, Iain; Sperrin, Matthew

    2018-01-01

    A clinical prediction model is a tool for predicting healthcare outcomes, usually within a specific population and context. A common approach is to develop a new clinical prediction model for each population and context; however, this wastes potentially useful historical information. A better approach is to update or incorporate the existing clinical prediction models already developed for use in similar contexts or populations. In addition, clinical prediction models commonly become miscalibrated over time, and need replacing or updating. In this article, we review a range of approaches for re-using and updating clinical prediction models; these fall into three main categories: simple coefficient updating, combining multiple previous clinical prediction models in a meta-model, and dynamic updating of models. We evaluated the performance (discrimination and calibration) of the different strategies using data on mortality following cardiac surgery in the United Kingdom. We found that no single strategy performed sufficiently well to be used to the exclusion of the others. In conclusion, useful tools exist for updating existing clinical prediction models to a new population or context, using a breadth of complementary statistical methods, and these should be implemented rather than developing a new clinical prediction model from scratch.
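
    The first of those categories, simple coefficient updating, can be sketched in a few lines: keep the existing model's covariate weights, compute its linear predictor on the new population, and re-estimate only a calibration intercept and slope. The "existing" coefficients and the data below are invented for illustration and do not come from the review.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(6)

      # A hypothetical existing clinical prediction model (logistic, three covariates).
      old_coef, old_intercept = np.array([0.8, -0.5, 1.2]), -2.0

      # New-population data on which the old model is miscalibrated.
      X_new = rng.normal(size=(500, 3))
      true_lp = 0.6 * X_new[:, 0] - 0.4 * X_new[:, 1] + 1.0 * X_new[:, 2] - 1.0
      y_new = rng.binomial(1, 1 / (1 + np.exp(-true_lp)))

      # Simple updating: refit intercept and calibration slope on the old linear predictor.
      lp_old = X_new @ old_coef + old_intercept
      update = LogisticRegression().fit(lp_old.reshape(-1, 1), y_new)
      slope, intercept = update.coef_[0, 0], update.intercept_[0]
      print(f"calibration slope = {slope:.2f}, intercept = {intercept:.2f}")
      # Recalibrated risk keeps the old covariate weights, rescaled:
      # p = 1 / (1 + exp(-(intercept + slope * lp_old)))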

  9. Predicting clinical diagnosis in Huntington's disease: An imaging polymarker

    PubMed Central

    Daws, Richard E.; Soreq, Eyal; Johnson, Eileanoir B.; Scahill, Rachael I.; Tabrizi, Sarah J.; Barker, Roger A.; Hampshire, Adam

    2018-01-01

    Objective Huntington's disease (HD) gene carriers can be identified before clinical diagnosis; however, statistical models for predicting when overt motor symptoms will manifest are too imprecise to be useful at the level of the individual. Perfecting this prediction is integral to the search for disease modifying therapies. This study aimed to identify an imaging marker capable of reliably predicting real‐life clinical diagnosis in HD. Method A multivariate machine learning approach was applied to resting‐state and structural magnetic resonance imaging scans from 19 premanifest HD gene carriers (preHD, 8 of whom developed clinical disease in the 5 years postscanning) and 21 healthy controls. A classification model was developed using cross‐group comparisons between preHD and controls, and within the preHD group in relation to “estimated” and “actual” proximity to disease onset. Imaging measures were modeled individually, and combined, and permutation modeling robustly tested classification accuracy. Results Classification performance for preHDs versus controls was greatest when all measures were combined. The resulting polymarker predicted converters with high accuracy, including those who were not expected to manifest in that time scale based on the currently adopted statistical models. Interpretation We propose that a holistic multivariate machine learning treatment of brain abnormalities in the premanifest phase can be used to accurately identify those patients within 5 years of developing motor features of HD, with implications for prognostication and preclinical trials. Ann Neurol 2018;83:532–543 PMID:29405351

  10. Comparison modeling for alpine vegetation distribution in an arid area.

    PubMed

    Zhou, Jihua; Lai, Liming; Guan, Tianyu; Cai, Wetao; Gao, Nannan; Zhang, Xiaolong; Yang, Dawen; Cong, Zhentao; Zheng, Yuanrun

    2016-07-01

    Mapping and modeling vegetation distribution are fundamental topics in vegetation ecology. With the rise of powerful new statistical techniques and GIS tools, the development of predictive vegetation distribution models has increased rapidly. However, modeling alpine vegetation with high accuracy in arid areas is still a challenge because of the complexity and heterogeneity of the environment. Here, we used a set of 70 variables from ASTER GDEM, WorldClim, and Landsat-8 OLI (land surface albedo and spectral vegetation indices) data with decision tree (DT), maximum likelihood classification (MLC), and random forest (RF) models to discriminate the eight vegetation groups and 19 vegetation formations in the upper reaches of the Heihe River Basin in the Qilian Mountains, northwest China. The combination of variables clearly discriminated vegetation groups but failed to discriminate vegetation formations. Different variable combinations performed differently in each type of model, but the most consistently important parameter in alpine vegetation modeling was elevation. The best RF model was more accurate for vegetation modeling compared with the DT and MLC models for this alpine region, with an overall accuracy of 75 % and a kappa coefficient of 0.64 verified against field point data and an overall accuracy of 65 % and a kappa of 0.52 verified against vegetation map data. The accuracy of regional vegetation modeling differed depending on the variable combinations and models, resulting in different classifications for specific vegetation groups.

  11. Wave Propagation in Non-Stationary Statistical Mantle Models at the Global Scale

    NASA Astrophysics Data System (ADS)

    Meschede, M.; Romanowicz, B. A.

    2014-12-01

    We study the effect of statistically distributed heterogeneities that are smaller than the resolution of current tomographic models on seismic waves that propagate through the Earth's mantle at teleseismic distances. Current global tomographic models are missing small-scale structure as evidenced by the failure of even accurate numerical synthetics to explain enhanced coda in observed body and surface waveforms. One way to characterize small scale heterogeneity is to construct random models and confront observed coda waveforms with predictions from these models. Statistical studies of the coda typically rely on models with simplified isotropic and stationary correlation functions in Cartesian geometries. We show how to construct more complex random models for the mantle that can account for arbitrary non-stationary and anisotropic correlation functions as well as for complex geometries. Although this method is computationally heavy, model characteristics such as translational, cylindrical or spherical symmetries can be used to greatly reduce the complexity such that this method becomes practical. With this approach, we can create 3D models of the full spherical Earth that can be radially anisotropic, i.e. with different horizontal and radial correlation functions, and radially non-stationary, i.e. with radially varying model power and correlation functions. Both of these features are crucial for a statistical description of the mantle in which structure depends to first order on the spherical geometry of the Earth. We combine different random model realizations of S velocity with current global tomographic models that are robust at long wavelengths (e.g. Meschede and Romanowicz, 2014, GJI submitted), and compute the effects of these hybrid models on the wavefield with a spectral element code (SPECFEM3D_GLOBE). We finally analyze the resulting coda waves for our model selection and compare our computations with observations. Based on these observations, we make predictions about the strength of unresolved small-scale structure and extrinsic attenuation.

  12. Variable Selection in the Presence of Missing Data: Imputation-based Methods.

    PubMed

    Zhao, Yize; Long, Qi

    2017-01-01

    Variable selection plays an essential role in regression analysis as it identifies important variables that are associated with outcomes and is known to improve the predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and the statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, which are valid under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combines the variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third strategy combines resampling techniques such as the bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
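
    The first of these strategies can be sketched with scikit-learn: impute several times, run a Lasso selection on each completed dataset, and combine the per-imputation selections into selection frequencies. The data-generating model, number of imputations and Lasso settings below are all assumptions for illustration.

      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer
      from sklearn.linear_model import LassoCV

      rng = np.random.default_rng(7)

      # Outcome driven by the first two of six covariates; 20% of values set MCAR-missing.
      n, p = 300, 6
      X_full = rng.normal(size=(n, p))
      y = 2.0 * X_full[:, 0] - 1.5 * X_full[:, 1] + rng.normal(scale=1.0, size=n)
      X_miss = X_full.copy()
      X_miss[rng.random((n, p)) < 0.2] = np.nan

      # Strategy 1: select variables in each imputed dataset, then combine by frequency.
      n_imputations, selected = 5, np.zeros(p)
      for m in range(n_imputations):
          X_imp = IterativeImputer(sample_posterior=True, random_state=m).fit_transform(X_miss)
          lasso = LassoCV(cv=5, random_state=m).fit(X_imp, y)
          selected += (np.abs(lasso.coef_) > 1e-8)
      print("selection frequency per covariate:", selected / n_imputations)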

  13. A Bayesian approach for parameter estimation and prediction using a computationally intensive model

    DOE PAGES

    Higdon, Dave; McDonnell, Jordan D.; Schunck, Nicolas; ...

    2015-02-05

    Bayesian methods have been successful in quantifying uncertainty in physics-based problems in parameter estimation and prediction. In these cases, physical measurements y are modeled as the best fit of a physics-based model η(θ), where θ denotes the uncertain, best input setting. Hence the statistical model is of the form y = η(θ) + ε, where ε accounts for measurement, and possibly other, error sources. When nonlinearity is present in η(·), the resulting posterior distribution for the unknown parameters in the Bayesian formulation is typically complex and nonstandard, requiring computationally demanding approaches such as Markov chain Monte Carlo (MCMC) to produce multivariate draws from the posterior. Although generally applicable, MCMC requires thousands (or even millions) of evaluations of the physics model η(·). This requirement is problematic if the model takes hours or days to evaluate. To overcome this computational bottleneck, we present an approach adapted from Bayesian model calibration. This approach combines output from an ensemble of computational model runs with physical measurements, within a statistical formulation, to carry out inference. A key component of this approach is a statistical response surface, or emulator, estimated from the ensemble of model runs. We demonstrate this approach with a case study in estimating parameters for a density functional theory model, using experimental mass/binding energy measurements from a collection of atomic nuclei. Lastly, we also demonstrate how this approach produces uncertainties in predictions for recent mass measurements obtained at Argonne National Laboratory.

  14. Bayesian model averaging using particle filtering and Gaussian mixture modeling: Theory, concepts, and simulation experiments

    NASA Astrophysics Data System (ADS)

    Rings, Joerg; Vrugt, Jasper A.; Schoups, Gerrit; Huisman, Johan A.; Vereecken, Harry

    2012-05-01

    Bayesian model averaging (BMA) is a standard method for combining predictive distributions from different models. In recent years, this method has enjoyed widespread application and use in many fields of study to improve the spread-skill relationship of forecast ensembles. The BMA predictive probability density function (pdf) of any quantity of interest is a weighted average of pdfs centered around the individual (possibly bias-corrected) forecasts, where the weights are equal to the posterior probabilities of the models generating the forecasts and reflect the individual models' skill over a training (calibration) period. The original BMA approach presented by Raftery et al. (2005) assumes that the conditional pdf of each individual model is adequately described with a rather standard Gaussian or Gamma statistical distribution, possibly with a heteroscedastic variance. Here we analyze the advantages of using BMA with a flexible representation of the conditional pdf. A joint particle filtering and Gaussian mixture modeling framework is presented to derive analytically, as closely and consistently as possible, the evolving forecast density (conditional pdf) of each constituent ensemble member. The median forecasts and evolving conditional pdfs of the constituent models are subsequently combined using BMA to derive one overall predictive distribution. This paper introduces the theory and concepts of this new ensemble postprocessing method, and demonstrates its usefulness and applicability by numerical simulation of the rainfall-runoff transformation using discharge data from three different catchments in the contiguous United States. The revised BMA method achieves significantly lower prediction errors than the original default BMA method (due to filtering), with predictive uncertainty intervals that are substantially smaller but still statistically coherent (due to the use of a time-variant conditional pdf).
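
    At its core, the standard Gaussian BMA step that this work generalizes estimates member weights and a predictive spread by expectation-maximization over a training period. The sketch below is a minimal version of that step with synthetic observations and three forecast members of different error levels; it is not the particle-filtering/Gaussian-mixture extension proposed in the paper, and all data are invented for illustration.

      import numpy as np

      rng = np.random.default_rng(8)

      # Training data: observations and three (bias-corrected) ensemble forecasts.
      T = 400
      obs = rng.normal(size=T).cumsum() * 0.1 + 10
      forecasts = np.stack([obs + rng.normal(scale=s, size=T) for s in (0.5, 1.0, 2.0)])  # (K, T)

      # EM for Gaussian BMA: member weights w_k and a common predictive spread sigma.
      K = forecasts.shape[0]
      w, sigma = np.full(K, 1.0 / K), 1.0
      for _ in range(200):
          dens = (w[:, None] / (sigma * np.sqrt(2 * np.pi))
                  * np.exp(-0.5 * ((obs - forecasts) / sigma) ** 2))    # (K, T)
          z = dens / dens.sum(axis=0, keepdims=True)                    # E-step responsibilities
          w = z.mean(axis=1)                                            # M-step: weights
          sigma = np.sqrt(np.sum(z * (obs - forecasts) ** 2) / T)       # M-step: spread
      print("BMA weights:", np.round(w, 3), " sigma:", round(sigma, 3))
      # The BMA predictive pdf for a new forecast vector f is sum_k w_k * N(f_k, sigma^2).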

  15. Combining abiotic and biotic models - Hydraulical modeling to fill the gap between catchment and hydro-dynamic models

    NASA Astrophysics Data System (ADS)

    Guse, B.; Sulc, D.; Schmalz, B.; Fohrer, N.

    2012-04-01

    The European Water Framework Directive (WFD) requires a catchment-based approach, which is assessed in the IMPACT project by combining abiotic and biotic models. The core of IMPACT is a model chain (catchment model -> 1-D hydraulic model -> 3-D hydro-morphodynamic model -> biotic habitat model) whose aim is to estimate the occurrence of the target species of the WFD. Firstly, the model chain is developed for the current land use and climate conditions. Secondly, land use and climate change scenarios are developed at the catchment scale. The outputs of the catchment model for the scenarios are used as input for the next models in the chain to estimate the effect of these changes on the target species. The eco-hydrological catchment model SWAT is applied to the Treene catchment in Northern Germany and delivers discharge and water quality parameters as spatially explicit output for each subbasin. SWAT does not provide water level information. However, water level values are needed as the lower boundary condition for the hydro-dynamic and habitat models, which are applied to the 300 m candidate reference reach. In order to fill the gap between the catchment and the hydro-morphodynamic model, the 1-D hydraulic model HEC-RAS is applied to a 3 km long reach transect from the next upstream hydrological station to the upper bound of the candidate study reach. The channel geometry for HEC-RAS was estimated from 96 cross-sections measured in the IMPACT project. Using available discharge and water level measurements from the hydrological station and our own flow velocity measurements, the channel resistance was estimated. HEC-RAS was run with different statistical indices (mean annual drought, mean discharge, …) for steady flow conditions. The rating curve was then constructed for the target cross-section, i.e. the lower bound of the candidate study reach, to complete the coupling with the hydro- and morphodynamic models. These statistical indices can also be calculated for the discharge series provided by the land use and climate scenarios. In this way, the effect of land use and climate change on the catchment and the hydraulic processes can be assessed.

  16. ToNER: A tool for identifying nucleotide enrichment signals in feature-enriched RNA-seq data.

    PubMed

    Promworn, Yuttachon; Kaewprommal, Pavita; Shaw, Philip J; Intarapanich, Apichart; Tongsima, Sissades; Piriyapongsa, Jittima

    2017-01-01

    Biochemical methods are available for enriching 5' ends of RNAs in prokaryotes, which are employed in the differential RNA-seq (dRNA-seq) and the more recent Cappable-seq protocols. Computational methods are needed to locate RNA 5' ends from these data by statistical analysis of the enrichment. Although statistical-based analysis methods have been developed for dRNA-seq, they may not be suitable for Cappable-seq data. The more efficient enrichment method employed in Cappable-seq compared with dRNA-seq could affect data distribution and thus algorithm performance. We present Transformation of Nucleotide Enrichment Ratios (ToNER), a tool for statistical modeling of enrichment from RNA-seq data obtained from enriched and unenriched libraries. The tool calculates nucleotide enrichment scores and determines the global transformation for fitting to the normal distribution using the Box-Cox procedure. From the transformed distribution, sites of significant enrichment are identified. To increase power of detection, meta-analysis across experimental replicates is offered. We tested the tool on Cappable-seq and dRNA-seq data for identifying Escherichia coli transcript 5' ends and compared the results with those from the TSSAR tool, which is designed for analyzing dRNA-seq data. When combining results across Cappable-seq replicates, ToNER detects more known transcript 5' ends than TSSAR. In general, the transcript 5' ends detected by ToNER but not TSSAR occur in regions which cannot be locally modeled by TSSAR. ToNER uses a simple yet robust statistical modeling approach, which can be used for detecting RNA 5'ends from Cappable-seq data, in particular when combining information from experimental replicates. The ToNER tool could potentially be applied for analyzing other RNA-seq datasets in which enrichment for other structural features of RNA is employed. The program is freely available for download at ToNER webpage (http://www4a.biotec.or.th/GI/tools/toner) and GitHub repository (https://github.com/PavitaKae/ToNER).
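
    The central statistical step described here, fitting a global Box-Cox transformation to per-nucleotide enrichment ratios so that normal-theory significance tests can be applied, can be sketched as follows with synthetic counts. The pseudocounts, spike-in sizes and significance cut-off are assumptions for illustration, not ToNER's exact defaults.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(9)

      # Per-position read counts from an enriched and an unenriched library (synthetic).
      background = rng.poisson(50, size=10000).astype(float)
      enriched = rng.poisson(50, size=10000).astype(float)
      true_starts = rng.choice(10000, size=30, replace=False)
      enriched[true_starts] += rng.poisson(400, size=30)            # spiked 5' ends

      ratio = (enriched + 1) / (background + 1)                     # nucleotide enrichment ratio
      transformed, lam = stats.boxcox(ratio)                        # global Box-Cox toward normality
      z = (transformed - transformed.mean()) / transformed.std()
      hits = np.where(stats.norm.sf(z) < 1e-4)[0]                   # one-sided significance
      print("Box-Cox lambda:", round(lam, 3), " candidate 5' ends found:", len(hits))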

  17. The use of algorithmic behavioural transfer functions in parametric EO system performance models

    NASA Astrophysics Data System (ADS)

    Hickman, Duncan L.; Smith, Moira I.

    2015-10-01

    The use of mathematical models to predict the overall performance of an electro-optic (EO) system is a well-established methodology and is used widely to support requirements definition, system design, and the production of performance predictions. Traditionally these models have been based upon cascades of transfer functions derived from established physical theory, such as the calculation of signal levels from radiometry equations, as well as the use of statistical models. However, the performance of an EO system is increasingly being dominated by the on-board processing of the image data, and this automated interpretation of image content is complex in nature and presents significant modelling challenges. Models and simulations of EO systems tend to involve either processing of image data as part of a performance simulation (image-flow) or a series of mathematical functions that attempt to define the overall system characteristics (parametric). The former approach is generally more accurate but statistically and theoretically weak in terms of specific operational scenarios, and is also time consuming. The latter approach is generally faster but is unable to provide accurate predictions of a system's performance under operational conditions. An alternative and novel architecture is presented in this paper which combines the processing speed attributes of parametric models with the accuracy of image-flow representations in a statistically valid framework. An additional dimension needed to create an effective simulation is a robust software design whose architecture reflects the structure of the EO system and its interfaces. As such, the design of the simulator can be viewed as a software prototype of a new EO system or an abstraction of an existing design. This new approach has been used successfully to model a number of complex military systems and has been shown to combine improved performance estimation with speed of computation. The paper describes the approach and architecture in detail, and example results based on a practical application are then given which illustrate the performance benefits. Finally, conclusions are drawn and comments given regarding the benefits and uses of the new approach.

  18. Multiplatform serum metabolic phenotyping combined with pathway mapping to identify biochemical differences in smokers.

    PubMed

    Kaluarachchi, Manuja R; Boulangé, Claire L; Garcia-Perez, Isabel; Lindon, John C; Minet, Emmanuel F

    2016-10-01

    Determining perturbed biochemical functions associated with tobacco smoking should be helpful for establishing causal relationships between exposure and adverse events. A multiplatform comparison of serum from smokers (n = 55) and never-smokers (n = 57) using nuclear magnetic resonance spectroscopy, UPLC-MS and statistical modeling revealed clustering of the classes, distinguished by metabolic biomarkers. The identified metabolites were subjected to metabolic pathway enrichment, modeling adverse biological events using available databases. Perturbations of metabolites involved in chronic obstructive pulmonary disease, cardiovascular diseases and cancer were identified and discussed. Combining multiplatform metabolic phenotyping with knowledge-based mapping gives mechanistic insights into disease development, which can be applied to next-generation tobacco and nicotine products for comparative risk assessment.

  19. Symbolic Processing Combined with Model-Based Reasoning

    NASA Technical Reports Server (NTRS)

    James, Mark

    2009-01-01

    A computer program for the detection of present and prediction of future discrete states of a complex, real-time engineering system utilizes a combination of symbolic processing and numerical model-based reasoning. One of the biggest weaknesses of a purely symbolic approach is that it enables prediction of only future discrete states while missing all unmodeled states or leading to incorrect identification of an unmodeled state as a modeled one. A purely numerical approach is based on a combination of statistical methods and mathematical models of the applicable physics and necessitates development of a complete model to the level of fidelity required for prediction. In addition, a purely numerical approach does not afford the ability to qualify its results without some form of symbolic processing. The present software implements numerical algorithms to detect unmodeled events and symbolic algorithms to predict expected behavior, correlate the expected behavior with the unmodeled events, and interpret the results in order to predict future discrete states. The approach embodied in this software differs from that of the BEAM methodology (aspects of which have been discussed in several prior NASA Tech Briefs articles), which provides for prediction of future measurements in the continuous-data domain.

  20. Feature maps driven no-reference image quality prediction of authentically distorted images

    NASA Astrophysics Data System (ADS)

    Ghadiyaram, Deepti; Bovik, Alan C.

    2015-03-01

    Current blind image quality prediction models rely on benchmark databases comprised of singly and synthetically distorted images, thereby learning image features that are only adequate to predict human perceived visual quality on such inauthentic distortions. However, real world images often contain complex mixtures of multiple distortions. Rather than a) discounting the effect of these mixtures of distortions on an image's perceptual quality and considering only the dominant distortion or b) using features that are only proven to be efficient for singly distorted images, we deeply study the natural scene statistics of authentically distorted images, in different color spaces and transform domains. We propose a feature-maps-driven statistical approach which avoids any latent assumptions about the type of distortion(s) contained in an image, and focuses instead on modeling the remarkable consistencies in the scene statistics of real world images in the absence of distortions. We design a deep belief network that takes model-based statistical image features derived from a very large database of authentically distorted images as input and discovers good feature representations by generalizing over different distortion types, mixtures, and severities, which are later used to learn a regressor for quality prediction. We demonstrate the remarkable competence of our features for improving automatic perceptual quality prediction on a benchmark database and on the newly designed LIVE Authentic Image Quality Challenge Database and show that our approach of combining robust statistical features and the deep belief network dramatically outperforms the state-of-the-art.

  1. The application of data mining techniques to oral cancer prognosis.

    PubMed

    Tseng, Wan-Ting; Chiang, Wei-Fan; Liu, Shyun-Yeu; Roan, Jinsheng; Lin, Chun-Nan

    2015-05-01

    This study adopted an integrated procedure that combines the clustering and classification features of data mining technology to determine the differences between the symptoms shown in past cases where patients died from or survived oral cancer. Two data mining tools, namely decision tree and artificial neural network, were used to analyze the historical cases of oral cancer, and their performance was compared with that of logistic regression, the popular statistical analysis tool. Both the decision tree and artificial neural network models showed superiority to the traditional statistical model. However, for clinicians, the trees created by the decision tree models are easier to interpret than the artificial neural network models. Cluster analysis also revealed that stage 4 patients who possess the following four characteristics have an extremely low survival rate: pN is N2b, level of RLNM is level I-III, AJCC-T is T4, and the cell mutation status (G) is moderate.

  2. Evaluation of Oceanic Transport Statistics By Use of Transient Tracers and Bayesian Methods

    NASA Astrophysics Data System (ADS)

    Trossman, D. S.; Thompson, L.; Mecking, S.; Bryan, F.; Peacock, S.

    2013-12-01

    Key variables that quantify the time scales over which atmospheric signals penetrate into the oceanic interior and their uncertainties are computed using Bayesian methods and transient tracers from both models and observations. First, the mean residence times, subduction rates, and formation rates of Subtropical Mode Water (STMW) and Subpolar Mode Water (SPMW) in the North Atlantic and Subantarctic Mode Water (SAMW) in the Southern Ocean are estimated by combining a model and observations of chlorofluorocarbon-11 (CFC-11) via Bayesian Model Averaging (BMA), statistical technique that weights model estimates according to how close they agree with observations. Second, a Bayesian method is presented to find two oceanic transport parameters associated with the age distribution of ocean waters, the transit-time distribution (TTD), by combining an eddying global ocean model's estimate of the TTD with hydrographic observations of CFC-11, temperature, and salinity. Uncertainties associated with objectively mapping irregularly spaced bottle data are quantified by making use of a thin-plate spline and then propagated via the two Bayesian techniques. It is found that the subduction of STMW, SPMW, and SAMW is mostly an advective process, but up to about one-third of STMW subduction likely owes to non-advective processes. Also, while the formation of STMW is mostly due to subduction, the formation of SPMW is mostly due to other processes. About half of the formation of SAMW is due to subduction and half is due to other processes. A combination of air-sea flux, acting on relatively short time scales, and turbulent mixing, acting on a wide range of time scales, is likely the dominant SPMW erosion mechanism. Air-sea flux is likely responsible for most STMW erosion, and turbulent mixing is likely responsible for most SAMW erosion. Two oceanic transport parameters, the mean age of a water parcel and the half-variance associated with the TTD, estimated using the model's tracers as data (BayesPOP) and those estimated using tracer observations as data (BayesObs) provide information about the sources of model biases, and give a more nuanced picture than can be found by comparing the simulated CFC-11 concentrations with observed CFC-11 concentrations. Using the differences between the two oceanic transport parameters from BayesObs and those from BayesPOP with and without a constant Peclet number assumption along each of the hydrographic cross-sections considered here, it is found that the model's diffusivity tensor biases lead to larger model errors than the model's mean advection time biases. However, it is also found that mean advection time biases in the model are statistically significant at the 95% level where mode water is found.

  3. Probabilistic dietary exposure assessment taking into account variability in both amount and frequency of consumption.

    PubMed

    Slob, Wout

    2006-07-01

    Probabilistic dietary exposure assessments that are fully based on Monte Carlo sampling from the raw intake data may not be appropriate. This paper shows that the data should first be analysed by using a statistical model that is able to take the various dimensions of food consumption patterns into account. A (parametric) model is discussed that takes into account the interindividual variation in (daily) consumption frequencies, as well as in amounts consumed. Further, the model can be used to include covariates, such as age, sex, or other individual attributes. Some illustrative examples show how this model may be used to estimate the probability of exceeding an (acute or chronic) exposure limit. These results are compared with the results based on directly counting the fraction of observed intakes exceeding the limit value. This comparison shows that the latter method is not adequate, in particular for the acute exposure situation. A two-step approach for probabilistic (acute) exposure assessment is proposed: first analyse the consumption data by a (parametric) statistical model as discussed in this paper, and then use Monte Carlo techniques for combining the variation in concentrations with the variation in consumption (by sampling from the statistical model). This approach results in an estimate of the fraction of the population as a function of the fraction of days at which the exposure limit is exceeded by the individual.
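
    A toy version of the proposed two-step approach might look like the following: a parametric consumption model with between-person variation in both consumption frequency and amount, combined by Monte Carlo with variation in concentrations, from which the fraction of days on which an exposure limit is exceeded can be tabulated per person. Every distribution, parameter and the limit below is invented for illustration.

      import numpy as np

      rng = np.random.default_rng(10)

      # Assumed fitted consumption model: person-specific frequency and median amount.
      n_people, n_days = 5000, 365
      freq = rng.beta(2, 8, size=n_people)                                      # eating probability per day
      median_amount = rng.lognormal(mean=np.log(50), sigma=0.4, size=n_people)  # g/day when eaten

      eats = rng.random((n_people, n_days)) < freq[:, None]
      amount = rng.lognormal(np.log(median_amount)[:, None], 0.3, size=(n_people, n_days)) * eats
      concentration = rng.lognormal(np.log(0.2), 0.5, size=(n_people, n_days))  # mg/kg food
      exposure = amount * concentration / 1000                                  # mg/day

      limit = 0.05                                                              # hypothetical acute limit, mg/day
      days_over = (exposure > limit).mean(axis=1)                               # per-person fraction of days
      for frac in (0.01, 0.05):
          print(f"{(days_over > frac).mean():.1%} of people exceed the limit on more than {frac:.0%} of days")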

  4. Economic correlates of violent death rates in forty countries, 1962-2008: A cross-typological analysis.

    PubMed

    Lee, Bandy X; Marotta, Phillip L; Blay-Tofey, Morkeh; Wang, Winnie; de Bourmont, Shalila

    2014-01-01

    Our goal was to identify if there might be advantages to combining two major public health concerns, i.e., homicides and suicides, in an analysis with well-established macro-level economic determinants, i.e., unemployment and inequality. Mortality data, unemployment statistics, and inequality measures were obtained for 40 countries for the years 1962-2008. Rates of combined homicide and suicide, ratio of suicide to combined violent death, and ratio between homicide and suicide were graphed and analyzed. A fixed effects regression model was then performed for unemployment rates and Gini coefficients on homicide, suicide, and combined death rates. For a majority of nation states, suicide comprised a substantial proportion (mean 75.51%; range 0-99%) of the combined rate of homicide and suicide. When combined, a small but significant relationship emerged between logged Gini coefficient and combined death rates (0.0066, p < 0.05), suggesting that the combined rate improves the ability to detect a significant relationship when compared to either rate measurement alone. Results were duplicated by age group, whereby combining death rates into a single measure improved statistical power, provided that the association was strong. Violent deaths, when combined, were associated with an increase in unemployment and an increase in Gini coefficient, creating a more robust variable. As the effects of macro-level factors (e.g., social and economic policies) on violent death rates in a population are shown to be more significant than those of micro-level influences (e.g., individual characteristics), these associations may be useful to discover. An expansion of socioeconomic variables and the inclusion of other forms of violence in future research could help elucidate long-term trends.

  6. Spatial prediction of soil texture in region Centre (France) from summary data

    NASA Astrophysics Data System (ADS)

    Dobarco, Mercedes Roman; Saby, Nicolas; Paroissien, Jean-Baptiste; Orton, Tom G.

    2015-04-01

    Soil texture is a key controlling factor of important soil functions like water and nutrient holding capacity, retention of pollutants, drainage, soil biodiversity, and C cycling. High resolution soil texture maps enhance our understanding of the spatial distribution of soil properties and provide valuable information for decision making and crop management, environmental protection, and hydrological planning. We predicted the soil texture of agricultural topsoils in the Region Centre (France) combining regression and area-to-point kriging. Soil texture data were collected from the French soil-test database (BDAT), which is populated with soil analyses performed at farmers' request. To protect the anonymity of the farms, the data were aggregated by commune. In a first step, summary statistics of environmental covariates by commune were used to develop prediction models with Cubist, boosted regression trees, and random forests. In a second step, the residuals of each individual observation were summarized by commune and kriged following the method developed by Orton et al. (2012). This approach allowed non-linear relationships between covariates and soil texture to be included while accounting for the uncertainty in areal means in the area-to-point kriging step. Independent validation of the models was performed using data from the systematic soil monitoring network of French soils. Future work will compare the performance of these models with a non-stationary variance geostatistical model using the most important covariates and summary statistics of texture data. The results will indicate whether the latter, statistically more challenging approach significantly improves texture predictions or whether the simpler area-to-point regression kriging can offer satisfactory results. The application of area-to-point regression kriging at national level using BDAT data has the potential to improve soil texture predictions for agricultural topsoils, especially when combined with existing maps (i.e., model ensemble).

  7. The introduction of hydrogen bond and hydrophobicity effects into the rotational isomeric states model for conformational analysis of unfolded peptides.

    PubMed

    Engin, Ozge; Sayar, Mehmet; Erman, Burak

    2009-01-13

    Relative contributions of local and non-local interactions to the unfolded conformations of peptides are examined by using the rotational isomeric states model which is a Markov model based on pairwise interactions of torsion angles. The isomeric states of a residue are well described by the Ramachandran map of backbone torsion angles. The statistical weight matrices for the states are determined by molecular dynamics simulations applied to monopeptides and dipeptides. Conformational properties of tripeptides formed from combinations of alanine, valine, tyrosine and tryptophan are investigated based on the Markov model. Comparison with molecular dynamics simulation results on these tripeptides identifies the sequence-distant long-range interactions that are missing in the Markov model. These are essentially the hydrogen bond and hydrophobic interactions that are obtained between the first and the third residue of a tripeptide. A systematic correction is proposed for incorporating these long-range interactions into the rotational isomeric states model. Preliminary results suggest that the Markov assumption can be improved significantly by renormalizing the statistical weight matrices to include the effects of the long-range correlations.

  8. Fitting and Modeling in the ASC Data Analysis Environment

    NASA Astrophysics Data System (ADS)

    Doe, S.; Siemiginowska, A.; Joye, W.; McDowell, J.

    As part of the AXAF Science Center (ASC) Data Analysis Environment, we will provide to the astronomical community a Fitting Application. We present a design of the application in this paper. Our design goal is to give the user the flexibility to use a variety of optimization techniques (Levenberg-Marquardt, maximum entropy, Monte Carlo, Powell, downhill simplex, CERN-Minuit, and simulated annealing) and fit statistics (chi-squared, Cash, variance, and maximum likelihood); our modular design allows the user easily to add their own optimization techniques and/or fit statistics. We also present a comparison of the optimization techniques to be provided by the Application. The high spatial and spectral resolutions that will be obtained with AXAF instruments require a sophisticated data modeling capability. We will provide not only a suite of astronomical spatial and spectral source models, but also the capability of combining these models into source models of up to four data dimensions (i.e., into source functions f(E,x,y,t)). We will also provide tools to create instrument response models appropriate for each observation.

  9. Trans-dimensional and hierarchical Bayesian approaches toward rigorous estimation of seismic sources and structures in the Northeast Asia

    NASA Astrophysics Data System (ADS)

    Kim, Seongryong; Tkalčić, Hrvoje; Mustać, Marija; Rhie, Junkee; Ford, Sean

    2016-04-01

    A framework is presented within which we provide rigorous estimations for seismic sources and structures in Northeast Asia. We use Bayesian inversion methods, which enable statistical estimations of models and their uncertainties based on data information. Ambiguities in error statistics and model parameterizations are addressed by hierarchical and trans-dimensional (trans-D) techniques, which can be inherently implemented in the Bayesian inversions. Hence reliable estimation of model parameters and their uncertainties is possible, thus avoiding arbitrary regularizations and parameterizations. Hierarchical and trans-D inversions are performed to develop a three-dimensional velocity model using ambient noise data. To further improve the model, we perform joint inversions with receiver function data using a newly developed Bayesian method. For the source estimation, a novel moment tensor inversion method is presented and applied to regional waveform data of the North Korean nuclear explosion tests. By combining the new Bayesian techniques and the structural model, coupled with meaningful uncertainties related to each of the processes, more quantitative monitoring and discrimination of seismic events is possible.

  10. Statistical modelling as an aid to the design of retail sampling plans for mycotoxins in food.

    PubMed

    MacArthur, Roy; MacDonald, Susan; Brereton, Paul; Murray, Alistair

    2006-01-01

    A study has been carried out to assess appropriate statistical models for use in evaluating retail sampling plans for the determination of mycotoxins in food. A compound gamma model was found to be a suitable fit. A simulation model based on the compound gamma model was used to produce operating characteristic curves for a range of parameters relevant to retail sampling. The model was also used to estimate the minimum number of increments necessary to minimize the overall measurement uncertainty. Simulation results showed that measurements based on retail samples (for which the maximum number of increments is constrained by cost) may produce fit-for-purpose results for the measurement of ochratoxin A in dried fruit, but are unlikely to do so for the measurement of aflatoxin B1 in pistachio nuts. In order to produce a more accurate simulation, further work is required to determine the degree of heterogeneity associated with batches of food products. With appropriate parameterization in terms of physical and biological characteristics, the systems developed in this study could be applied to other analyte/matrix combinations.
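
    A minimal Python sketch of the kind of simulation described above: increment concentrations are drawn from an assumed gamma model of within-batch heterogeneity, averaged into one retail-sample measurement, and the acceptance probability for a non-compliant batch is estimated, i.e. one point on an operating characteristic curve. The limit, gamma shape and acceptance rule are illustrative assumptions, not the parameters of the study.

      import numpy as np

      rng = np.random.default_rng(1)

      LIMIT = 10.0             # hypothetical legal limit (arbitrary units)
      TRUE_BATCH_MEAN = 12.0   # a non-compliant batch, assumed for illustration
      N_INCREMENTS = 10        # increments pooled into one retail sample
      SHAPE = 4.0              # assumed gamma shape for within-batch heterogeneity

      def simulate_measurement(batch_mean, n_increments, shape):
          """Average of n gamma-distributed increment concentrations."""
          increments = rng.gamma(shape, batch_mean / shape, size=n_increments)
          return increments.mean()

      # Probability of (wrongly) accepting the non-compliant batch: one OC-curve point.
      n_sim = 20000
      accepted = sum(
          simulate_measurement(TRUE_BATCH_MEAN, N_INCREMENTS, SHAPE) <= LIMIT
          for _ in range(n_sim)
      )
      print("P(accept | true mean = 12.0) approx.", accepted / n_sim)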

  11. The introduction of hydrogen bond and hydrophobicity effects into the rotational isomeric states model for conformational analysis of unfolded peptides

    NASA Astrophysics Data System (ADS)

    Engin, Ozge; Sayar, Mehmet; Erman, Burak

    2009-03-01

    Relative contributions of local and non-local interactions to the unfolded conformations of peptides are examined by using the rotational isomeric states model which is a Markov model based on pairwise interactions of torsion angles. The isomeric states of a residue are well described by the Ramachandran map of backbone torsion angles. The statistical weight matrices for the states are determined by molecular dynamics simulations applied to monopeptides and dipeptides. Conformational properties of tripeptides formed from combinations of alanine, valine, tyrosine and tryptophan are investigated based on the Markov model. Comparison with molecular dynamics simulation results on these tripeptides identifies the sequence-distant long-range interactions that are missing in the Markov model. These are essentially the hydrogen bond and hydrophobic interactions that are obtained between the first and the third residue of a tripeptide. A systematic correction is proposed for incorporating these long-range interactions into the rotational isomeric states model. Preliminary results suggest that the Markov assumption can be improved significantly by renormalizing the statistical weight matrices to include the effects of the long-range correlations.

  12. An Intercomparison and Evaluation of Aircraft-Derived and Simulated CO from Seven Chemical Transport Models During the TRACE-P Experiment

    NASA Technical Reports Server (NTRS)

    Kiley, C. M.; Fuelberg, Henry E.; Palmer, P. I.; Allen, D. J.; Carmichael, G. R.; Jacob, D. J.; Mari, C.; Pierce, R. B.; Pickering, K. E.; Tang, Y.

    2002-01-01

    Four global scale and three regional scale chemical transport models are intercompared and evaluated during NASA's TRACE-P experiment. Model simulated and measured CO are statistically analyzed along aircraft flight tracks. Results for the combination of eleven flights show an overall negative bias in simulated CO. Biases are most pronounced during large CO events. Statistical agreements vary greatly among the individual flights. Those flights with the greatest range of CO values tend to be the worst simulated. However, for each given flight, the models generally provide similar relative results. The models exhibit difficulties simulating intense CO plumes. CO error is found to be greatest in the lower troposphere. Convective mass flux is shown to be very important, particularly near emissions source regions. Occasionally meteorological lift associated with excessive model-calculated mass fluxes leads to an overestimation of mid- and upper- tropospheric mixing ratios. Planetary Boundary Layer (PBL) depth is found to play an important role in simulating intense CO plumes. PBL depth is shown to cap plumes, confining heavy pollution to the very lowest levels.

  13. A nonparametric statistical technique for combining global precipitation datasets: development and hydrological evaluation over the Iberian Peninsula

    NASA Astrophysics Data System (ADS)

    Abul Ehsan Bhuiyan, Md; Nikolopoulos, Efthymios I.; Anagnostou, Emmanouil N.; Quintana-Seguí, Pere; Barella-Ortiz, Anaïs

    2018-02-01

    This study investigates the use of a nonparametric, tree-based model, quantile regression forests (QRF), for combining multiple global precipitation datasets and characterizing the uncertainty of the combined product. We used the Iberian Peninsula as the study area, with a study period spanning 11 years (2000-2010). Inputs to the QRF model included three satellite precipitation products, CMORPH, PERSIANN, and 3B42 (V7); an atmospheric reanalysis precipitation and air temperature dataset; satellite-derived near-surface daily soil moisture data; and a terrain elevation dataset. We calibrated the QRF model for two seasons and two terrain elevation categories and used it to generate ensembles for these conditions. Evaluation of the combined product was based on a high-resolution, ground-reference precipitation dataset (SAFRAN) available at 5 km, 1 h resolution. Furthermore, to evaluate relative improvements and the overall impact of the combined product in hydrological response, we used the generated ensemble to force a distributed hydrological model (the SURFEX land surface model and the RAPID river routing scheme) and compared its streamflow simulation results with the corresponding simulations from the individual global precipitation and reference datasets. We concluded that the proposed technique could generate realizations that successfully encapsulate the reference precipitation and provide significant improvement in streamflow simulations, with reductions in systematic and random error on the order of 20-99% and 44-88%, respectively, when considering the ensemble mean.
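
    To make the combination step concrete, the following Python sketch implements a quantile-regression-forest-style estimator on top of scikit-learn: training targets that fall in the same leaf as a query point are pooled across all trees, and empirical quantiles of the pool give the predictive distribution. The covariates and data below are synthetic stand-ins for the satellite, reanalysis and elevation inputs, and the simple pooling is an approximation of the leaf weighting used in the original QRF algorithm.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      n = 2000
      X = rng.gamma(2.0, 2.0, size=(n, 4))                # synthetic covariates
      y = X @ np.array([0.5, 0.3, 0.1, 0.1]) + rng.normal(0.0, 1.0, n)

      forest = RandomForestRegressor(n_estimators=200, min_samples_leaf=20,
                                     random_state=0).fit(X, y)

      def forest_quantiles(forest, X_train, y_train, X_query, qs):
          # Pool the training targets that land in the same leaf as each query
          # point, tree by tree, and take empirical quantiles of the pool.
          leaves_train = forest.apply(X_train)            # (n_train, n_trees)
          leaves_query = forest.apply(X_query)            # (n_query, n_trees)
          out = np.empty((len(X_query), len(qs)))
          for i, query_leaves in enumerate(leaves_query):
              pooled = np.concatenate([y_train[leaves_train[:, t] == leaf]
                                       for t, leaf in enumerate(query_leaves)])
              out[i] = np.quantile(pooled, qs)
          return out

      print(forest_quantiles(forest, X, y, X[:3], [0.1, 0.5, 0.9]))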

  14. A Novel Methodology for Charging Station Deployment

    NASA Astrophysics Data System (ADS)

    Sun, Zhonghao; Zhao, Yunwei; He, Yueying; Li, Mingzhe

    2018-02-01

    Lack of charging stations has been a main obstacle to the promotion of electric vehicles. This paper studies deploying charging stations in traffic networks considering grid constraints to balance the charging demand and grid stability. First, we propose a statistical model for charging demand. Then we combine the charging demand model with power grid constraints and give the formulation of the charging station deployment problem. Finally, we propose a theoretical solution for the problem by transforming it to a Markov Decision Process.

  15. A Framework for the Optimization of Discrete-Event Simulation Models

    NASA Technical Reports Server (NTRS)

    Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.

    1996-01-01

    With the growing use of computer modeling and simulation in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed while optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance constraint approach for problem formulation with standard statistical estimation and analysis techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle through a simulation model.
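
    A toy Python illustration of the chance-constraint idea, with a crude terminating simulation standing in for a full discrete-event model: the smallest resource level is sought whose simulated probability of meeting a turnaround target satisfies the constraint. The queueing model, target and probability level are assumptions for illustration only.

      import numpy as np

      rng = np.random.default_rng(42)

      def simulate_makespan(n_servers, n_jobs=50):
          """Terminating toy simulation: jobs with exponential service times
          assigned greedily to the next free of n_servers identical servers."""
          service = rng.exponential(1.0, n_jobs)
          finish = np.zeros(n_servers)
          for s in service:
              finish[np.argmin(finish)] += s
          return finish.max()

      TARGET = 20.0   # required makespan (hypothetical)
      ALPHA = 0.90    # chance constraint: P(makespan <= TARGET) >= ALPHA
      REPS = 2000

      for n_servers in range(1, 10):
          makespans = np.array([simulate_makespan(n_servers) for _ in range(REPS)])
          p_meet = np.mean(makespans <= TARGET)
          print(n_servers, "servers: P(meet target) approx.", round(p_meet, 3))
          if p_meet >= ALPHA:
              print("smallest feasible resource level:", n_servers)
              break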

  16. Model Update of a Micro Air Vehicle (MAV) Flexible Wing Frame with Uncertainty Quantification

    NASA Technical Reports Server (NTRS)

    Reaves, Mercedes C.; Horta, Lucas G.; Waszak, Martin R.; Morgan, Benjamin G.

    2004-01-01

    This paper describes a procedure to update parameters in the finite element model of a Micro Air Vehicle (MAV) to improve displacement predictions under aerodynamic loads. Because of fabrication, materials, and geometric uncertainties, a statistical approach combined with Multidisciplinary Design Optimization (MDO) is used to modify key model parameters. Static test data collected using photogrammetry are used to correlate with model predictions. Results show significant improvements in model predictions after parameters are updated; however, computed probability values indicate low confidence in the updated values and/or model structure errors. Lessons learned in the areas of wing design, test procedures, modeling approaches with geometric nonlinearities, and uncertainty quantification are all documented.

  17. Optimization of Analytical Potentials for Coarse-Grained Biopolymer Models.

    PubMed

    Mereghetti, Paolo; Maccari, Giuseppe; Spampinato, Giulia Lia Beatrice; Tozzini, Valentina

    2016-08-25

    The increasing trend in the recent literature on coarse grained (CG) models testifies to their impact in the study of complex systems. However, the CG model landscape is variegated: even considering a given resolution level, the force fields are very heterogeneous and optimized with very different parametrization procedures. Along the road toward standardization of CG models for biopolymers, here we describe a strategy to aid the building and optimization of statistics-based analytical force fields and its implementation in the software package AsParaGS (Assisted Parameterization platform for coarse Grained modelS). Our method is based on analytical potentials, optimized by targeting the statistical distributions of internal variables by means of a combination of different algorithms (i.e., relative entropy driven stochastic exploration of the parameter space and iterative Boltzmann inversion). This allows designing a custom model that endows the force field terms with a physically sound meaning. Furthermore, the level of transferability and accuracy can be tuned through the choice of the statistical data set composition. The method, illustrated by means of applications to helical polypeptides, also involves the analysis of two- and three-variable distributions, and allows handling issues related to correlations between force field terms. AsParaGS is interfaced with general-purpose molecular dynamics codes and currently implements the "minimalist" subclass of CG models (i.e., one bead per amino acid, Cα based). Extensions to nucleic acids and different levels of coarse graining are in progress.
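
    One of the parameterization ingredients named above, iterative Boltzmann inversion, reduces to a short update rule: the coarse-grained potential is corrected by kT times the log-ratio of the current and target distributions of an internal variable. The Python sketch below performs a single such update on a synthetic bond-length distribution; it is a generic illustration under assumed distributions, not AsParaGS code.

      import numpy as np

      kT = 1.0                                  # thermal energy in reduced units
      r = np.linspace(0.8, 1.6, 81)             # grid for a CG bond length

      def gaussian(x, mu, sigma):
          return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

      p_target = gaussian(r, 1.15, 0.06)        # target (atomistic) distribution, assumed
      p_current = gaussian(r, 1.22, 0.09)       # current CG distribution, assumed

      V_current = -kT * np.log(p_current + 1e-12)

      # Iterative Boltzmann inversion step: V_new = V_old + kT * ln(p_current / p_target)
      V_new = V_current + kT * np.log((p_current + 1e-12) / (p_target + 1e-12))

      print("largest potential correction on the grid:",
            round(float(np.max(np.abs(V_new - V_current))), 3))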

  18. Hormone replacement therapy is associated with gastro-oesophageal reflux disease: a retrospective cohort study

    PubMed Central

    2012-01-01

    Background Oestrogen and progestogen have the potential to influence gastro-intestinal motility; both are key components of hormone replacement therapy (HRT). Results of observational studies in women taking HRT rely on self-reporting of gastro-oesophageal symptoms and the aetiology of gastro-oesophageal reflux disease (GORD) remains unclear. This study investigated the association between HRT and GORD in menopausal women using validated general practice records. Methods 51,182 menopausal women were identified using the UK General Practice Research Database between 1995–2004. Of these, 8,831 were matched with and without hormone use. Odds ratios (ORs) were calculated for GORD and proton-pump inhibitor (PPI) use in hormone and non-hormone users, adjusting for age, co-morbidities, and co-pharmacy. Results In unadjusted analysis, all forms of hormone use (oestrogen-only, tibolone, combined HRT and progestogen) were statistically significantly associated with GORD. In adjusted models, this association remained statistically significant for oestrogen-only treatment (OR 1.49; 1.18–1.89). Unadjusted analysis showed a statistically significant association between PPI use and oestrogen-only and combined HRT treatment. When adjusted for covariates, oestrogen-only treatment was significant (OR 1.34; 95% CI 1.03–1.74). Findings from the adjusted model demonstrated the greater use of PPI by progestogen users (OR 1.50; 1.01–2.22). Conclusions This first large cohort study of the association between GORD and HRT found a statistically significant association between oestrogen-only hormone and GORD and PPI use. This should be further investigated using prospective follow-up to validate the strength of association and describe its clinical significance. PMID:22642788

  19. Assessment of Coastal and Urban Flooding Hazards Applying Extreme Value Analysis and Multivariate Statistical Techniques: A Case Study in Elwood, Australia

    NASA Astrophysics Data System (ADS)

    Guimarães Nobre, Gabriela; Arnbjerg-Nielsen, Karsten; Rosbjerg, Dan; Madsen, Henrik

    2016-04-01

    Traditionally, flood risk assessment studies have been carried out from a univariate frequency analysis perspective. However, statistical dependence between hydrological variables, such as extreme rainfall and extreme sea surge, is plausible, since both variables are to some extent driven by common meteorological conditions. To overcome this limitation, multivariate statistical techniques can combine different sources of flooding in a single investigation. The aim of this study was to apply a range of statistical methodologies for analyzing combined extreme hydrological variables that can lead to coastal and urban flooding. The study area is the Elwood Catchment, a highly urbanized catchment located in the city of Port Phillip, Melbourne, Australia. The first part of the investigation dealt with the marginal extreme value distributions. Two approaches to extracting extreme value series were applied (Annual Maximum and Partial Duration Series), and different probability distribution functions were fitted to the observed sample. Results obtained using the Generalized Pareto distribution demonstrate the ability of the Pareto family to model the extreme events. Advancing into multivariate extreme value analysis, an investigation of the asymptotic properties of extremal dependence was first carried out. As a weak positive asymptotic dependence between the bivariate extreme pairs was found, the Conditional method proposed by Heffernan and Tawn (2004) was chosen. This approach is suitable for modelling bivariate extreme values that are relatively unlikely to occur together. The results show that the probability of an extreme sea surge occurring during a one-hour extreme precipitation event (or vice versa) can be twice as great as would be estimated under an assumption of independence. Presuming independence between these two variables would therefore result in severe underestimation of the flooding risk in the study area.
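
    For the marginal (peaks-over-threshold) step described above, a Generalized Pareto fit takes only a few lines in Python with SciPy; the synthetic series, the threshold choice and the return period below are assumptions for illustration.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)

      # Synthetic hourly rainfall-like series: mostly small values, occasional extremes.
      series = rng.gamma(0.3, 4.0, size=20000)

      threshold = np.quantile(series, 0.98)          # POT threshold (assumed)
      excesses = series[series > threshold] - threshold

      # Fit a Generalized Pareto distribution to the threshold excesses.
      shape, _, scale = stats.genpareto.fit(excesses, floc=0.0)

      # Approximate m-observation return level from the fitted tail.
      m = 10000
      p_exceed = len(excesses) / len(series)
      return_level = threshold + stats.genpareto.ppf(1 - 1.0 / (m * p_exceed),
                                                     shape, loc=0.0, scale=scale)
      print("GPD shape:", round(float(shape), 3), " return level:", round(float(return_level), 2))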

  20. Linear and Order Statistics Combiners for Pattern Classification

    NASA Technical Reports Server (NTRS)

    Tumer, Kagan; Ghosh, Joydeep; Lau, Sonie (Technical Monitor)

    2001-01-01

    Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the 'added' error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and, in general, the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
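
    The factor-of-N claim above is easy to reproduce numerically: averaging N unbiased, uncorrelated estimates of a decision boundary reduces the variance of the combined estimate by roughly 1/N. The Python sketch below checks this for noisy estimates of a scalar boundary location; it illustrates the statistical point only, not the chapter's full derivation.

      import numpy as np

      rng = np.random.default_rng(3)

      true_boundary = 0.0      # Bayes-optimal boundary location (scalar toy case)
      sigma = 0.5              # spread of each classifier's boundary estimate
      n_classifiers = 10
      n_trials = 100000

      # Each row is one trial; each column is one unbiased, uncorrelated classifier.
      estimates = true_boundary + rng.normal(0.0, sigma, size=(n_trials, n_classifiers))
      combined = estimates.mean(axis=1)

      print("single-classifier variance:", round(float(estimates[:, 0].var()), 4))
      print("averaged-ensemble variance:", round(float(combined.var()), 4))
      print("expected ratio 1/N        :", 1.0 / n_classifiers)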

  1. Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

    NASA Astrophysics Data System (ADS)

    Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

    2014-09-01

    Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and the green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R2 = 0.632; cross-validated root-mean-squared error of estimated AGB of 13.3 Mg ha-1, or 37.7%), outperforming other combinations of the independent variables cited above. Results indicated that using ASTER fraction images in regression models improves AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.
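
    A minimal Python sketch of the final modeling step, a multiple linear regression of AGB on a red band, a SWIR band and a green-vegetation fraction, scored by cross-validated root-mean-squared error; the data are synthetic and the coefficients are assumed, so only the workflow, not the numbers, mirrors the study.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(11)
      n = 1033                                   # number of field plots in the study

      # Synthetic stand-ins for ASTER band 2, band 8 and the LSMA green-vegetation fraction.
      band2 = rng.uniform(0.05, 0.25, n)
      band8 = rng.uniform(0.05, 0.30, n)
      gv_fraction = rng.uniform(0.0, 1.0, n)
      X = np.column_stack([band2, band8, gv_fraction])

      # Assumed linear relationship used only to generate synthetic AGB (Mg/ha).
      agb = 30 + 60 * gv_fraction - 80 * band2 - 40 * band8 + rng.normal(0, 10, n)

      pred = cross_val_predict(LinearRegression(), X, agb, cv=10)
      rmse = np.sqrt(np.mean((pred - agb) ** 2))
      print("cross-validated RMSE:", round(float(rmse), 1), "Mg/ha (synthetic data)")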

  2. Predictive modeling of hazardous waste landfill total above-ground biomass using passive optical and LIDAR remotely sensed data

    NASA Astrophysics Data System (ADS)

    Hadley, Brian Christopher

    This dissertation assessed remotely sensed data and geospatial modeling technique(s) to map the spatial distribution of total above-ground biomass present on the surface of the Savannah River National Laboratory's (SRNL) Mixed Waste Management Facility (MWMF) hazardous waste landfill. Ordinary least squares (OLS) regression, regression kriging, and tree-structured regression were employed to model the empirical relationship between in-situ measured Bahia (Paspalum notatum Flugge) and Centipede [Eremochloa ophiuroides (Munro) Hack.] grass biomass and an assortment of explanatory variables extracted from fine spatial resolution passive optical and LIDAR remotely sensed data. Explanatory variables included: (1) discrete channels of visible, near-infrared (NIR), and short-wave infrared (SWIR) reflectance, (2) spectral vegetation indices (SVI), (3) spectral mixture analysis (SMA) modeled fractions, (4) narrow-band derivative-based vegetation indices, and (5) LIDAR derived topographic variables (i.e. elevation, slope, and aspect). Results showed that a linear combination of the first- (1DZ_DGVI), second- (2DZ_DGVI), and third-derivative of green vegetation indices (3DZ_DGVI) calculated from hyperspectral data recorded over the 400-960 nm wavelengths of the electromagnetic spectrum explained the largest percentage of statistical variation (R2 = 0.5184) in the total above-ground biomass measurements. In general, the topographic variables did not correlate well with the MWMF biomass data, accounting for less than five percent of the statistical variation. It was concluded that tree-structured regression represented the optimum geospatial modeling technique due to a combination of model performance and efficiency/flexibility factors.

  3. Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

    NASA Astrophysics Data System (ADS)

    Babcock, Chad; Finley, Andrew O.; Andersen, Hans-Erik; Pattison, Robert; Cook, Bruce D.; Morton, Douglas C.; Alonzo, Michael; Nelson, Ross; Gregoire, Timothy; Ene, Liviu; Gobakken, Terje; Næsset, Erik

    2018-06-01

    The goal of this research was to develop and examine the performance of a geostatistical coregionalization modeling approach for combining field inventory measurements, strip samples of airborne lidar and Landsat-based remote sensing data products to predict aboveground biomass (AGB) in interior Alaska's Tanana Valley. The proposed modeling strategy facilitates pixel-level mapping of AGB density predictions across the entire spatial domain. Additionally, the coregionalization framework allows for statistically sound estimation of total AGB for arbitrary areal units within the study area, a key advance to support diverse management objectives in interior Alaska. This research focuses on appropriate characterization of prediction uncertainty in the form of posterior predictive coverage intervals and standard deviations. Using the framework detailed here, it is possible to quantify estimation uncertainty for any spatial extent, ranging from pixel-level predictions of AGB density to estimates of AGB stocks for the full domain. The lidar-informed coregionalization models consistently outperformed their counterpart lidar-free models in terms of point-level predictive performance and total AGB precision. Additionally, the inclusion of Landsat-derived forest cover as a covariate further improved estimation precision in regions with lower lidar sampling intensity. Our findings also demonstrate that model-based approaches that do not explicitly account for residual spatial dependence can grossly underestimate uncertainty, resulting in falsely precise estimates of AGB. On the other hand, in a geostatistical setting, residual spatial structure can be modeled within a Bayesian hierarchical framework to obtain statistically defensible assessments of uncertainty for AGB estimates.

  4. Power Enhancement in High Dimensional Cross-Sectional Tests

    PubMed Central

    Fan, Jianqing; Liao, Yuan; Yao, Jiawei

    2016-01-01

    We propose a novel technique to boost the power of testing a high-dimensional vector H0 : θ = 0 against sparse alternatives where the null hypothesis is violated only by a couple of components. Existing tests based on quadratic forms such as the Wald statistic often suffer from low powers due to the accumulation of errors in estimating high-dimensional parameters. More powerful tests for sparse alternatives such as thresholding and extreme-value tests, on the other hand, require either stringent conditions or bootstrap to derive the null distribution and often suffer from size distortions due to the slow convergence. Based on a screening technique, we introduce a “power enhancement component”, which is zero under the null hypothesis with high probability, but diverges quickly under sparse alternatives. The proposed test statistic combines the power enhancement component with an asymptotically pivotal statistic, and strengthens the power under sparse alternatives. The null distribution does not require stringent regularity conditions, and is completely determined by that of the pivotal statistic. As specific applications, the proposed methods are applied to testing the factor pricing models and validating the cross-sectional independence in panel data models. PMID:26778846

  5. Analysis of the interannual variability of tropical cyclones striking the California coast based on statistical downscaling

    NASA Astrophysics Data System (ADS)

    Mendez, F. J.; Rueda, A.; Barnard, P.; Mori, N.; Nakajo, S.; Espejo, A.; del Jesus, M.; Diez Sierra, J.; Cofino, A. S.; Camus, P.

    2016-02-01

    Hurricanes hitting California have a very low occurrence probability due to typically cool ocean temperatures and westward tracks. However, damages associated with these improbable events would be dramatic in Southern California, and understanding the oceanographic and atmospheric drivers is of paramount importance for coastal risk management under present and future climates. A statistical analysis of the historical events is very difficult due to the limited resolution of the atmospheric and oceanographic forcing data available. In this work, we propose a combination of: (a) statistical downscaling methods (Espejo et al, 2015); and (b) a synthetic stochastic tropical cyclone (TC) model (Nakajo et al, 2014). To build the statistical downscaling model, Y=f(X), we apply a combination of principal component analysis and the k-means classification algorithm to find representative patterns from a potential TC index derived from large-scale SST fields in the Eastern Central Pacific (predictor X) and the associated tropical cyclone occurrence (predictand Y). SST data come from NOAA Extended Reconstructed SST V3b, providing information from 1854 to 2013 on a 2.0 degree x 2.0 degree global grid. As data for the historical occurrence and paths of tropical cyclones are scarce, we apply a stochastic TC model which is based on a Monte Carlo simulation of the joint distribution of track, minimum sea level pressure and translation speed of the historical events in the Eastern Central Pacific Ocean. Results will show the ability of the approach to explain seasonal-to-interannual variability of the predictor X, which is clearly related to El Niño Southern Oscillation. References Espejo, A., Méndez, F.J., Diez, J., Medina, R., Al-Yahyai, S. (2015) Seasonal probabilistic forecasting of tropical cyclone activity in the North Indian Ocean, Journal of Flood Risk Management, DOI: 10.1111/jfr3.12197 Nakajo, S., N. Mori, T. Yasuda, and H. Mase (2014) Global Stochastic Tropical Cyclone Model Based on Principal Component Analysis and Cluster Analysis, Journal of Applied Meteorology and Climatology, DOI: 10.1175/JAMC-D-13-08.1

  6. A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study.

    PubMed

    AbdelRahman, Samir E; Zhang, Mingyuan; Bray, Bruce E; Kawamoto, Kensaku

    2014-05-27

    The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as being strong, regular, or weak. The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier which averaged the results of multi-nominal logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.

  7. Optimization of an electromagnetic linear actuator using a network and a finite element model

    NASA Astrophysics Data System (ADS)

    Neubert, Holger; Kamusella, Alfred; Lienig, Jens

    2011-03-01

    Model-based design optimization leads to robust solutions only if the statistical deviations of design, load and ambient parameters from nominal values are considered. We describe an optimization methodology that treats these deviations as stochastic variables, for an exemplary electromagnetic actuator used to drive a Braille printer. A combined model simulates the dynamic behavior of the actuator and its non-linear load. It consists of a dynamic network model and a stationary magnetic finite element (FE) model. The network model utilizes lookup tables of the magnetic force and the flux linkage computed by the FE model. After a sensitivity analysis using design of experiment (DoE) methods and a nominal optimization based on gradient methods, a robust design optimization is performed. Selected design variables are included in the form of their density functions. In order to reduce the computational effort, we use response surfaces instead of the combined system model in all stochastic analysis steps. Thus, Monte Carlo simulations can be applied. As a result we found an optimum system design meeting our requirements with regard to function and reliability.
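
    The response-surface shortcut mentioned above can be sketched in a few lines of Python: a quadratic surrogate is fitted to a small design-of-experiments sample of the (expensive) combined model, and the Monte Carlo analysis of the stochastic design variables is then run against the surrogate. The stand-in model, the tolerances and the requirement threshold are all assumptions for illustration.

      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures

      rng = np.random.default_rng(5)

      def expensive_model(x):
          # Stand-in for the coupled network/FE simulation (two design variables).
          return 3.0 + 2.0 * x[:, 0] - 1.5 * x[:, 1] + 0.8 * x[:, 0] * x[:, 1]

      # Small design-of-experiments sample of the expensive model ...
      X_doe = rng.uniform(-1.0, 1.0, size=(50, 2))
      y_doe = expensive_model(X_doe)

      # ... replaced by a quadratic response surface for the stochastic analysis.
      surrogate = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
      surrogate.fit(X_doe, y_doe)

      # Monte Carlo over assumed tolerances of the design variables.
      X_mc = rng.normal(loc=[0.2, -0.1], scale=[0.05, 0.10], size=(100000, 2))
      y_mc = surrogate.predict(X_mc)
      print("mean:", round(float(y_mc.mean()), 3), " std:", round(float(y_mc.std()), 3),
            " P(y > 3.5):", round(float(np.mean(y_mc > 3.5)), 3))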

  8. Identifying and characterizing hepatitis C virus hotspots in Massachusetts: a spatial epidemiological approach.

    PubMed

    Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H

    2017-04-20

    Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression analyses, and assessment of associations between HCV clustering and the built environment are needed to expand upon our combined spatial epidemiological and statistical methods.

  9. Kalman filter for statistical monitoring of forest cover across sub-continental regions

    Treesearch

    Raymond L. Czaplewski

    1991-01-01

    The Kalman filter is a multivariate generalization of the composite estimator which recursively combines a current direct estimate with a past estimate that is updated for expected change over time with a prediction model. The Kalman filter can estimate proportions of different cover types for sub-continental regions each year. A random sample of high-resolution...
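
    In its scalar form, the composite estimator described above combines a model-updated past estimate with a current direct estimate, weighting each by its precision. A minimal Python sketch for a single cover-type proportion, with arbitrary example variances, is given below; the numbers are assumptions for illustration only.

      # One Kalman-filter (composite estimator) update for a single proportion.
      prior_est, prior_var = 0.42, 0.004     # past estimate, updated for expected change
      direct_est, direct_var = 0.47, 0.006   # current direct (sample-based) estimate

      # The gain weights the two sources inversely by their variances.
      gain = prior_var / (prior_var + direct_var)
      combined_est = prior_est + gain * (direct_est - prior_est)
      combined_var = (1.0 - gain) * prior_var

      # The combined variance is smaller than either input variance, which is the
      # point of the composite estimator.
      print("combined estimate:", round(combined_est, 3), " variance:", round(combined_var, 5))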

  10. Modelling lake-water photochemistry: three-decade assessment of the steady-state concentration of photoreactive transients (·OH, CO3(-·) and (3)CDOM(∗)) in the surface water of polymictic Lake Peipsi (Estonia/Russia).

    PubMed

    Minella, Marco; De Laurentiis, Elisa; Buhvestova, Olga; Haldna, Marina; Kangur, Külli; Maurino, Valter; Minero, Claudio; Vione, Davide

    2013-03-01

    Over the last 3-4 decades, Lake Peipsi water (sampling site A, middle part of the lake, and site B, northern part) has experienced a statistically significant increase of bicarbonate, pH, chemical oxygen demand, and nitrate (and nitrite in site B), due to a combination of climate change and eutrophication. By photochemical modelling, we predicted a statistically significant decrease of the radicals ·OH and CO3(-·) (site A, by 45% and 35%, respectively) and an increase of triplet states of chromophoric dissolved organic matter ((3)CDOM(∗); site B, by ∼25%). These species are involved in pollutant degradation, but formation of harmful by-products is more likely with (3)CDOM(∗) than with ·OH. Therefore, the photochemical self-cleansing ability of Lake Peipsi probably decreased with time, due to the combined effects of climate change and eutrophication. In different environments (e.g. Lake Maggiore, NW Italy), ecosystem restoration policies had the additional advantage of enhancing sunlight-driven detoxification, suggesting that photochemical self-cleansing would be positively correlated with lake water quality. Copyright © 2012 Elsevier Ltd. All rights reserved.

  11. Burglar Target Selection

    PubMed Central

    Townsley, Michael; Bernasco, Wim; Ruiter, Stijn; Johnson, Shane D.; White, Gentry; Baum, Scott

    2015-01-01

    Objectives: This study builds on research undertaken by Bernasco and Nieuwbeerta and explores the generalizability of a theoretically derived offender target selection model in three cross-national study regions. Methods: Taking a discrete spatial choice approach, we estimate the impact of both environment- and offender-level factors on residential burglary placement in the Netherlands, the United Kingdom, and Australia. Combining cleared burglary data from all study regions in a single statistical model, we make statistical comparisons between environments. Results: In all three study regions, the likelihood an offender selects an area for burglary is positively influenced by proximity to their home, the proportion of easily accessible targets, and the total number of targets available. Furthermore, in two of the three study regions, juvenile offenders under the legal driving age are significantly more influenced by target proximity than adult offenders. Post hoc tests indicate the magnitudes of these impacts vary significantly between study regions. Conclusions: While burglary target selection strategies are consistent with opportunity-based explanations of offending, the impact of environmental context is significant. As such, the approach undertaken in combining observations from multiple study regions may aid criminology scholars in assessing the generalizability of observed findings across multiple environments. PMID:25866418

  12. SD-MSAEs: Promoter recognition in human genome based on deep feature extraction.

    PubMed

    Xu, Wenxuan; Zhang, Li; Lu, Yaping

    2016-06-01

    The prediction and recognition of promoters in the human genome play an important role in DNA sequence analysis. Entropy, in the Shannon sense of information theory, has multiple uses in detailed bioinformatic analysis. Relative entropy estimator methods based on statistical divergence (SD) are used to extract meaningful features that distinguish different regions of DNA sequences. In this paper, we choose context features and use a set of SD methods to select the most effective n-mers for distinguishing promoter regions from other DNA regions in the human genome. From the total possible combinations of n-mers, we obtain four sparse distributions based on promoter and non-promoter training samples. The informative n-mers are selected by optimizing the differentiating extent of these distributions. Specifically, we combine the advantages of statistical divergence and multiple sparse auto-encoders (MSAEs) in deep learning to extract deep features for promoter recognition. We then apply multiple SVMs and a decision model to construct a human promoter recognition method called SD-MSAEs. The framework is flexible in that it can freely integrate new feature extraction or classification models. Experimental results show that our method has high sensitivity and specificity. Copyright © 2016 Elsevier Inc. All rights reserved.
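
    The divergence-based feature selection step can be illustrated in a few lines of Python: n-mer frequency distributions are estimated for promoter and non-promoter training sequences, and n-mers are ranked by their contribution to a symmetrised Kullback-Leibler divergence. The toy sequences and the exact scoring detail are assumptions for illustration, not the SD-MSAEs pipeline itself.

      import numpy as np
      from itertools import product

      def kmer_freqs(seqs, k=3):
          kmers = ["".join(p) for p in product("ACGT", repeat=k)]
          index = {m: i for i, m in enumerate(kmers)}
          counts = np.ones(len(kmers))                  # add-one smoothing
          for s in seqs:
              for i in range(len(s) - k + 1):
                  counts[index[s[i:i + k]]] += 1
          return counts / counts.sum(), kmers

      # Toy promoter / non-promoter training sequences (hypothetical).
      promoters = ["TATAAATATAAGGCGC", "GCGCTATAATATAAAT"]
      background = ["ACGTACGTACGTACGT", "TTGACCAGTCAGGTCA"]

      p, kmers = kmer_freqs(promoters)
      q, _ = kmer_freqs(background)

      # Per-k-mer contribution to the symmetrised KL divergence; highest first.
      scores = p * np.log(p / q) + q * np.log(q / p)
      for i in np.argsort(scores)[::-1][:5]:
          print(kmers[i], round(float(scores[i]), 4))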

  13. Localized Smart-Interpretation

    NASA Astrophysics Data System (ADS)

    Lundh Gulbrandsen, Mats; Mejer Hansen, Thomas; Bach, Torben; Pallesen, Tom

    2014-05-01

    The complex task of setting up a geological model consists not only of combining available geological information into a conceptually plausible model, but also requires consistency with available data, e.g. geophysical data. However, in many cases the direct geological information, e.g. borehole samples, is very sparse, so in order to create a geological model, the geologist needs to rely on the geophysical data. The problem, however, is that the amount of geophysical data is in many cases so vast that it is practically impossible to integrate all of it in the manual interpretation process. This means that much of the information available from the geophysical surveys is unexploited, which is a problem, because the resulting geological model does not fulfill its full potential and is hence less trustworthy. We suggest an approach to geological modeling that (1) allows all geophysical data to be considered when building the geological model, (2) is fast, and (3) allows quantification of the geological modeling. The method is constructed to build a statistical model, f(d,m), describing the relation between what the geologist interprets, d, and what the geologist knows, m. The parameter m reflects any available information that can be quantified, such as geophysical data, the result of a geophysical inversion, elevation maps, etc. The parameter d reflects an actual interpretation, such as for example the depth to the base of a ground water reservoir. First we infer a statistical model f(d,m) by examining sets of actual interpretations made by a geological expert, [d1, d2, ...], and the information used to perform the interpretation, [m1, m2, ...]. This makes it possible to quantify how the geological expert performs interpretation through f(d,m). As the geological expert proceeds with interpreting, the number of interpreted data points from which the statistical model is inferred increases, and therefore the accuracy of the statistical model increases. When a model f(d,m) has successfully been inferred, we are able to simulate how the geological expert would perform an interpretation given some external information m, through f(d|m). We will demonstrate this method applied to geological interpretation and densely sampled airborne electromagnetic data. In short, our goal is to build a statistical model describing how a geological expert performs geological interpretation given some geophysical data. We then wish to use this statistical model to perform semi-automatic interpretation, everywhere such geophysical data exist, in a manner consistent with the choices made by a geological expert. The benefits of such a statistical model are that (1) it provides a quantification of how a geological expert performs interpretation based on available diverse data, (2) all available geophysical information can be used, and (3) it allows much faster interpretation of large data sets.

  14. Fast Identification of Biological Pathways Associated with a Quantitative Trait Using Group Lasso with Overlaps

    PubMed Central

    Silver, Matt; Montana, Giovanni

    2012-01-01

    Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our “pathways group lasso with adaptive weights” (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small. PMID:22499682

  15. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  16. An improved approach for flight readiness certification: Methodology for failure risk assessment and application examples, volume 1

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with engineering analysis to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in engineering analyses of failure phenomena, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which engineering analysis models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. Conventional engineering analysis models currently employed for design of failure prediction are used in this methodology. The PFA methodology is described and examples of its application are presented. Conventional approaches to failure risk evaluation for spaceflight systems are discussed, and the rationale for the approach taken in the PFA methodology is presented. The statistical methods, engineering models, and computer software used in fatigue failure mode applications are thoroughly documented.

  17. Reentry survivability modeling

    NASA Astrophysics Data System (ADS)

    Fudge, Michael L.; Maher, Robert L.

    1997-10-01

    Statistical methods for expressing the impact risk posed to space systems in general [and the International Space Station (ISS) in particular] by other resident space objects have been examined. One of the findings of this investigation is that there are legitimate physical modeling reasons for the common statistical expression of the collision risk. A combination of statistical methods and physical modeling is also used to express the impact risk posed by re-entering space systems to objects of interest (e.g., people and property) on Earth. One of the largest uncertainties in expressing this risk is the estimation of the material which survives reentry to impact Earth's surface. This point was recently demonstrated in dramatic fashion by the impact of an intact expendable launch vehicle (ELV) upper stage near a private residence in the continental United States. Since approximately half of the missions supporting ISS will utilize ELVs, it is appropriate to examine the methods used to estimate the amount and physical characteristics of ELV debris surviving reentry to impact Earth's surface. This paper examines reentry survivability estimation methodology, including the specific methodology used by Caiman Sciences' 'Survive' model. Comparisons between empirical results (observations of objects which have been recovered on Earth after surviving reentry) and Survive estimates are presented for selected upper stage or spacecraft components and a Delta launch vehicle second stage.

  18. The Interaction of TXNIP and AF1q Genes Increases the Susceptibility of Schizophrenia.

    PubMed

    Su, Yousong; Ding, Wenhua; Xing, Mengjuan; Qi, Dake; Li, Zezhi; Cui, Donghong

    2017-08-01

    Although previous studies showed a reduced risk of cancer in patients with schizophrenia, whether patients with schizophrenia possess genetic factors that also contribute to tumor suppression is still unknown. In the present study, based on our previous microarray data, we focused on the tumor suppressor genes TXNIP and AF1q, which were differentially expressed in patients with schizophrenia. A total of 413 patients and 578 healthy controls were recruited. We found no significant differences in genotype, allele, or haplotype frequencies at the selected five single nucleotide polymorphisms (SNPs) (rs2236566 and rs7211 in the TXNIP gene; rs10749659, rs2140709, and rs3738481 in the AF1q gene) between patients with schizophrenia and controls. However, we found an association between the interaction of TXNIP and AF1q and schizophrenia by using the MDR method followed by traditional statistical analysis. The best gene-gene interaction model identified was a three-locus model, TXNIP (rs2236566, rs7211)-AF1q (rs2140709). After traditional statistical analysis, we found that the high-risk genotype combination was rs2236566 (GG)-rs7211(CC)-rs2140709(CC) (OR = 1.35 [1.03-1.76]). The low-risk genotype combination was rs2236566 (GT)-rs7211(CC)-rs2140709(CC) (OR = 0.67 [0.49-0.91]). Our findings suggest a statistically significant role of the interaction of TXNIP and AF1q polymorphisms (TXNIP-rs2236566, TXNIP-rs7211, and AF1q-rs2769605) in schizophrenia susceptibility.

  19. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    USGS Publications Warehouse

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
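
    As a rough illustration of step 3 above, the sketch below fits multivariate logistic regressions to hypothetical basin data and compares candidate predictor combinations by AIC; the variable names and the synthetic data are assumptions for illustration, not the USGS data set.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 399  # basins

# Hypothetical basin-level predictors (stand-ins for the study's variables).
df = pd.DataFrame({
    "pct_slope_gt30": rng.uniform(0, 100, n),
    "pct_high_burn": rng.uniform(0, 100, n),
    "storm_intensity": rng.uniform(1, 40, n),   # mm/h
    "soil_organic": rng.uniform(0, 10, n),
})
logit_true = -6 + 0.04 * df.pct_slope_gt30 + 0.03 * df.pct_high_burn + 0.05 * df.storm_intensity
df["debris_flow"] = rng.binomial(1, 1 / (1 + np.exp(-logit_true)))

# Evaluate all predictor combinations and keep the model with the lowest AIC.
predictors = ["pct_slope_gt30", "pct_high_burn", "storm_intensity", "soil_organic"]
best = None
for k in range(1, len(predictors) + 1):
    for combo in itertools.combinations(predictors, k):
        X = sm.add_constant(df[list(combo)])
        fit = sm.Logit(df["debris_flow"], X).fit(disp=0)
        if best is None or fit.aic < best[0]:
            best = (fit.aic, combo, fit)
print("Best predictor set:", best[1])
print(best[2].params)  # coefficients give the modeled probability of debris-flow occurrence
```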

  20. Scaling laws and fluctuations in the statistics of word frequencies

    NASA Astrophysics Data System (ADS)

    Gerlach, Martin; Altmann, Eduardo G.

    2014-11-01

    In this paper, we combine statistical analysis of written texts and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. The average vocabulary of an ensemble of fixed-length texts is known to scale sublinearly with the total number of words (Heaps’ law). Analyzing the fluctuations around this average in three large databases (Google-ngram, English Wikipedia, and a collection of scientific articles), we find that the standard deviation scales linearly with the average (Taylor's law), in contrast to the prediction of decaying fluctuations obtained using simple sampling arguments. We explain both scaling laws (Heaps’ and Taylor’s) by modeling the usage of words using a Poisson process with a fat-tailed distribution of word frequencies (Zipf's law) and topic-dependent frequencies of individual words (as in topic models). Considering topical variations leads to quenched averages, turns the vocabulary size into a non-self-averaging quantity, and explains the empirical observations. For the numerous practical applications relying on estimations of vocabulary size, our results show that uncertainties remain large even for long texts. We show how to account for these uncertainties in measurements of lexical richness of texts with different lengths.
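
    A toy simulation of the mechanism described above: word frequencies follow a Zipf-like (fat-tailed) distribution, texts are generated as Poisson samples, and the mean and standard deviation of vocabulary size are compared across text lengths. The vocabulary size, exponent, and ensemble size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
V, gamma = 50_000, 1.8                       # potential vocabulary, Zipf exponent
freqs = 1.0 / np.arange(1, V + 1) ** gamma   # Zipf's law
freqs /= freqs.sum()

def vocab_stats(n_words, n_texts=200):
    """Mean and std of the number of distinct words in texts of length n_words."""
    sizes = []
    for _ in range(n_texts):
        counts = rng.poisson(n_words * freqs)   # Poisson usage of each word
        sizes.append(np.count_nonzero(counts))
    sizes = np.asarray(sizes)
    return sizes.mean(), sizes.std()

for n in [10**3, 10**4, 10**5]:
    mean, std = vocab_stats(n)
    print(f"N={n:>7d}  mean vocab={mean:9.1f}  std={std:7.1f}  std/mean={std/mean:.3f}")
# Heaps' law: the mean vocabulary grows sublinearly with N. With an additional
# topic-dependent modulation of freqs (not shown), the std grows linearly with the
# mean (Taylor's law) instead of decaying as in this plain sampling case.
```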

  1. Southeast Atlantic Cloud Properties in a Multivariate Statistical Model - How Relevant is Air Mass History for Local Cloud Properties?

    NASA Astrophysics Data System (ADS)

    Fuchs, Julia; Cermak, Jan; Andersen, Hendrik

    2017-04-01

    This study aims at untangling the impacts of external dynamics and local conditions on cloud properties in the Southeast Atlantic (SEA) by combining satellite and reanalysis data using multivariate statistics. The understanding of clouds and their determinants at different scales is important for constraining the Earth's radiative budget, and thus prominent in climate-system research. In this study, SEA stratocumulus cloud properties are observed not only as the result of local environmental conditions but also as affected by external dynamics and spatial origins of air masses entering the study area. In order to assess to what extent cloud properties are impacted by aerosol concentration, air mass history, and meteorology, a multivariate approach is conducted using satellite observations of aerosol and cloud properties (MODIS, SEVIRI), information on aerosol species composition (MACC) and meteorological context (ERA-Interim reanalysis). To account for the often-neglected but important role of air mass origin, information on air mass history based on HYSPLIT modeling is included in the statistical model. This multivariate approach is intended to lead to a better understanding of the physical processes behind observed stratocumulus cloud properties in the SEA.

  2. The effect of leverage and/or influential on structure-activity relationships.

    PubMed

    Bolboacă, Sorana D; Jäntschi, Lorentz

    2013-05-01

    In the spirit of reporting valid and reliable Quantitative Structure-Activity Relationship (QSAR) models, the aim of our research was to assess how the leverage (analysis with the hat matrix, h(i)) and the influence (analysis with Cook's distance, D(i)) of compounds in QSAR models may reflect the models' reliability and characteristics. The datasets included in this research were collected from previously published papers. Seven datasets that met the imposed inclusion criteria were analyzed. Three models were obtained for each dataset (full-model, h(i)-model and D(i)-model) and several statistical validation criteria were applied to the models. In 5 out of 7 sets the correlation coefficient increased when compounds with either h(i) or D(i) higher than the threshold were removed. Withdrawn compounds varied from 2 to 4 for h(i)-models and from 1 to 13 for D(i)-models. Validation statistics showed that D(i)-models possess systematically better agreement than both full-models and h(i)-models. Removal of influential compounds from the training set significantly improves the model and is recommended during the development of quantitative structure-activity relationship models. The Cook's distance approach should be combined with hat matrix analysis in order to identify candidate compounds for removal.
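
    A brief sketch of the two diagnostics discussed above for an ordinary least-squares QSAR-style model: leverages from the hat matrix and Cook's distances, with compounds above commonly used thresholds flagged for possible removal. The synthetic descriptors and the specific cut-offs are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, p = 40, 3
X = sm.add_constant(rng.normal(size=(n, p)))     # molecular descriptors + intercept
y = X @ np.array([1.0, 0.5, -0.8, 0.3]) + rng.normal(scale=0.2, size=n)

fit = sm.OLS(y, X).fit()
influence = fit.get_influence()
h = influence.hat_matrix_diag                    # leverage h(i)
cooks_d, _ = influence.cooks_distance            # Cook's distance D(i)

h_threshold = 3 * X.shape[1] / n                 # common leverage cut-off: 3p/n
d_threshold = 4 / n                              # common Cook's distance cut-off: 4/n
print("High-leverage compounds:", np.where(h > h_threshold)[0])
print("Influential compounds:  ", np.where(cooks_d > d_threshold)[0])
# Refitting without the flagged compounds gives the h(i)- and D(i)-models.
```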

  3. Effect of variation of impression material combinations, dual arch tray types, and sequence of pour on the accuracy of working dies: “An in vitro study”

    PubMed Central

    Reddy, Nagam Raja; Reddy, Jakranpally Sathya; Padmaja, Bramha Josyula Indira; Reddy, Budigi Madan Mohan; Sunil, Motupalli; Reddy, Bommireddy Tejeswar

    2016-01-01

    Aims: To evaluate the accuracy of dies made from dual arch impressions using different sectional dual arch trays, combinations of elastomeric impression materials, and the sequence of pour of dies. Subjects and Methods: The dual arch impression materials were grouped into three groups depending on the combination of impression materials used, and each group was subdivided into four subgroups. A sample size of 8 per subgroup yielded a total of 96 impressions, divided into three groups of 32 each (Groups I, II, and III). Group I comprised impressions made using monophase (M) impression material, Group II impressions made using a combination of heavy body and light body (HL), and Group III impressions made using a combination of putty and light body (PL). The dies obtained were evaluated with a travelling microscope to measure the buccolingual width of the tooth at the margin by using the sharp corners of the notches as reference points. Statistical Analysis Used: Descriptive statistics, namely mean and standard deviation, and a one-way analysis of variance test. Results: The results obtained in this study indicate that, though not statistically significant, the metal dual arch trays performed better than the plastic trays in reproducing die dimensions. Conclusions: From the results obtained, dies poured from combined heavy body and light body impressions using plastic or metal dual arch trays showed the least variation in buccolingual dimension from the master model. PMID:27141172

  4. Testing effects of consumer richness, evenness and body size on ecosystem functioning.

    PubMed

    Reiss, Julia; Bailey, R A; Perkins, Daniel M; Pluchinotta, Angela; Woodward, Guy

    2011-11-01

    1. Numerous studies have revealed (usually positive) relationships between biodiversity and ecosystem functioning (B-EF), but the underpinning drivers are rarely addressed explicitly, hindering the development of a more predictive understanding. 2. We developed a suite of statistical models (where we combined existing models with novel ones) to test for richness and evenness effects on detrital processing in freshwater microcosms. Instead of using consumer species as biodiversity units, we used two size classes within three species (six types). This allowed us to test for diversity effects and also to focus on the role of body size and biomass. 3. Our statistical models tested for (i) whether performance in polyculture was more than the sum of its parts (non-additive effects), (ii) the effects of specific type combinations (assemblage identity effects) and (iii) whether types behaved differently when their absolute or relative abundances were altered (e.g. because type abundance in polyculture was lower compared with monoculture). The latter point meant we did not need additional density treatments. 4. Process rates were independent of richness and evenness and all types performed in an additive fashion. The performance of a type was mainly driven by the consumers' metabolic requirements (connected to body size). On an assemblage level, biomass explained a large proportion of detrital processing rates. 5. We conclude that B-EF studies would benefit from widening their statistical approaches. Further, they need to consider biomass of species assemblages and whether biomass is comprised of small or large individuals, because even if all species are present in the same biomass, small species (or individuals) will perform better. © 2011 The Authors. Journal of Animal Ecology © 2011 British Ecological Society.

  5. Quantity of dates trumps quality of dates for dense Bayesian radiocarbon sediment chronologies - Gas ion source 14C dating instructed by simultaneous Bayesian accumulation rate modeling

    NASA Astrophysics Data System (ADS)

    Rosenheim, B. E.; Firesinger, D.; Roberts, M. L.; Burton, J. R.; Khan, N.; Moyer, R. P.

    2016-12-01

    Radiocarbon (14C) sediment core chronologies benefit from a high density of dates, even when precision of individual dates is sacrificed. This is demonstrated by a combined approach of rapid 14C analysis of CO2 gas generated from carbonates and organic material coupled with Bayesian statistical modeling. Analysis of 14C is facilitated by the gas ion source on the Continuous Flow Accelerator Mass Spectrometry (CFAMS) system at the Woods Hole Oceanographic Institution's National Ocean Sciences Accelerator Mass Spectrometry facility. This instrument is capable of producing a 14C determination of +/- 100 14C y precision every 4-5 minutes, with limited sample handling (dissolution of carbonates and/or combustion of organic carbon in evacuated containers). Rapid analysis allows over-preparation of samples to include replicates at each depth and/or comparison of different sample types at particular depths in a sediment or peat core. Analysis priority is given to depths that have the least chronologic precision as determined by Bayesian modeling of the chronology of calibrated ages. Use of such a statistical approach to determine the order in which samples are run ensures that the chronology constantly improves so long as material is available for the analysis of chronologic weak points. Ultimately, accuracy of the chronology is determined by the material that is actually being dated, and our combined approach allows testing of different constituents of the organic carbon pool and the carbonate minerals within a core. We will present preliminary results from a deep-sea sediment core abundant in deep-sea foraminifera as well as coastal wetland peat cores to demonstrate statistical improvements in sediment- and peat-core chronologies obtained by increasing the quantity and decreasing the quality of individual dates.

  6. Modeling exposure–lag–response associations with distributed lag non-linear models

    PubMed Central

    Gasparrini, Antonio

    2014-01-01

    In biomedical research, a health effect is frequently associated with protracted exposures of varying intensity sustained in the past. The main complexity of modeling and interpreting such phenomena lies in the additional temporal dimension needed to express the association, as the risk depends on both intensity and timing of past exposures. This type of dependency is defined here as exposure–lag–response association. In this contribution, I illustrate a general statistical framework for such associations, established through the extension of distributed lag non-linear models, originally developed in time series analysis. This modeling class is based on the definition of a cross-basis, obtained by the combination of two functions to flexibly model linear or nonlinear exposure-responses and the lag structure of the relationship, respectively. The methodology is illustrated with an example application to cohort data and validated through a simulation study. This modeling framework generalizes to various study designs and regression models, and can be applied to study the health effects of protracted exposures to environmental factors, drugs or carcinogenic agents, among others. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24027094
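
    To make the cross-basis idea concrete, the sketch below builds one from simple polynomial bases over the exposure dimension and over the lag dimension and combines them by tensor product; the resulting columns can then enter any regression model. Polynomial bases are used here purely for brevity; the published framework typically uses spline bases.

```python
import numpy as np

def poly_basis(values, degree, intercept=False):
    """Simple polynomial basis; optionally include the constant column."""
    values = np.asarray(values, dtype=float)
    start = 0 if intercept else 1
    return np.column_stack([values ** d for d in range(start, degree + 1)])

def cross_basis(x, max_lag, var_degree=2, lag_degree=1):
    """Cross-basis matrix for an exposure series x with lags 0..max_lag."""
    n = len(x)
    lags = np.arange(max_lag + 1)
    lag_b = poly_basis(lags, lag_degree, intercept=True)     # (max_lag+1, lag_degree+1)
    cb = np.zeros((n, var_degree * (lag_degree + 1)))
    for i in range(max_lag, n):                              # skip rows with incomplete lag history
        for j, l in enumerate(lags):
            var_b = poly_basis([x[i - l]], var_degree)[0]    # exposure basis at lag l
            cb[i] += np.kron(var_b, lag_b[j])                # tensor-product contribution, summed over lags
    return cb

# Example: a daily exposure series entering a health regression with up to 10 lags.
rng = np.random.default_rng(4)
exposure = rng.gamma(shape=2.0, scale=5.0, size=365)
cb = cross_basis(exposure, max_lag=10)
print(cb.shape)  # (365, 4): cross-basis variables for the outcome model
```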

  7. KECSA-Movable Type Implicit Solvation Model (KMTISM)

    PubMed Central

    2015-01-01

    Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12). PMID:25691832

  8. An Improved Shock Model for Bare and Covered Explosives

    NASA Astrophysics Data System (ADS)

    Scholtes, Gert; Bouma, Richard

    2017-06-01

    TNO developed a toolbox to estimate the probability of a violent event on a ship or other platform when the munition bunker is hit by, e.g., a bullet or fragment from a missile attack. To obtain a reliable statistical output, several million calculations are needed. Because millions of different scenarios have to be calculated, hydrocode calculations cannot be used for this type of application; a fast yet accurate engineering solution is needed. At this moment the Haskins and Cook model is used for this purpose. To obtain a better estimate for covered explosives and munitions, TNO has developed a new model that combines the high-pressure shock wave model described by Haskins and Cook with the expanding shock wave model of Green. This combined model gives a better fit to the experimental values for explosive response calculations, using the same critical energy fluence values for covered as well as bare explosives. In this paper the theory is explained and results of the calculations for several bare and covered explosives are presented and compared with experimental values from the literature for Composition B, Composition B-3 and PBX-9404.

  9. Combined proportional and additive residual error models in population pharmacokinetic modelling.

    PubMed

    Proost, Johannes H

    2017-11-15

    In pharmacokinetic modelling, a combined proportional and additive residual error model is often preferred over a proportional or additive residual error model. Different approaches have been proposed, but a comparison between approaches is still lacking. The theoretical background of the methods is described. Method VAR assumes that the variance of the residual error is the sum of the statistically independent proportional and additive components; this method can be coded in three ways. Method SD assumes that the standard deviation of the residual error is the sum of the proportional and additive components. Using datasets from literature and simulations based on these datasets, the methods are compared using NONMEM. The different coding of methods VAR yield identical results. Using method SD, the values of the parameters describing residual error are lower than for method VAR, but the values of the structural parameters and their inter-individual variability are hardly affected by the choice of the method. Both methods are valid approaches in combined proportional and additive residual error modelling, and selection may be based on OFV. When the result of an analysis is used for simulation purposes, it is essential that the simulation tool uses the same method as used during analysis. Copyright © 2017 Elsevier B.V. All rights reserved.
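
    A small numerical sketch of the two formulations compared in this work, applied to a predicted concentration f: method VAR adds the proportional and additive components on the variance scale, method SD on the standard-deviation scale. The parameter values are illustrative.

```python
import numpy as np

def sd_var_method(f, sigma_prop, sigma_add):
    """Method VAR: residual variance is the sum of the independent components."""
    return np.sqrt((sigma_prop * f) ** 2 + sigma_add ** 2)

def sd_sd_method(f, sigma_prop, sigma_add):
    """Method SD: residual standard deviation is the sum of the components."""
    return sigma_prop * f + sigma_add

f = np.array([0.1, 1.0, 10.0, 100.0])        # predicted concentrations
sigma_prop, sigma_add = 0.15, 0.05           # illustrative error parameters
print("VAR:", sd_var_method(f, sigma_prop, sigma_add))
print("SD: ", sd_sd_method(f, sigma_prop, sigma_add))
# For the same parameter values, method SD yields a larger residual SD, which is
# why fitted error parameters come out lower under method SD than under method VAR.
```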

  10. Silica exposure during construction activities: statistical modeling of task-based measurements from the literature.

    PubMed

    Sauvé, Jean-François; Beaudry, Charles; Bégin, Denis; Dion, Chantal; Gérin, Michel; Lavoué, Jérôme

    2013-05-01

    Many construction activities can put workers at risk of breathing silica-containing dusts, and there is an important body of literature documenting exposure levels using a task-based strategy. In this study, statistical modeling was used to analyze a data set containing 1466 task-based, personal respirable crystalline silica (RCS) measurements gathered from 46 sources to estimate exposure levels during construction tasks and the effects of determinants of exposure. Monte-Carlo simulation was used to recreate individual exposures from summary parameters, and the statistical modeling involved multimodel inference with Tobit models containing combinations of the following exposure variables: sampling year, sampling duration, construction sector, project type, workspace, ventilation, and controls. Exposure levels by task were predicted based on the median reported duration by activity, the year 1998, absence of source control methods, and an equal distribution of the other determinants of exposure. The model containing all the variables explained 60% of the variability and was identified as the best approximating model. Of the 27 tasks contained in the data set, abrasive blasting, masonry chipping, scabbling concrete, tuck pointing, and tunnel boring had estimated geometric means above 0.1 mg m(-3) based on the exposure scenario developed. Water-fed tools and local exhaust ventilation were associated with a reduction of 71 and 69% in exposure levels compared with no controls, respectively. The predictive model developed can be used to estimate RCS concentrations for many construction activities in a wide range of circumstances.
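
    The Tobit models referred to above handle left-censored exposure measurements (values below a detection limit). A minimal sketch of a left-censored lognormal Tobit likelihood fitted with scipy is shown below; the synthetic data and the single "control" predictor are assumptions for illustration only.

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(5)
n = 300
has_control = rng.binomial(1, 0.5, n)                     # e.g. water-fed tool used
log_conc = -2.0 - 1.2 * has_control + rng.normal(scale=0.8, size=n)
lod = np.log(0.02)                                        # detection limit on the log scale
censored = log_conc < lod
y = np.where(censored, lod, log_conc)

def neg_loglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    mu = b0 + b1 * has_control
    ll_obs = stats.norm.logpdf(y[~censored], mu[~censored], sigma)
    ll_cens = stats.norm.logcdf(lod, mu[censored], sigma)    # P(value below detection limit)
    return -(ll_obs.sum() + ll_cens.sum())

res = optimize.minimize(neg_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
b0, b1, log_sigma = res.x
print(f"control effect on log exposure: {b1:.2f}  (true -1.2)")
print(f"implied exposure reduction: {1 - np.exp(b1):.0%}")
```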

  11. Estimating preferential flow in karstic aquifers using statistical mixed models.

    PubMed

    Anaya, Angel A; Padilla, Ingrid; Macchiavelli, Raul; Vesper, Dorothy J; Meeker, John D; Alshawabkeh, Akram N

    2014-01-01

    Karst aquifers are highly productive groundwater systems often associated with conduit flow. These systems can be highly vulnerable to contamination, resulting in a high potential for contaminant exposure to humans and ecosystems. This work develops statistical models to spatially characterize flow and transport patterns in karstified limestone and determines the effect of aquifer flow rates on these patterns. A laboratory-scale Geo-HydroBed model is used to simulate flow and transport processes in a karstic limestone unit. The model consists of stainless steel tanks containing a karstified limestone block collected from a karst aquifer formation in northern Puerto Rico. Experimental work involves making a series of flow and tracer injections, while monitoring hydraulic and tracer response spatially and temporally. Statistical mixed models (SMMs) are applied to hydraulic data to determine likely pathways of preferential flow in the limestone units. The models indicate a highly heterogeneous system with dominant, flow-dependent preferential flow regions. Results indicate that regions of preferential flow tend to expand at higher groundwater flow rates, suggesting a greater volume of the system being flushed by flowing water at higher rates. The spatial and temporal distribution of tracer concentrations indicates the presence of conduit-like and diffuse flow transport in the system, supporting the notion that both transport mechanisms act in combination in the limestone unit. The temporal responses of tracer concentrations at different locations in the model coincide with, and confirm, the preferential flow distribution generated with the SMMs used in the study. © 2013, National Ground Water Association.
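
    A hedged sketch of a statistical mixed model of the kind used here, fitting spatially grouped hydraulic responses with a random intercept per monitoring location using statsmodels; the variable names and synthetic data are illustrative assumptions, not the Geo-HydroBed data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
locations = np.repeat(np.arange(12), 20)                  # 12 monitoring ports, 20 observations each
flow_rate = np.tile(np.linspace(0.5, 5.0, 20), 12)        # imposed groundwater flow rate (L/min)
port_effect = rng.normal(scale=1.5, size=12)[locations]   # heterogeneity between ports
head_response = 2.0 + 0.8 * flow_rate + port_effect + rng.normal(scale=0.3, size=locations.size)

df = pd.DataFrame({"head": head_response, "flow": flow_rate, "port": locations})

# Random intercept per port captures spatial heterogeneity; the fixed effect of flow
# rate describes how the hydraulic response changes with imposed flow.
model = smf.mixedlm("head ~ flow", data=df, groups=df["port"]).fit()
print(model.summary())
# Ports whose random effects stand out indicate likely preferential-flow pathways.
print(model.random_effects)
```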

  12. A multibody knee model with discrete cartilage prediction of tibio-femoral contact mechanics.

    PubMed

    Guess, Trent M; Liu, Hongzeng; Bhashyam, Sampath; Thiagarajan, Ganesh

    2013-01-01

    Combining musculoskeletal simulations with anatomical joint models capable of predicting cartilage contact mechanics would provide a valuable tool for studying the relationships between muscle force and cartilage loading. As a step towards producing multibody musculoskeletal models that include representation of cartilage tissue mechanics, this research developed a subject-specific multibody knee model that represented the tibia plateau cartilage as discrete rigid bodies that interacted with the femur through deformable contacts. Parameters for the compliant contact law were derived using three methods: (1) simplified Hertzian contact theory, (2) simplified elastic foundation contact theory and (3) parameter optimisation from a finite element (FE) solution. The contact parameters and contact friction were evaluated during a simulated walk in a virtual dynamic knee simulator, and the resulting kinematics were compared with measured in vitro kinematics. The effects on predicted contact pressures and cartilage-bone interface shear forces during the simulated walk were also evaluated. The compliant contact stiffness parameters had a statistically significant effect on predicted contact pressures as well as all tibio-femoral motions except flexion-extension. The contact friction was not statistically significant to contact pressures, but was statistically significant to medial-lateral translation and all rotations except flexion-extension. The magnitude of kinematic differences between model formulations was relatively small, but contact pressure predictions were sensitive to model formulation. The developed multibody knee model was computationally efficient and had a computation time 283 times faster than a FE simulation using the same geometries and boundary conditions.

  13. Mediation Analysis with Survival Outcomes: Accelerated Failure Time vs. Proportional Hazards Models.

    PubMed

    Gelfand, Lois A; MacKinnon, David P; DeRubeis, Robert J; Baraldi, Amanda N

    2016-01-01

    Survival time is an important type of outcome variable in treatment research. Currently, limited guidance is available regarding performing mediation analyses with survival outcomes, which generally do not have normally distributed errors, and contain unobserved (censored) events. We present considerations for choosing an approach, using a comparison of semi-parametric proportional hazards (PH) and fully parametric accelerated failure time (AFT) approaches for illustration. We compare PH and AFT models and procedures in their integration into mediation models and review their ability to produce coefficients that estimate causal effects. Using simulation studies modeling Weibull-distributed survival times, we compare statistical properties of mediation analyses incorporating PH and AFT approaches (employing SAS procedures PHREG and LIFEREG, respectively) under varied data conditions, some including censoring. A simulated data set illustrates the findings. AFT models integrate more easily than PH models into mediation models. Furthermore, mediation analyses incorporating LIFEREG produce coefficients that can estimate causal effects, and demonstrate superior statistical properties. Censoring introduces bias in the coefficient estimate representing the treatment effect on the outcome: underestimation in LIFEREG and overestimation in PHREG. With LIFEREG, this bias can be addressed using an alternative estimate obtained from combining other coefficients, whereas this is not possible with PHREG. When Weibull assumptions are not violated, there are compelling advantages to using LIFEREG over PHREG for mediation analyses involving survival-time outcomes. Irrespective of the procedures used, the interpretation of coefficients, effects of censoring on coefficient estimates, and statistical properties should be taken into account when reporting results.
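
    A simplified sketch of product-of-coefficients mediation with an AFT-style model: with Weibull survival times and, for brevity, no censoring, log survival time can be regressed on treatment and mediator and the mediated effect estimated as a*b. The data-generating values are assumptions; handling censoring requires a parametric AFT fitter such as SAS LIFEREG or an equivalent.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 2000
treat = rng.binomial(1, 0.5, n)
a_true, b_true = 0.5, 0.4

# Mediator model: M = a*X + error
mediator = a_true * treat + rng.normal(size=n)

# Weibull AFT outcome: log(T) = b0 + b*M + c'*X + extreme-value error (no censoring here)
shape = 1.5
log_t = 1.0 + b_true * mediator + 0.2 * treat - rng.gumbel(size=n) / shape

# Path a: effect of treatment on the mediator.
fit_a = sm.OLS(mediator, sm.add_constant(treat)).fit()
# Path b: effect of the mediator on log survival time, adjusting for treatment.
X = sm.add_constant(np.column_stack([mediator, treat]))
fit_b = sm.OLS(log_t, X).fit()

a_hat, b_hat = fit_a.params[1], fit_b.params[1]
print(f"mediated effect a*b = {a_hat * b_hat:.3f}  (true {a_true * b_true:.3f})")
```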

  14. A statistical method for predicting seizure onset zones from human single-neuron recordings

    NASA Astrophysics Data System (ADS)

    Valdez, André B.; Hickman, Erin N.; Treiman, David M.; Smith, Kris A.; Steinmetz, Peter N.

    2013-02-01

    Objective. Clinicians often use depth-electrode recordings to localize human epileptogenic foci. To advance the diagnostic value of these recordings, we applied logistic regression models to single-neuron recordings from depth-electrode microwires to predict seizure onset zones (SOZs). Approach. We collected data from 17 epilepsy patients at the Barrow Neurological Institute and developed logistic regression models to calculate the odds of observing SOZs in the hippocampus, amygdala and ventromedial prefrontal cortex, based on statistics such as the burst interspike interval (ISI). Main results. Analysis of these models showed that, for a single-unit increase in burst ISI ratio, the left hippocampus was approximately 12 times more likely to contain a SOZ; and the right amygdala, 14.5 times more likely. Our models were most accurate for the hippocampus bilaterally (at 85% average sensitivity), and performance was comparable with current diagnostics such as electroencephalography. Significance. Logistic regression models can be combined with single-neuron recording to predict likely SOZs in epilepsy patients being evaluated for resective surgery, providing an automated source of clinically useful information.

  15. An improved empirical dynamic control system model of global mean sea level rise and surface temperature change

    NASA Astrophysics Data System (ADS)

    Wu, Qing; Luu, Quang-Hung; Tkalich, Pavel; Chen, Ge

    2018-04-01

    Having great impacts on human lives, global warming and the associated sea level rise are believed to be strongly linked to anthropogenic causes. A statistical approach offers a simple and yet conceptually verifiable combination of remotely connected climate variables and indices, including sea level and surface temperature. We propose an improved statistical reconstruction model based on an empirical dynamic control system that takes climate variability into account and derives its parameters from Monte Carlo cross-validation random experiments. For the historical data from 1880 to 2001, we obtained higher correlations than other empirical dynamic models. The averaged root mean square errors are reduced in both reconstructed fields, namely the global mean surface temperature (by 24-37%) and the global mean sea level (by 5-25%). Our model is also more robust, as it notably reduces the instability associated with varying initial values. These results suggest that the model not only significantly enhances the global mean reconstructions of temperature and sea level but may also have the potential to improve future projections.

  16. Polarization-correlation diagnostics and differentiation of cholelithiasis in patients with chronic cholecystitis combined with diabetes mellitus type 2

    NASA Astrophysics Data System (ADS)

    Marchuk, Yu F.; Fediv, O. I.; Ivashchuk, I. O.; Andriychuk, D. R.

    2011-09-01

    The principles of optical modeling of the polycrystalline structure of human bile are described, and the main types of polycrystalline structures are detailed. Scenarios for the formation of the polarization structure of microscopic bile images in coherent radiation are proposed and substantiated. Results are presented on the interrelation between the 1st- to 4th-order statistical moments that characterize the coordinate distributions of intensity in laser images of bile smears from cholelithiasis patients with accompanying pathologies. Diagnostic criteria for the onset of cholelithiasis and for differentiating its degree of severity are determined.

  17. PV cells electrical parameters measurement

    NASA Astrophysics Data System (ADS)

    Cibira, Gabriel

    2017-12-01

    When measuring the optical parameters of a photovoltaic silicon cell, precise results enable good estimation of the electrical parameters using well-known physical-mathematical models. Nevertheless, considerable recombination phenomena may occur in both the surface and intrinsic thin layers of novel materials. Moreover, rear-contact surface parameters may also influence near-surface recombination phenomena. Therefore, precise electrical measurement is the only way to verify the assumed cell electrical parameters. Based on a theoretical approach supported by experiments, this paper analyses, as a case study, problems within the measurement procedures and equipment used to acquire the electrical parameters of a photovoltaic silicon cell. A statistical appraisal of the measurement quality is also provided.

  18. Polarization-correlation diagnostics and differentiation of cholelithiasis in patients with chronic cholecystitis combined with diabetes mellitus type 2

    NASA Astrophysics Data System (ADS)

    Marchuk, Yu F.; Fediv, O. I.; Ivashchuk, I. O.; Andriychuk, D. R.

    2012-01-01

    The principles of optical modeling of the polycrystalline structure of human bile are described, and the main types of polycrystalline structures are detailed. Scenarios for the formation of the polarization structure of microscopic bile images in coherent radiation are proposed and substantiated. Results are presented on the interrelation between the 1st- to 4th-order statistical moments that characterize the coordinate distributions of intensity in laser images of bile smears from cholelithiasis patients with accompanying pathologies. Diagnostic criteria for the onset of cholelithiasis and for differentiating its degree of severity are determined.

  19. Evaluation of direct and indirect ethanol biomarkers using a likelihood ratio approach to identify chronic alcohol abusers for forensic purposes.

    PubMed

    Alladio, Eugenio; Martyna, Agnieszka; Salomone, Alberto; Pirro, Valentina; Vincenti, Marco; Zadora, Grzegorz

    2017-02-01

    The detection of direct ethanol metabolites, such as ethyl glucuronide (EtG) and fatty acid ethyl esters (FAEEs), in scalp hair is considered the optimal strategy to effectively recognize chronic alcohol misuses by means of specific cut-offs suggested by the Society of Hair Testing. However, several factors (e.g. hair treatments) may alter the correlation between alcohol intake and biomarkers concentrations, possibly introducing bias in the interpretative process and conclusions. 125 subjects with various drinking habits were subjected to blood and hair sampling to determine indirect (e.g. CDT) and direct alcohol biomarkers. The overall data were investigated using several multivariate statistical methods. A likelihood ratio (LR) approach was used for the first time to provide predictive models for the diagnosis of alcohol abuse, based on different combinations of direct and indirect alcohol biomarkers. LR strategies provide a more robust outcome than the plain comparison with cut-off values, where tiny changes in the analytical results can lead to dramatic divergence in the way they are interpreted. An LR model combining EtG and FAEEs hair concentrations proved to discriminate non-chronic from chronic consumers with ideal correct classification rates, whereas the contribution of indirect biomarkers proved to be negligible. Optimal results were observed using a novel approach that associates LR methods with multivariate statistics. In particular, the combination of LR approach with either Principal Component Analysis (PCA) or Linear Discriminant Analysis (LDA) proved successful in discriminating chronic from non-chronic alcohol drinkers. These LR models were subsequently tested on an independent dataset of 43 individuals, which confirmed their high efficiency. These models proved to be less prone to bias than EtG and FAEEs independently considered. In conclusion, LR models may represent an efficient strategy to sustain the diagnosis of chronic alcohol consumption and provide a suitable gradation to support the judgment. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Targeted versus statistical approaches to selecting parameters for modelling sediment provenance

    NASA Astrophysics Data System (ADS)

    Laceby, J. Patrick

    2017-04-01

    One effective field-based approach to modelling sediment provenance is the source fingerprinting technique. Arguably, one of the most important steps for this approach is selecting the appropriate suite of parameters or fingerprints used to model source contributions. Accordingly, approaches to selecting parameters for sediment source fingerprinting will be reviewed. Thereafter, opportunities and limitations of these approaches and some future research directions will be presented. For properties to be effective tracers of sediment, they must discriminate between sources whilst behaving conservatively. Conservative behavior is characterized by constancy in sediment properties, where the properties of sediment sources remain constant, or at the very least, any variation in these properties should occur in a predictable and measurable way. Therefore, properties selected for sediment source fingerprinting should remain constant through sediment detachment, transportation and deposition processes, or vary in a predictable and measurable way. One approach to select conservative properties for sediment source fingerprinting is to identify targeted tracers, such as caesium-137, that provide specific source information (e.g. surface versus subsurface origins). A second approach is to use statistical tests to select an optimal suite of conservative properties capable of modelling sediment provenance. In general, statistical approaches use a combination of a discrimination statistic (e.g. Kruskal-Wallis H-test, Mann-Whitney U-test) and parameter selection statistics (e.g. Discriminant Function Analysis or Principal Component Analysis). The challenge is that modelling sediment provenance is often not straightforward and there is increasing debate in the literature surrounding the most appropriate approach to selecting elements for modelling. Moving forward, it would be beneficial if researchers test their results with multiple modelling approaches, artificial mixtures, and multiple lines of evidence to provide secondary support to their initial modelling results. Indeed, element selection can greatly impact modelling results and having multiple lines of evidence will help provide confidence when modelling sediment provenance.

  1. Statistical mechanics of broadcast channels using low-density parity-check codes.

    PubMed

    Nakamura, Kazutaka; Kabashima, Yoshiyuki; Morelos-Zaragoza, Robert; Saad, David

    2003-03-01

    We investigate the use of Gallager's low-density parity-check (LDPC) codes in a degraded broadcast channel, one of the fundamental models in network information theory. Combining linear codes is a standard technique in practical network communication schemes and is known to provide better performance than simple time sharing methods when algebraic codes are used. The statistical physics based analysis shows that the practical performance of the suggested method, achieved by employing the belief propagation algorithm, is superior to that of LDPC based time sharing codes while the best performance, when received transmissions are optimally decoded, is bounded by the time sharing limit.

  2. Probabilistic Forecasting of Surface Ozone with a Novel Statistical Approach

    NASA Technical Reports Server (NTRS)

    Balashov, Nikolay V.; Thompson, Anne M.; Young, George S.

    2017-01-01

    The recent change in the Environmental Protection Agency's surface ozone regulation, lowering the surface ozone daily maximum 8-h average (MDA8) exceedance threshold from 75 to 70 ppbv, poses significant challenges to U.S. air quality (AQ) forecasters responsible for ozone MDA8 forecasts. The forecasters, supplied by only a few AQ model products, end up relying heavily on self-developed tools. To help U.S. AQ forecasters, this study explores a surface ozone MDA8 forecasting tool that is based solely on statistical methods and standard meteorological variables from the numerical weather prediction (NWP) models. The model combines the self-organizing map (SOM), which is a clustering technique, with a stepwise weighted quadratic regression using meteorological variables as predictors for ozone MDA8. The SOM method identifies different weather regimes, to distinguish between various modes of ozone variability, and groups them according to similarity. In this way, when a regression is developed for a specific regime, data from the other regimes are also used, with weights that are based on their similarity to this specific regime. This approach, regression in SOM (REGiS), yields a distinct model for each regime taking into account both the training cases for that regime and other similar training cases. To produce probabilistic MDA8 ozone forecasts, REGiS weighs and combines all of the developed regression models on the basis of the weather patterns predicted by an NWP model. REGiS is evaluated over the San Joaquin Valley in California and the northeastern plains of Colorado. The results suggest that the model performs best when trained and adjusted separately for an individual AQ station and its corresponding meteorological site.

  3. Sizing Up the Milky Way: A Bayesian Mixture Model Meta-analysis of Photometric Scale Length Measurements

    NASA Astrophysics Data System (ADS)

    Licquia, Timothy C.; Newman, Jeffrey A.

    2016-11-01

    The exponential scale length (L_d) of the Milky Way’s (MW’s) disk is a critical parameter for describing the global physical size of our Galaxy, important both for interpreting other Galactic measurements and helping us to understand how our Galaxy fits into extragalactic contexts. Unfortunately, current estimates span a wide range of values and are often statistically incompatible with one another. Here, we perform a Bayesian meta-analysis to determine an improved, aggregate estimate for L_d, utilizing a mixture-model approach to account for the possibility that any one measurement has not properly accounted for all statistical or systematic errors. Within this machinery, we explore a variety of ways of modeling the nature of problematic measurements, and then employ a Bayesian model averaging technique to derive net posterior distributions that incorporate any model-selection uncertainty. Our meta-analysis combines 29 different (15 visible and 14 infrared) photometric measurements of L_d available in the literature; these involve a broad assortment of observational data sets, MW models and assumptions, and methodologies, all tabulated herein. Analyzing the visible and infrared measurements separately yields estimates for L_d of 2.71^{+0.22}_{-0.20} kpc and 2.51^{+0.15}_{-0.13} kpc, respectively, whereas considering them all combined yields 2.64 ± 0.13 kpc. The ratio between the visible and infrared scale lengths determined here is very similar to that measured in external spiral galaxies. We use these results to update the model of the Galactic disk from our previous work, constraining its stellar mass to be 4.8^{+1.5}_{-1.1} × 10^{10} M_⊙, and the MW’s total stellar mass to be 5.7^{+1.5}_{-1.1} × 10^{10} M_⊙.

  4. Uncertainty quantification for nuclear density functional theory and information content of new measurements.

    PubMed

    McDonnell, J D; Schunck, N; Higdon, D; Sarich, J; Wild, S M; Nazarewicz, W

    2015-03-27

    Statistical tools of uncertainty quantification can be used to assess the information content of measured observables with respect to present-day theoretical models, to estimate model errors and thereby improve predictive capability, to extrapolate beyond the regions reached by experiment, and to provide meaningful input to applications and planned measurements. To showcase new opportunities offered by such tools, we make a rigorous analysis of theoretical statistical uncertainties in nuclear density functional theory using Bayesian inference methods. By considering the recent mass measurements from the Canadian Penning Trap at Argonne National Laboratory, we demonstrate how the Bayesian analysis and a direct least-squares optimization, combined with high-performance computing, can be used to assess the information content of the new data with respect to a model based on the Skyrme energy density functional approach. Employing the posterior probability distribution computed with a Gaussian process emulator, we apply the Bayesian framework to propagate theoretical statistical uncertainties in predictions of nuclear masses, two-neutron dripline, and fission barriers. Overall, we find that the new mass measurements do not impose a constraint that is strong enough to lead to significant changes in the model parameters. The example discussed in this study sets the stage for quantifying and maximizing the impact of new measurements with respect to current modeling and guiding future experimental efforts, thus enhancing the experiment-theory cycle in the scientific method.

  5. Comparison of statistical methods for detection of serum lipid biomarkers for mesothelioma and asbestos exposure.

    PubMed

    Xu, Rengyi; Mesaros, Clementina; Weng, Liwei; Snyder, Nathaniel W; Vachani, Anil; Blair, Ian A; Hwang, Wei-Ting

    2017-07-01

    We compared three statistical methods in selecting a panel of serum lipid biomarkers for mesothelioma and asbestos exposure. Serum samples from mesothelioma, asbestos-exposed subjects and controls (40 per group) were analyzed. Three variable selection methods were considered: top-ranked predictors from univariate models, stepwise selection, and the least absolute shrinkage and selection operator. Cross-validated area under the receiver operating characteristic curve was used to compare the prediction performance. Lipids with high cross-validated area under the curve were identified. A lipid with a mass-to-charge ratio of 372.31 was selected by all three methods comparing mesothelioma versus control. Lipids with mass-to-charge ratios of 1464.80 and 329.21 were selected by two models for asbestos exposure versus control. Different methods selected a similar set of serum lipids. Combining candidate biomarkers can improve prediction.
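
    A rough sklearn sketch of part of the comparison described above: univariate ranking and L1 (lasso-type) penalised logistic regression, each scored by cross-validated AUC on synthetic lipidomic features. The feature counts, penalty strength, and data are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n, p = 80, 200                               # 40 cases + 40 controls, 200 lipid features
X = rng.normal(size=(n, p))
y = np.repeat([0, 1], n // 2)
X[y == 1, :5] += 0.8                         # a few truly informative lipids

# Approach 1: top-ranked univariate predictors, then logistic regression.
univariate = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=5),
                           LogisticRegression(max_iter=1000))
# Approach 2: L1-penalised logistic regression (lasso-type selection).
lasso = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l1", solver="liblinear", C=0.1))

for name, model in [("univariate top-k", univariate), ("lasso", lasso)]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:17s} cross-validated AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```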

  6. A statistical rain attenuation prediction model with application to the advanced communication technology satellite project. 3: A stochastic rain fade control algorithm for satellite link power via nonlinear Markov filtering theory

    NASA Technical Reports Server (NTRS)

    Manning, Robert M.

    1991-01-01

    The dynamic and composite nature of propagation impairments that are incurred on Earth-space communications links at frequencies in and above 30/20 GHz Ka band, i.e., rain attenuation, cloud and/or clear air scintillation, etc., combined with the need to counter such degradations after the small link margins have been exceeded, necessitate the use of dynamic statistical identification and prediction processing of the fading signal in order to optimally estimate and predict the levels of each of the deleterious attenuation components. Such requirements are being met in NASA's Advanced Communications Technology Satellite (ACTS) Project by the implementation of optimal processing schemes derived through the use of the Rain Attenuation Prediction Model and nonlinear Markov filtering theory.

  7. EEG and MEG data analysis in SPM8.

    PubMed

    Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl

    2011-01-01

    SPM is free and open source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox, making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools.

  8. Quantifying Power Grid Risk from Geomagnetic Storms

    NASA Astrophysics Data System (ADS)

    Homeier, N.; Wei, L. H.; Gannon, J. L.

    2012-12-01

    We are creating a statistical model of the geophysical environment that can be used to quantify the geomagnetic storm hazard to power grid infrastructure. Our model is developed using a database of surface electric fields for the continental United States during a set of historical geomagnetic storms. These electric fields are derived from the SUPERMAG compilation of worldwide magnetometer data and surface impedances from the United States Geological Survey. This electric field data can be combined with a power grid model to determine GICs per node and reactive MVARs at each minute during a storm. Using publicly available substation locations, we derive relative risk maps by location by combining magnetic latitude and ground conductivity. We also estimate the surface electric fields during the August 1972 geomagnetic storm that caused a telephone cable outage across the middle of the United States. This event produced the largest surface electric fields in the continental U.S. in at least the past 40 years.

  9. Optimal Design of Experiments by Combining Coarse and Fine Measurements

    NASA Astrophysics Data System (ADS)

    Lee, Alpha A.; Brenner, Michael P.; Colwell, Lucy J.

    2017-11-01

    In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.
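
    A hedged sketch of the general strategy: the correlation structure of features among samples passing the coarse threshold supplies a quadratic interaction term, whose weight, together with the linear coefficients, is then fit on the small set of fine measurements. The data, feature count, and the specific form of the quadratic term are illustrative assumptions rather than the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(9)
p = 10                                         # number of molecular features
w_true = rng.normal(size=p)
J_true = 0.3 * np.outer(w_true, w_true)        # hidden pairwise structure

def activity(X):
    return X @ w_true + np.einsum("ij,jk,ik->i", X, J_true, X)

# Many cheap, categorical ("coarse") measurements: hit / no-hit at a threshold.
X_coarse = rng.normal(size=(5000, p))
hits = X_coarse[activity(X_coarse) > 1.0]

# Few expensive, quantitative ("fine") measurements.
X_fine = rng.normal(size=(30, p))
y_fine = activity(X_fine) + rng.normal(scale=0.1, size=30)

# Coarse data -> correlation structure of the hits (salient feature combinations).
J_hat = np.corrcoef(hits, rowvar=False)

# Fine data -> fit the linear weights plus one coefficient for the quadratic term.
quad = np.einsum("ij,jk,ik->i", X_fine, J_hat, X_fine)
design = np.column_stack([X_fine, quad])
coef, *_ = np.linalg.lstsq(design, y_fine, rcond=None)
print("linear weights:", np.round(coef[:p], 2))
print("quadratic-term weight:", round(coef[-1], 2))
```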

  10. EEG and MEG Data Analysis in SPM8

    PubMed Central

    Litvak, Vladimir; Mattout, Jérémie; Kiebel, Stefan; Phillips, Christophe; Henson, Richard; Kilner, James; Barnes, Gareth; Oostenveld, Robert; Daunizeau, Jean; Flandin, Guillaume; Penny, Will; Friston, Karl

    2011-01-01

    SPM is free and open source software written in MATLAB (The MathWorks, Inc.). In addition to standard M/EEG preprocessing, we presently offer three main analysis tools: (i) statistical analysis of scalp-maps, time-frequency images, and volumetric 3D source reconstruction images based on the general linear model, with correction for multiple comparisons using random field theory; (ii) Bayesian M/EEG source reconstruction, including support for group studies, simultaneous EEG and MEG, and fMRI priors; (iii) dynamic causal modelling (DCM), an approach combining neural modelling with data analysis for which there are several variants dealing with evoked responses, steady state responses (power spectra and cross-spectra), induced responses, and phase coupling. SPM8 is integrated with the FieldTrip toolbox, making it possible for users to combine a variety of standard analysis methods with new schemes implemented in SPM and build custom analysis tools using powerful graphical user interface (GUI) and batching tools. PMID:21437221

  11. Extended Kalman Filter framework for forecasting shoreline evolution

    USGS Publications Warehouse

    Long, Joseph; Plant, Nathaniel G.

    2012-01-01

    A shoreline change model incorporating both long- and short-term evolution is integrated into a data assimilation framework that uses sparse observations to generate an updated forecast of shoreline position and to estimate unobserved geophysical variables and model parameters. Application of the assimilation algorithm provides quantitative statistical estimates of combined model-data forecast uncertainty which is crucial for developing hazard vulnerability assessments, evaluation of prediction skill, and identifying future data collection needs. Significant attention is given to the estimation of four non-observable parameter values and separating two scales of shoreline evolution using only one observable morphological quantity (i.e. shoreline position).
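
    A compact sketch of the extended Kalman filter machinery described above, applied to a toy shoreline model with a long-term trend plus a seasonal term, assimilating sparse noisy position observations while jointly estimating unobserved parameters. The model equations, noise levels, and survey interval are assumptions for illustration only.

```python
import numpy as np

dt, w, A = 1.0, 2 * np.pi / 365.0, 5.0        # daily step, annual cycle, fixed amplitude (m)

def f(x, t):
    """Process model: position advances by a long-term trend plus a seasonal term."""
    y, r, phi = x
    seasonal = A * (np.sin(w * (t + dt) + phi) - np.sin(w * t + phi))
    return np.array([y + r * dt + seasonal, r, phi])

def F_jac(x, t):
    """Jacobian of f with respect to the state (nonlinear only in the phase phi)."""
    _, _, phi = x
    dseas_dphi = A * (np.cos(w * (t + dt) + phi) - np.cos(w * t + phi))
    return np.array([[1.0, dt, dseas_dphi],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

H = np.array([[1.0, 0.0, 0.0]])               # only shoreline position is observed
Q = np.diag([0.05, 1e-6, 1e-5])               # process noise
R = np.array([[4.0]])                         # observation noise variance (m^2)

rng = np.random.default_rng(10)
x_true = np.array([0.0, -0.02, 0.3])          # true position, erosion rate, phase
x_est, P = np.array([0.0, 0.0, 0.0]), np.eye(3)

for k in range(3 * 365):
    t = k * dt
    x_true = f(x_true, t) + rng.multivariate_normal(np.zeros(3), Q)
    # Predict
    F = F_jac(x_est, t)
    x_est = f(x_est, t)
    P = F @ P @ F.T + Q
    if k % 30 == 0:                           # sparse (monthly) shoreline surveys
        z = H @ x_true + rng.normal(scale=np.sqrt(R[0, 0]))
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x_est = x_est + (K @ (z - H @ x_est)).ravel()
        P = (np.eye(3) - K @ H) @ P

print("estimated trend rate (m/day):", round(x_est[1], 4), " true:", -0.02)
```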

  12. A unified physical model of Seebeck coefficient in amorphous oxide semiconductor thin-film transistors

    NASA Astrophysics Data System (ADS)

    Lu, Nianduan; Li, Ling; Sun, Pengxiao; Banerjee, Writam; Liu, Ming

    2014-09-01

    A unified physical model for Seebeck coefficient was presented based on the multiple-trapping and release theory for amorphous oxide semiconductor thin-film transistors. According to the proposed model, the Seebeck coefficient is attributed to the Fermi-Dirac statistics combined with the energy dependent trap density of states and the gate-voltage dependence of the quasi-Fermi level. The simulation results show that the gate voltage, energy disorder, and temperature dependent Seebeck coefficient can be well described. The calculation also shows a good agreement with the experimental data in amorphous In-Ga-Zn-O thin-film transistor.

  13. Reverse engineering systems models of regulation: discovery, prediction and mechanisms.

    PubMed

    Ashworth, Justin; Wurtmann, Elisabeth J; Baliga, Nitin S

    2012-08-01

    Biological systems can now be understood in comprehensive and quantitative detail using systems biology approaches. Putative genome-scale models can be built rapidly based upon biological inventories and strategic system-wide molecular measurements. Current models combine statistical associations, causative abstractions, and known molecular mechanisms to explain and predict quantitative and complex phenotypes. This top-down 'reverse engineering' approach generates useful organism-scale models despite noise and incompleteness in data and knowledge. Here we review and discuss the reverse engineering of biological systems using top-down data-driven approaches, in order to improve discovery, hypothesis generation, and the inference of biological properties. Copyright © 2011 Elsevier Ltd. All rights reserved.

  14. A General Model for Testing Mediation and Moderation Effects

    PubMed Central

    MacKinnon, David P.

    2010-01-01

    This paper describes methods for testing mediation and moderation effects in a dataset, both together and separately. Investigations of this kind are especially valuable in prevention research to obtain information on the process by which a program achieves its effects and whether the program is effective for subgroups of individuals. A general model that simultaneously estimates mediation and moderation effects is presented, and the utility of combining the effects into a single model is described. Possible effects of interest in the model are explained, as are statistical methods to assess these effects. The methods are further illustrated in a hypothetical prevention program example. PMID:19003535

  15. A new market risk model for cogeneration project financing---combined heat and power development without a power purchase agreement

    NASA Astrophysics Data System (ADS)

    Lockwood, Timothy A.

    Federal legislative changes in 2006 no longer entitle cogeneration project financings by law to receive the benefit of a power purchase agreement underwritten by an investment-grade investor-owned utility. Consequently, this research explored the need for a new market-risk model for future cogeneration and combined heat and power (CHP) project financing. CHP project investment represents a potentially enormous energy efficiency benefit through its application by reducing fossil fuel use up to 55% when compared to traditional energy generation, and concurrently eliminates constituent air emissions up to 50%, including global warming gases. As a supplemental approach to a comprehensive technical analysis, a quantitative multivariate modeling was also used to test the statistical validity and reliability of host facility energy demand and CHP supply ratios in predicting the economic performance of CHP project financing. The resulting analytical models, although not statistically reliable at this time, suggest a radically simplified CHP design method for future profitable CHP investments using four easily attainable energy ratios. This design method shows that financially successful CHP adoption occurs when the average system heat-to-power-ratio supply is less than or equal to the average host-convertible-energy-ratio, and when the average nominally-rated capacity is less than average host facility-load-factor demands. New CHP investments can play a role in solving the world-wide problem of accommodating growing energy demand while preserving our precious and irreplaceable air quality for future generations.

  16. Robust inference for group sequential trials.

    PubMed

    Ganju, Jitendra; Lin, Yunzhi; Zhou, Kefei

    2017-03-01

    For ethical reasons, group sequential trials were introduced to allow trials to stop early in the event of extreme results. Endpoints in such trials are usually mortality or irreversible morbidity. For a given endpoint, the norm is to use a single test statistic and to use that same statistic for each analysis. This approach is risky because the test statistic has to be specified before the study is unblinded, and there is a loss in power if the assumptions that ensure optimality for each analysis are not met. To minimize the risk of moderate to substantial loss in power due to a suboptimal choice of statistic, a robust method was developed for nonsequential trials. The concept is analogous to diversifying financial investments to minimize risk. The method is based on combining P values from multiple test statistics for formal inference while controlling the type I error rate at its designated value. This article evaluates the performance of two P value combination methods for group sequential trials. The emphasis is on time-to-event trials, although results from less complex trials are also included. The gain or loss in power with the combination method relative to a single statistic is asymmetric in its favor. Depending on the power of each individual test, the combination method can give more power than any single test or give power that is closer to the test with the most power. The versatility of the method lies in its ability to combine P values from different test statistics for analyses at different times. The robustness of the results suggests that inference from group sequential trials can be strengthened by the use of combined tests. Copyright © 2017 John Wiley & Sons, Ltd.
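
    As an illustration of the general idea of combining P values from several test statistics into one decision, the sketch below uses a Bonferroni-adjusted minimum-P rule, which controls the type I error rate under arbitrary dependence; the article's own combination method and group sequential boundaries may differ.

    ```python
    # Minimal illustration: combine P values from several test statistics
    # into a single P value via a Bonferroni-adjusted minimum-P rule.
    # This is a generic technique, not necessarily the article's method.
    def combine_pvalues_minp(pvalues):
        """Return a combined P value: k * min(p), capped at 1."""
        k = len(pvalues)
        return min(1.0, k * min(pvalues))

    # Example: P values from, say, a log-rank test and two weighted variants
    print(combine_pvalues_minp([0.012, 0.048, 0.160]))  # 0.036
    ```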

  17. Combining archeomagnetic and volcanic data with historical geomagnetic observations to reconstruct global field evolution over the past 1000 years, including new paleomagnetic data from historical lava flows on Fogo, Cape Verde

    NASA Astrophysics Data System (ADS)

    Korte, M. C.; Senftleben, R.; Brown, M. C.; Finlay, C. C.; Feinberg, J. M.; Biggin, A. J.

    2016-12-01

    Geomagnetic field evolution of the recent past can be studied using different data sources: Jackson et al. (2000) combined historical observations with modern field measurements to derive a global geomagnetic field model (gufm1) spanning 1590 to 1990. Several published young archeo- and volcanic paleomagnetic data fall into this time interval. Here, we directly combine data from these different sources to derive a global field model covering the past 1000 years. We particularly focus on reliably recovering dipole moment evolution prior to the times of the first direct absolute intensity observations at around 1840. We first compared the different data types and their agreement with the gufm1 model to assess their compatibility and reliability. We used these results, in combination with statistical modelling tests, to obtain suitable uncertainty estimates as weighting factors for the data in the final model. In addition, we studied samples from seven lava flows from the island of Fogo, Cape Verde, erupted between 1664 and 1857. Oriented samples were available for two of them, providing declination and inclination results. Due to the complicated mineralogy of three of the flows, microwave paleointensity experiments using a modified version of the IZZI protocol were carried out on flows erupted in 1664, 1769, 1816 and 1847. The new directional results are compared with nearby historical data and the influence on, and agreement with, the new model are discussed.
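
    The weighting scheme in the model is derived from data-type comparisons against gufm1 and from statistical modelling tests; as a schematic analogue only, the sketch below shows standard inverse-variance weighting of heterogeneous data, with made-up values and uncertainties.

    ```python
    # Toy illustration of uncertainty-based weighting when combining data of
    # different quality (e.g., direct observations vs. archeomagnetic and
    # volcanic records). Weighting each datum by 1/sigma^2 is a standard
    # choice; the values and uncertainties below are invented.
    import numpy as np

    values = np.array([10.2, 9.6, 11.5])   # hypothetical field estimates
    sigmas = np.array([0.3, 0.5, 1.5])     # assigned uncertainty per data type

    weights = 1.0 / sigmas**2
    weighted_mean = np.sum(weights * values) / np.sum(weights)
    weighted_sigma = np.sqrt(1.0 / np.sum(weights))
    print(weighted_mean, weighted_sigma)
    ```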

  18. Constraints on Cosmological Parameters from the Angular Power Spectrum of a Combined 2500 deg$^2$ SPT-SZ and Planck Gravitational Lensing Map

    DOE PAGES

    Simard, G.; et al.

    2018-06-20

    We report constraints on cosmological parameters from the angular power spectrum of a cosmic microwave background (CMB) gravitational lensing potential map created using temperature data from 2500 deg$^2$ of South Pole Telescope (SPT) data supplemented with data from Planck in the same sky region, with the statistical power in the combined map primarily from the SPT data. We fit the corresponding lensing angular power spectrum to a model including cold dark matter and a cosmological constant ($\Lambda$CDM), and to models with single-parameter extensions to $\Lambda$CDM. We find constraints that are comparable to and consistent with constraints found using the full-sky Planck CMB lensing data. Specifically, we find $\sigma_8 \Omega_{\rm m}^{0.25} = 0.598 \pm 0.024$ from the lensing data alone with relatively weak priors placed on the other $\Lambda$CDM parameters. In combination with primary CMB data from Planck, we explore single-parameter extensions to the $\Lambda$CDM model. We find $\Omega_k = -0.012^{+0.021}_{-0.023}$ or $M_{\
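
    As a schematic of fitting a measured band-power spectrum to a model prediction, the sketch below minimizes a chi-square over a single amplitude parameter; the data values, covariance, and model are placeholders, not SPT or Planck products.

    ```python
    # Schematic chi-square fit of a measured lensing band-power spectrum to a
    # model prediction scaled by one amplitude parameter. All numbers are toys.
    import numpy as np
    from scipy.optimize import minimize_scalar

    ell_bands = np.array([100.0, 300.0, 500.0])
    cl_model = 1e-7 / ell_bands**2          # toy fiducial lensing spectrum
    cl_data = 1.05 * cl_model               # pretend measurement
    cov = np.diag((0.1 * cl_model) ** 2)    # toy diagonal covariance
    inv_cov = np.linalg.inv(cov)

    def chi2(amplitude):
        resid = cl_data - amplitude * cl_model
        return resid @ inv_cov @ resid

    best = minimize_scalar(chi2, bounds=(0.5, 1.5), method="bounded")
    print("best-fit lensing amplitude:", best.x)
    ```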

  19. Constraints on Cosmological Parameters from the Angular Power Spectrum of a Combined 2500 deg$^2$ SPT-SZ and Planck Gravitational Lensing Map

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simard, G.; et al.

    We report constraints on cosmological parameters from the angular power spectrum of a cosmic microwave background (CMB) gravitational lensing potential map created using temperature data from 2500 deg$^2$ of South Pole Telescope (SPT) data supplemented with data from Planck in the same sky region, with the statistical power in the combined map primarily from the SPT data. We fit the corresponding lensing angular power spectrum to a model including cold dark matter and a cosmological constant ($\Lambda$CDM), and to models with single-parameter extensions to $\Lambda$CDM. We find constraints that are comparable to and consistent with constraints found using the full-sky Planck CMB lensing data. Specifically, we find $\sigma_8 \Omega_{\rm m}^{0.25} = 0.598 \pm 0.024$ from the lensing data alone with relatively weak priors placed on the other $\Lambda$CDM parameters. In combination with primary CMB data from Planck, we explore single-parameter extensions to the $\Lambda$CDM model. We find $\Omega_k = -0.012^{+0.021}_{-0.023}$ or $M_{\

  20. Measuring leader perceptions of school readiness for reforms: use of an iterative model combining classical and Rasch methods.

    PubMed

    Chatterji, Madhabi

    2002-01-01

    This study examines the validity of data generated by the School Readiness for Reforms: Leader Questionnaire (SRR-LQ) using an iterative procedure that combines classical and Rasch rating scale analysis. Following content validation and pilot testing, principal axis factor extraction and promax rotation of factors yielded a five-factor structure consistent with the content-validated subscales of the original instrument. Factors were identified based on inspection of pattern and structure coefficients. The rotated factor pattern, inter-factor correlations, convergent validity coefficients, and Cronbach's alpha reliability estimates supported the hypothesized construct properties. To further examine unidimensionality and the efficacy of the rating scale structures, item-level data from each factor-defined subscale were subjected to analysis with the Rasch rating scale model. Data-to-model fit statistics and separation reliability for items and persons met acceptable criteria. Rating scale results suggested consistency of expected and observed step difficulties in rating categories, and correspondence of step calibrations with increases in the underlying variables. The combined approach yielded more comprehensive diagnostic information on the quality of the five SRR-LQ subscales; further research is continuing.
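
    One concrete piece of the classical step described above is the reliability estimate; the sketch below computes Cronbach's alpha for a single subscale from simulated ratings (real SRR-LQ responses would replace them).

    ```python
    # Cronbach's alpha for one subscale, as used in the classical (pre-Rasch)
    # step described above. The rating data here are simulated placeholders.
    import numpy as np

    def cronbach_alpha(items: np.ndarray) -> float:
        """items: respondents x items matrix of ratings."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1.0 - item_vars / total_var)

    rng = np.random.default_rng(1)
    latent = rng.normal(size=(200, 1))
    ratings = np.clip(np.round(3 + latent + rng.normal(scale=0.7, size=(200, 5))), 1, 5)
    print(round(cronbach_alpha(ratings), 2))
    ```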
