Application of a New Resampling Method to SEM: A Comparison of S-SMART with the Bootstrap
ERIC Educational Resources Information Center
Bai, Haiyan; Sivo, Stephen A.; Pan, Wei; Fan, Xitao
2016-01-01
Among the commonly used resampling methods of dealing with small-sample problems, the bootstrap enjoys the widest applications because it often outperforms its counterparts. However, the bootstrap still has limitations when its operations are contemplated. Therefore, the purpose of this study is to examine an alternative, new resampling method…
ERIC Educational Resources Information Center
Nevitt, Jonathan; Hancock, Gregory R.
2001-01-01
Evaluated the bootstrap method under varying conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Results for the bootstrap suggest the resampling-based method may be conservative in its control over model rejections, thus having an impact on the statistical power associated…
Cui, Ming; Xu, Lili; Wang, Huimin; Ju, Shaoqing; Xu, Shuizhu; Jing, Rongrong
2017-12-01
Measurement uncertainty (MU) is a metrological concept, which can be used for objectively estimating the quality of test results in medical laboratories. The Nordtest guide recommends an approach that uses both internal quality control (IQC) and external quality assessment (EQA) data to evaluate the MU. Bootstrap resampling is employed to simulate the unknown distribution based on the mathematical statistics method using an existing small sample of data, where the aim is to transform the small sample into a large sample. However, there have been no reports of the utilization of this method in medical laboratories. Thus, this study applied the Nordtest guide approach based on bootstrap resampling for estimating the MU. We estimated the MU for the white blood cell (WBC) count, red blood cell (RBC) count, hemoglobin (Hb), and platelets (Plt). First, we used 6months of IQC data and 12months of EQA data to calculate the MU according to the Nordtest method. Second, we combined the Nordtest method and bootstrap resampling with the quality control data and calculated the MU using MATLAB software. We then compared the MU results obtained using the two approaches. The expanded uncertainty results determined for WBC, RBC, Hb, and Plt using the bootstrap resampling method were 4.39%, 2.43%, 3.04%, and 5.92%, respectively, and 4.38%, 2.42%, 3.02%, and 6.00% with the existing quality control data (U [k=2]). For WBC, RBC, Hb, and Plt, the differences between the results obtained using the two methods were lower than 1.33%. The expanded uncertainty values were all less than the target uncertainties. The bootstrap resampling method allows the statistical analysis of the MU. Combining the Nordtest method and bootstrap resampling is considered a suitable alternative method for estimating the MU. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
The Beginner's Guide to the Bootstrap Method of Resampling.
ERIC Educational Resources Information Center
Lane, Ginny G.
The bootstrap method of resampling can be useful in estimating the replicability of study results. The bootstrap procedure creates a mock population from a given sample of data from which multiple samples are then drawn. The method extends the usefulness of the jackknife procedure as it allows for computation of a given statistic across a maximal…
Comparison of bootstrap approaches for estimation of uncertainties of DTI parameters.
Chung, SungWon; Lu, Ying; Henry, Roland G
2006-11-01
Bootstrap is an empirical non-parametric statistical technique based on data resampling that has been used to quantify uncertainties of diffusion tensor MRI (DTI) parameters, useful in tractography and in assessing DTI methods. The current bootstrap method (repetition bootstrap) used for DTI analysis performs resampling within the data sharing common diffusion gradients, requiring multiple acquisitions for each diffusion gradient. Recently, wild bootstrap was proposed that can be applied without multiple acquisitions. In this paper, two new approaches are introduced called residual bootstrap and repetition bootknife. We show that repetition bootknife corrects for the large bias present in the repetition bootstrap method and, therefore, better estimates the standard errors. Like wild bootstrap, residual bootstrap is applicable to single acquisition scheme, and both are based on regression residuals (called model-based resampling). Residual bootstrap is based on the assumption that non-constant variance of measured diffusion-attenuated signals can be modeled, which is actually the assumption behind the widely used weighted least squares solution of diffusion tensor. The performances of these bootstrap approaches were compared in terms of bias, variance, and overall error of bootstrap-estimated standard error by Monte Carlo simulation. We demonstrate that residual bootstrap has smaller biases and overall errors, which enables estimation of uncertainties with higher accuracy. Understanding the properties of these bootstrap procedures will help us to choose the optimal approach for estimating uncertainties that can benefit hypothesis testing based on DTI parameters, probabilistic fiber tracking, and optimizing DTI methods.
Warton, David I; Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.
Thibaut, Loïc; Wang, Yi Alice
2017-01-01
Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071
Xiao, Yongling; Abrahamowicz, Michal
2010-03-30
We propose two bootstrap-based methods to correct the standard errors (SEs) from Cox's model for within-cluster correlation of right-censored event times. The cluster-bootstrap method resamples, with replacement, only the clusters, whereas the two-step bootstrap method resamples (i) the clusters, and (ii) individuals within each selected cluster, with replacement. In simulations, we evaluate both methods and compare them with the existing robust variance estimator and the shared gamma frailty model, which are available in statistical software packages. We simulate clustered event time data, with latent cluster-level random effects, which are ignored in the conventional Cox's model. For cluster-level covariates, both proposed bootstrap methods yield accurate SEs, and type I error rates, and acceptable coverage rates, regardless of the true random effects distribution, and avoid serious variance under-estimation by conventional Cox-based standard errors. However, the two-step bootstrap method over-estimates the variance for individual-level covariates. We also apply the proposed bootstrap methods to obtain confidence bands around flexible estimates of time-dependent effects in a real-life analysis of cluster event times.
Assessment of resampling methods for causality testing: A note on the US inflation behavior
Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees
2017-01-01
Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms. PMID:28708870
Assessment of resampling methods for causality testing: A note on the US inflation behavior.
Papana, Angeliki; Kyrtsou, Catherine; Kugiumtzis, Dimitris; Diks, Cees
2017-01-01
Different resampling methods for the null hypothesis of no Granger causality are assessed in the setting of multivariate time series, taking into account that the driving-response coupling is conditioned on the other observed variables. As appropriate test statistic for this setting, the partial transfer entropy (PTE), an information and model-free measure, is used. Two resampling techniques, time-shifted surrogates and the stationary bootstrap, are combined with three independence settings (giving a total of six resampling methods), all approximating the null hypothesis of no Granger causality. In these three settings, the level of dependence is changed, while the conditioning variables remain intact. The empirical null distribution of the PTE, as the surrogate and bootstrapped time series become more independent, is examined along with the size and power of the respective tests. Additionally, we consider a seventh resampling method by contemporaneously resampling the driving and the response time series using the stationary bootstrap. Although this case does not comply with the no causality hypothesis, one can obtain an accurate sampling distribution for the mean of the test statistic since its value is zero under H0. Results indicate that as the resampling setting gets more independent, the test becomes more conservative. Finally, we conclude with a real application. More specifically, we investigate the causal links among the growth rates for the US CPI, money supply and crude oil. Based on the PTE and the seven resampling methods, we consistently find that changes in crude oil cause inflation conditioning on money supply in the post-1986 period. However this relationship cannot be explained on the basis of traditional cost-push mechanisms.
Confidence limit calculation for antidotal potency ratio derived from lethal dose 50
Manage, Ananda; Petrikovics, Ilona
2013-01-01
AIM: To describe confidence interval calculation for antidotal potency ratios using bootstrap method. METHODS: We can easily adapt the nonparametric bootstrap method which was invented by Efron to construct confidence intervals in such situations like this. The bootstrap method is a resampling method in which the bootstrap samples are obtained by resampling from the original sample. RESULTS: The described confidence interval calculation using bootstrap method does not require the sampling distribution antidotal potency ratio. This can serve as a substantial help for toxicologists, who are directed to employ the Dixon up-and-down method with the application of lower number of animals to determine lethal dose 50 values for characterizing the investigated toxic molecules and eventually for characterizing the antidotal protections by the test antidotal systems. CONCLUSION: The described method can serve as a useful tool in various other applications. Simplicity of the method makes it easier to do the calculation using most of the programming software packages. PMID:25237618
Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A
2017-06-30
Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small sample size limitation. We used a pooled method in nonparametric bootstrap test that may overcome the problem related with small samples in hypothesis testing. The present study compared nonparametric bootstrap test with pooled resampling method corresponding to parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Fan, Xitao
This paper empirically and systematically assessed the performance of bootstrap resampling procedure as it was applied to a regression model. Parameter estimates from Monte Carlo experiments (repeated sampling from population) and bootstrap experiments (repeated resampling from one original bootstrap sample) were generated and compared. Sample…
Comparison of parametric and bootstrap method in bioequivalence test.
Ahn, Byung-Jin; Yim, Dong-Seok
2009-10-01
The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption.
Comparison of Parametric and Bootstrap Method in Bioequivalence Test
Ahn, Byung-Jin
2009-01-01
The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption. PMID:19915699
ERIC Educational Resources Information Center
Hand, Michael L.
1990-01-01
Use of the bootstrap resampling technique (BRT) is assessed in its application to resampling analysis associated with measurement of payment allocation errors by federally funded Family Assistance Programs. The BRT is applied to a food stamp quality control database in Oregon. This analysis highlights the outlier-sensitivity of the…
Classifier performance prediction for computer-aided diagnosis using a limited dataset.
Sahiner, Berkman; Chan, Heang-Ping; Hadjiiski, Lubomir
2008-04-01
In a practical classifier design problem, the true population is generally unknown and the available sample is finite-sized. A common approach is to use a resampling technique to estimate the performance of the classifier that will be trained with the available sample. We conducted a Monte Carlo simulation study to compare the ability of the different resampling techniques in training the classifier and predicting its performance under the constraint of a finite-sized sample. The true population for the two classes was assumed to be multivariate normal distributions with known covariance matrices. Finite sets of sample vectors were drawn from the population. The true performance of the classifier is defined as the area under the receiver operating characteristic curve (AUC) when the classifier designed with the specific sample is applied to the true population. We investigated methods based on the Fukunaga-Hayes and the leave-one-out techniques, as well as three different types of bootstrap methods, namely, the ordinary, 0.632, and 0.632+ bootstrap. The Fisher's linear discriminant analysis was used as the classifier. The dimensionality of the feature space was varied from 3 to 15. The sample size n2 from the positive class was varied between 25 and 60, while the number of cases from the negative class was either equal to n2 or 3n2. Each experiment was performed with an independent dataset randomly drawn from the true population. Using a total of 1000 experiments for each simulation condition, we compared the bias, the variance, and the root-mean-squared error (RMSE) of the AUC estimated using the different resampling techniques relative to the true AUC (obtained from training on a finite dataset and testing on the population). Our results indicated that, under the study conditions, there can be a large difference in the RMSE obtained using different resampling methods, especially when the feature space dimensionality is relatively large and the sample size is small. Under this type of conditions, the 0.632 and 0.632+ bootstrap methods have the lowest RMSE, indicating that the difference between the estimated and the true performances obtained using the 0.632 and 0.632+ bootstrap will be statistically smaller than those obtained using the other three resampling methods. Of the three bootstrap methods, the 0.632+ bootstrap provides the lowest bias. Although this investigation is performed under some specific conditions, it reveals important trends for the problem of classifier performance prediction under the constraint of a limited dataset.
Mattfeldt, Torsten
2011-04-01
Computer-intensive methods may be defined as data analytical procedures involving a huge number of highly repetitive computations. We mention resampling methods with replacement (bootstrap methods), resampling methods without replacement (randomization tests) and simulation methods. The resampling methods are based on simple and robust principles and are largely free from distributional assumptions. Bootstrap methods may be used to compute confidence intervals for a scalar model parameter and for summary statistics from replicated planar point patterns, and for significance tests. For some simple models of planar point processes, point patterns can be simulated by elementary Monte Carlo methods. The simulation of models with more complex interaction properties usually requires more advanced computing methods. In this context, we mention simulation of Gibbs processes with Markov chain Monte Carlo methods using the Metropolis-Hastings algorithm. An alternative to simulations on the basis of a parametric model consists of stochastic reconstruction methods. The basic ideas behind the methods are briefly reviewed and illustrated by simple worked examples in order to encourage novices in the field to use computer-intensive methods. © 2010 The Authors Journal of Microscopy © 2010 Royal Microscopical Society.
NASA Astrophysics Data System (ADS)
Olafsdottir, Kristin B.; Mudelsee, Manfred
2013-04-01
Estimation of the Pearson's correlation coefficient between two time series to evaluate the influences of one time depended variable on another is one of the most often used statistical method in climate sciences. Various methods are used to estimate confidence interval to support the correlation point estimate. Many of them make strong mathematical assumptions regarding distributional shape and serial correlation, which are rarely met. More robust statistical methods are needed to increase the accuracy of the confidence intervals. Bootstrap confidence intervals are estimated in the Fortran 90 program PearsonT (Mudelsee, 2003), where the main intention was to get an accurate confidence interval for correlation coefficient between two time series by taking the serial dependence of the process that generated the data into account. However, Monte Carlo experiments show that the coverage accuracy for smaller data sizes can be improved. Here we adapt the PearsonT program into a new version called PearsonT3, by calibrating the confidence interval to increase the coverage accuracy. Calibration is a bootstrap resampling technique, which basically performs a second bootstrap loop or resamples from the bootstrap resamples. It offers, like the non-calibrated bootstrap confidence intervals, robustness against the data distribution. Pairwise moving block bootstrap is used to preserve the serial correlation of both time series. The calibration is applied to standard error based bootstrap Student's t confidence intervals. The performances of the calibrated confidence intervals are examined with Monte Carlo simulations, and compared with the performances of confidence intervals without calibration, that is, PearsonT. The coverage accuracy is evidently better for the calibrated confidence intervals where the coverage error is acceptably small (i.e., within a few percentage points) already for data sizes as small as 20. One form of climate time series is output from numerical models which simulate the climate system. The method is applied to model data from the high resolution ocean model, INALT01 where the relationship between the Agulhas Leakage and the North Brazil Current is evaluated. Preliminary results show significant correlation between the two variables when there is 10 year lag between them, which is more or less the time that takes the Agulhas Leakage water to reach the North Brazil Current. Mudelsee, M., 2003. Estimating Pearson's correlation coefficient with bootstrap confidence interval from serially dependent time series. Mathematical Geology 35, 651-665.
The Bootstrap, the Jackknife, and the Randomization Test: A Sampling Taxonomy.
Rodgers, J L
1999-10-01
A simple sampling taxonomy is defined that shows the differences between and relationships among the bootstrap, the jackknife, and the randomization test. Each method has as its goal the creation of an empirical sampling distribution that can be used to test statistical hypotheses, estimate standard errors, and/or create confidence intervals. Distinctions between the methods can be made based on the sampling approach (with replacement versus without replacement) and the sample size (replacing the whole original sample versus replacing a subset of the original sample). The taxonomy is useful for teaching the goals and purposes of resampling schemes. An extension of the taxonomy implies other possible resampling approaches that have not previously been considered. Univariate and multivariate examples are presented.
What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum
Hesterberg, Tim C.
2015-01-01
Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. For example, the common combination of nonparametric bootstrapping and bootstrap percentile confidence intervals is less accurate than using t-intervals for small samples, though more accurate for larger samples. My goals in this article are to provide a deeper understanding of bootstrap methods—how they work, when they work or not, and which methods work better—and to highlight pedagogical issues. Supplementary materials for this article are available online. [Received December 2014. Revised August 2015] PMID:27019512
Assessing uncertainties in superficial water provision by different bootstrap-based techniques
NASA Astrophysics Data System (ADS)
Rodrigues, Dulce B. B.; Gupta, Hoshin V.; Mendiondo, Eduardo Mario
2014-05-01
An assessment of water security can incorporate several water-related concepts, characterizing the interactions between societal needs, ecosystem functioning, and hydro-climatic conditions. The superficial freshwater provision level depends on the methods chosen for 'Environmental Flow Requirement' estimations, which integrate the sources of uncertainty in the understanding of how water-related threats to aquatic ecosystem security arise. Here, we develop an uncertainty assessment of superficial freshwater provision based on different bootstrap techniques (non-parametric resampling with replacement). To illustrate this approach, we use an agricultural basin (291 km2) within the Cantareira water supply system in Brazil monitored by one daily streamflow gage (24-year period). The original streamflow time series has been randomly resampled for different times or sample sizes (N = 500; ...; 1000), then applied to the conventional bootstrap approach and variations of this method, such as: 'nearest neighbor bootstrap'; and 'moving blocks bootstrap'. We have analyzed the impact of the sampling uncertainty on five Environmental Flow Requirement methods, based on: flow duration curves or probability of exceedance (Q90%, Q75% and Q50%); 7-day 10-year low-flow statistic (Q7,10); and presumptive standard (80% of the natural monthly mean ?ow). The bootstrap technique has been also used to compare those 'Environmental Flow Requirement' (EFR) methods among themselves, considering the difference between the bootstrap estimates and the "true" EFR characteristic, which has been computed averaging the EFR values of the five methods and using the entire streamflow record at monitoring station. This study evaluates the bootstrapping strategies, the representativeness of streamflow series for EFR estimates and their confidence intervals, in addition to overview of the performance differences between the EFR methods. The uncertainties arisen during EFR methods assessment will be propagated through water security indicators referring to water scarcity and vulnerability, seeking to provide meaningful support to end-users and water managers facing the incorporation of uncertainties in the decision making process.
Resampling methods in Microsoft Excel® for estimating reference intervals
Theodorsson, Elvar
2015-01-01
Computer- intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes natural functions, which lend themselves well to this purpose including recommended interpolation procedures for estimating 2.5 and 97.5 percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general and in using Microsoft Excel® 2010 for the purpose of estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distributions of observations in the reference samples is Gaussian or can transformed to that distribution even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and in case the number of reference individuals and corresponding samples are in the order of 40. At least 500-1000 random samples with replacement should be taken from the results of measurement of the reference samples. PMID:26527366
Resampling methods in Microsoft Excel® for estimating reference intervals.
Theodorsson, Elvar
2015-01-01
Computer-intensive resampling/bootstrap methods are feasible when calculating reference intervals from non-Gaussian or small reference samples. Microsoft Excel® in version 2010 or later includes natural functions, which lend themselves well to this purpose including recommended interpolation procedures for estimating 2.5 and 97.5 percentiles. The purpose of this paper is to introduce the reader to resampling estimation techniques in general and in using Microsoft Excel® 2010 for the purpose of estimating reference intervals in particular. Parametric methods are preferable to resampling methods when the distributions of observations in the reference samples is Gaussian or can transformed to that distribution even when the number of reference samples is less than 120. Resampling methods are appropriate when the distribution of data from the reference samples is non-Gaussian and in case the number of reference individuals and corresponding samples are in the order of 40. At least 500-1000 random samples with replacement should be taken from the results of measurement of the reference samples.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.
Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai
2014-12-18
A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.
How bootstrap can help in forecasting time series with more than one seasonal pattern
NASA Astrophysics Data System (ADS)
Cordeiro, Clara; Neves, M. Manuela
2012-09-01
The search for the future is an appealing challenge in time series analysis. The diversity of forecasting methodologies is inevitable and is still in expansion. Exponential smoothing methods are the launch platform for modelling and forecasting in time series analysis. Recently this methodology has been combined with bootstrapping revealing a good performance. The algorithm (Boot. EXPOS) using exponential smoothing and bootstrap methodologies, has showed promising results for forecasting time series with one seasonal pattern. In case of more than one seasonal pattern, the double seasonal Holt-Winters methods and the exponential smoothing methods were developed. A new challenge was now to combine these seasonal methods with bootstrap and carry over a similar resampling scheme used in Boot. EXPOS procedure. The performance of such partnership will be illustrated for some well-know data sets existing in software.
Assessing Uncertainties in Surface Water Security: A Probabilistic Multi-model Resampling approach
NASA Astrophysics Data System (ADS)
Rodrigues, D. B. B.
2015-12-01
Various uncertainties are involved in the representation of processes that characterize interactions between societal needs, ecosystem functioning, and hydrological conditions. Here, we develop an empirical uncertainty assessment of water security indicators that characterize scarcity and vulnerability, based on a multi-model and resampling framework. We consider several uncertainty sources including those related to: i) observed streamflow data; ii) hydrological model structure; iii) residual analysis; iv) the definition of Environmental Flow Requirement method; v) the definition of critical conditions for water provision; and vi) the critical demand imposed by human activities. We estimate the overall uncertainty coming from the hydrological model by means of a residual bootstrap resampling approach, and by uncertainty propagation through different methodological arrangements applied to a 291 km² agricultural basin within the Cantareira water supply system in Brazil. Together, the two-component hydrograph residual analysis and the block bootstrap resampling approach result in a more accurate and precise estimate of the uncertainty (95% confidence intervals) in the simulated time series. We then compare the uncertainty estimates associated with water security indicators using a multi-model framework and provided by each model uncertainty estimation approach. The method is general and can be easily extended forming the basis for meaningful support to end-users facing water resource challenges by enabling them to incorporate a viable uncertainty analysis into a robust decision making process.
ERIC Educational Resources Information Center
Peterson, Ivars
1991-01-01
A method that enables people to obtain the benefits of statistics and probability theory without the shortcomings of conventional methods because it is free of mathematical formulas and is easy to understand and use is described. A resampling technique called the "bootstrap" is discussed in terms of application and development. (KR)
Phu, Jack; Bui, Bang V; Kalloniatis, Michael; Khuu, Sieu K
2018-03-01
The number of subjects needed to establish the normative limits for visual field (VF) testing is not known. Using bootstrap resampling, we determined whether the ground truth mean, distribution limits, and standard deviation (SD) could be approximated using different set size ( x ) levels, in order to provide guidance for the number of healthy subjects required to obtain robust VF normative data. We analyzed the 500 Humphrey Field Analyzer (HFA) SITA-Standard results of 116 healthy subjects and 100 HFA full threshold results of 100 psychophysically experienced healthy subjects. These VFs were resampled (bootstrapped) to determine mean sensitivity, distribution limits (5th and 95th percentiles), and SD for different ' x ' and numbers of resamples. We also used the VF results of 122 glaucoma patients to determine the performance of ground truth and bootstrapped results in identifying and quantifying VF defects. An x of 150 (for SITA-Standard) and 60 (for full threshold) produced bootstrapped descriptive statistics that were no longer different to the original distribution limits and SD. Removing outliers produced similar results. Differences between original and bootstrapped limits in detecting glaucomatous defects were minimized at x = 250. Ground truth statistics of VF sensitivities could be approximated using set sizes that are significantly smaller than the original cohort. Outlier removal facilitates the use of Gaussian statistics and does not significantly affect the distribution limits. We provide guidance for choosing the cohort size for different levels of error when performing normative comparisons with glaucoma patients.
Epistemic uncertainty in the location and magnitude of earthquakes in Italy from Macroseismic data
Bakun, W.H.; Gomez, Capera A.; Stucchi, M.
2011-01-01
Three independent techniques (Bakun and Wentworth, 1997; Boxer from Gasperini et al., 1999; and Macroseismic Estimation of Earthquake Parameters [MEEP; see Data and Resources section, deliverable D3] from R.M.W. Musson and M.J. Jimenez) have been proposed for estimating an earthquake location and magnitude from intensity data alone. The locations and magnitudes obtained for a given set of intensity data are almost always different, and no one technique is consistently best at matching instrumental locations and magnitudes of recent well-recorded earthquakes in Italy. Rather than attempting to select one of the three solutions as best, we use all three techniques to estimate the location and the magnitude and the epistemic uncertainties among them. The estimates are calculated using bootstrap resampled data sets with Monte Carlo sampling of a decision tree. The decision-tree branch weights are based on goodness-of-fit measures of location and magnitude for recent earthquakes. The location estimates are based on the spatial distribution of locations calculated from the bootstrap resampled data. The preferred source location is the locus of the maximum bootstrap location spatial density. The location uncertainty is obtained from contours of the bootstrap spatial density: 68% of the bootstrap locations are within the 68% confidence region, and so on. For large earthquakes, our preferred location is not associated with the epicenter but with a location on the extended rupture surface. For small earthquakes, the epicenters are generally consistent with the location uncertainties inferred from the intensity data if an epicenter inaccuracy of 2-3 km is allowed. The preferred magnitude is the median of the distribution of bootstrap magnitudes. As with location uncertainties, the uncertainties in magnitude are obtained from the distribution of bootstrap magnitudes: the bounds of the 68% uncertainty range enclose 68% of the bootstrap magnitudes, and so on. The instrumental magnitudes for large and small earthquakes are generally consistent with the confidence intervals inferred from the distribution of bootstrap resampled magnitudes.
Bootstrap position analysis for forecasting low flow frequency
Tasker, Gary D.; Dunne, P.
1997-01-01
A method of random resampling of residuals from stochastic models is used to generate a large number of 12-month-long traces of natural monthly runoff to be used in a position analysis model for a water-supply storage and delivery system. Position analysis uses the traces to forecast the likelihood of specified outcomes such as reservoir levels falling below a specified level or streamflows falling below statutory passing flows conditioned on the current reservoir levels and streamflows. The advantages of this resampling scheme, called bootstrap position analysis, are that it does not rely on the unverifiable assumption of normality, fewer parameters need to be estimated directly from the data, and accounting for parameter uncertainty is easily done. For a given set of operating rules and water-use requirements for a system, water managers can use such a model as a decision-making tool to evaluate different operating rules. ?? ASCE,.
NASA Astrophysics Data System (ADS)
Angrisano, Antonio; Maratea, Antonio; Gaglione, Salvatore
2018-01-01
In the absence of obstacles, a GPS device is generally able to provide continuous and accurate estimates of position, while in urban scenarios buildings can generate multipath and echo-only phenomena that severely affect the continuity and the accuracy of the provided estimates. Receiver autonomous integrity monitoring (RAIM) techniques are able to reduce the negative consequences of large blunders in urban scenarios, but require both a good redundancy and a low contamination to be effective. In this paper a resampling strategy based on bootstrap is proposed as an alternative to RAIM, in order to estimate accurately position in case of low redundancy and multiple blunders: starting with the pseudorange measurement model, at each epoch the available measurements are bootstrapped—that is random sampled with replacement—and the generated a posteriori empirical distribution is exploited to derive the final position. Compared to standard bootstrap, in this paper the sampling probabilities are not uniform, but vary according to an indicator of the measurement quality. The proposed method has been compared with two different RAIM techniques on a data set collected in critical conditions, resulting in a clear improvement on all considered figures of merit.
Small sample mediation testing: misplaced confidence in bootstrapped confidence intervals.
Koopman, Joel; Howe, Michael; Hollenbeck, John R; Sin, Hock-Peng
2015-01-01
Bootstrapping is an analytical tool commonly used in psychology to test the statistical significance of the indirect effect in mediation models. Bootstrapping proponents have particularly advocated for its use for samples of 20-80 cases. This advocacy has been heeded, especially in the Journal of Applied Psychology, as researchers are increasingly utilizing bootstrapping to test mediation with samples in this range. We discuss reasons to be concerned with this escalation, and in a simulation study focused specifically on this range of sample sizes, we demonstrate not only that bootstrapping has insufficient statistical power to provide a rigorous hypothesis test in most conditions but also that bootstrapping has a tendency to exhibit an inflated Type I error rate. We then extend our simulations to investigate an alternative empirical resampling method as well as a Bayesian approach and demonstrate that they exhibit comparable statistical power to bootstrapping in small samples without the associated inflated Type I error. Implications for researchers testing mediation hypotheses in small samples are presented. For researchers wishing to use these methods in their own research, we have provided R syntax in the online supplemental materials. (c) 2015 APA, all rights reserved.
ERIC Educational Resources Information Center
Longford, Nicholas T.
Large scale surveys usually employ a complex sampling design and as a consequence, no standard methods for estimation of the standard errors associated with the estimates of population means are available. Resampling methods, such as jackknife or bootstrap, are often used, with reference to their properties of robustness and reduction of bias. A…
Assessing uncertainties in surface water security: An empirical multimodel approach
NASA Astrophysics Data System (ADS)
Rodrigues, Dulce B. B.; Gupta, Hoshin V.; Mendiondo, Eduardo M.; Oliveira, Paulo Tarso S.
2015-11-01
Various uncertainties are involved in the representation of processes that characterize interactions among societal needs, ecosystem functioning, and hydrological conditions. Here we develop an empirical uncertainty assessment of water security indicators that characterize scarcity and vulnerability, based on a multimodel and resampling framework. We consider several uncertainty sources including those related to (i) observed streamflow data; (ii) hydrological model structure; (iii) residual analysis; (iv) the method for defining Environmental Flow Requirement; (v) the definition of critical conditions for water provision; and (vi) the critical demand imposed by human activities. We estimate the overall hydrological model uncertainty by means of a residual bootstrap resampling approach, and by uncertainty propagation through different methodological arrangements applied to a 291 km2 agricultural basin within the Cantareira water supply system in Brazil. Together, the two-component hydrograph residual analysis and the block bootstrap resampling approach result in a more accurate and precise estimate of the uncertainty (95% confidence intervals) in the simulated time series. We then compare the uncertainty estimates associated with water security indicators using a multimodel framework and the uncertainty estimates provided by each model uncertainty estimation approach. The range of values obtained for the water security indicators suggests that the models/methods are robust and performs well in a range of plausible situations. The method is general and can be easily extended, thereby forming the basis for meaningful support to end-users facing water resource challenges by enabling them to incorporate a viable uncertainty analysis into a robust decision-making process.
Explanation of Two Anomalous Results in Statistical Mediation Analysis
ERIC Educational Resources Information Center
Fritz, Matthew S.; Taylor, Aaron B.; MacKinnon, David P.
2012-01-01
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special…
Estimating uncertainty in respondent-driven sampling using a tree bootstrap method.
Baraff, Aaron J; McCormick, Tyler H; Raftery, Adrian E
2016-12-20
Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.
Exploring the Replicability of a Study's Results: Bootstrap Statistics for the Multivariate Case.
ERIC Educational Resources Information Center
Thompson, Bruce
Conventional statistical significance tests do not inform the researcher regarding the likelihood that results will replicate. One strategy for evaluating result replication is to use a "bootstrap" resampling of a study's data so that the stability of results across numerous configurations of the subjects can be explored. This paper…
NASA Astrophysics Data System (ADS)
Coupon, Jean; Leauthaud, Alexie; Kilbinger, Martin; Medezinski, Elinor
2017-07-01
SWOT (Super W Of Theta) computes two-point statistics for very large data sets, based on “divide and conquer” algorithms, mainly, but not limited to data storage in binary trees, approximation at large scale, parellelization (open MPI), and bootstrap and jackknife resampling methods “on the fly”. It currently supports projected and 3D galaxy auto and cross correlations, galaxy-galaxy lensing, and weighted histograms.
Babamoradi, Hamid; van den Berg, Frans; Rinnan, Åsmund
2016-02-18
In Multivariate Statistical Process Control, when a fault is expected or detected in the process, contribution plots are essential for operators and optimization engineers in identifying those process variables that were affected by or might be the cause of the fault. The traditional way of interpreting a contribution plot is to examine the largest contributing process variables as the most probable faulty ones. This might result in false readings purely due to the differences in natural variation, measurement uncertainties, etc. It is more reasonable to compare variable contributions for new process runs with historical results achieved under Normal Operating Conditions, where confidence limits for contribution plots estimated from training data are used to judge new production runs. Asymptotic methods cannot provide confidence limits for contribution plots, leaving re-sampling methods as the only option. We suggest bootstrap re-sampling to build confidence limits for all contribution plots in online PCA-based MSPC. The new strategy to estimate CLs is compared to the previously reported CLs for contribution plots. An industrial batch process dataset was used to illustrate the concepts. Copyright © 2016 Elsevier B.V. All rights reserved.
2014-01-01
Background Cost-effectiveness analyses (CEAs) that use patient-specific data from a randomized controlled trial (RCT) are popular, yet such CEAs are criticized because they neglect to incorporate evidence external to the trial. A popular method for quantifying uncertainty in a RCT-based CEA is the bootstrap. The objective of the present study was to further expand the bootstrap method of RCT-based CEA for the incorporation of external evidence. Methods We utilize the Bayesian interpretation of the bootstrap and derive the distribution for the cost and effectiveness outcomes after observing the current RCT data and the external evidence. We propose simple modifications of the bootstrap for sampling from such posterior distributions. Results In a proof-of-concept case study, we use data from a clinical trial and incorporate external evidence on the effect size of treatments to illustrate the method in action. Compared to the parametric models of evidence synthesis, the proposed approach requires fewer distributional assumptions, does not require explicit modeling of the relation between external evidence and outcomes of interest, and is generally easier to implement. A drawback of this approach is potential computational inefficiency compared to the parametric Bayesian methods. Conclusions The bootstrap method of RCT-based CEA can be extended to incorporate external evidence, while preserving its appealing features such as no requirement for parametric modeling of cost and effectiveness outcomes. PMID:24888356
Sadatsafavi, Mohsen; Marra, Carlo; Aaron, Shawn; Bryan, Stirling
2014-06-03
Cost-effectiveness analyses (CEAs) that use patient-specific data from a randomized controlled trial (RCT) are popular, yet such CEAs are criticized because they neglect to incorporate evidence external to the trial. A popular method for quantifying uncertainty in a RCT-based CEA is the bootstrap. The objective of the present study was to further expand the bootstrap method of RCT-based CEA for the incorporation of external evidence. We utilize the Bayesian interpretation of the bootstrap and derive the distribution for the cost and effectiveness outcomes after observing the current RCT data and the external evidence. We propose simple modifications of the bootstrap for sampling from such posterior distributions. In a proof-of-concept case study, we use data from a clinical trial and incorporate external evidence on the effect size of treatments to illustrate the method in action. Compared to the parametric models of evidence synthesis, the proposed approach requires fewer distributional assumptions, does not require explicit modeling of the relation between external evidence and outcomes of interest, and is generally easier to implement. A drawback of this approach is potential computational inefficiency compared to the parametric Bayesian methods. The bootstrap method of RCT-based CEA can be extended to incorporate external evidence, while preserving its appealing features such as no requirement for parametric modeling of cost and effectiveness outcomes.
Abstract: Inference and Interval Estimation for Indirect Effects With Latent Variable Models.
Falk, Carl F; Biesanz, Jeremy C
2011-11-30
Models specifying indirect effects (or mediation) and structural equation modeling are both popular in the social sciences. Yet relatively little research has compared methods that test for indirect effects among latent variables and provided precise estimates of the effectiveness of different methods. This simulation study provides an extensive comparison of methods for constructing confidence intervals and for making inferences about indirect effects with latent variables. We compared the percentile (PC) bootstrap, bias-corrected (BC) bootstrap, bias-corrected accelerated (BC a ) bootstrap, likelihood-based confidence intervals (Neale & Miller, 1997), partial posterior predictive (Biesanz, Falk, and Savalei, 2010), and joint significance tests based on Wald tests or likelihood ratio tests. All models included three reflective latent variables representing the independent, dependent, and mediating variables. The design included the following fully crossed conditions: (a) sample size: 100, 200, and 500; (b) number of indicators per latent variable: 3 versus 5; (c) reliability per set of indicators: .7 versus .9; (d) and 16 different path combinations for the indirect effect (α = 0, .14, .39, or .59; and β = 0, .14, .39, or .59). Simulations were performed using a WestGrid cluster of 1680 3.06GHz Intel Xeon processors running R and OpenMx. Results based on 1,000 replications per cell and 2,000 resamples per bootstrap method indicated that the BC and BC a bootstrap methods have inflated Type I error rates. Likelihood-based confidence intervals and the PC bootstrap emerged as methods that adequately control Type I error and have good coverage rates.
Trends and Correlation Estimation in Climate Sciences: Effects of Timescale Errors
NASA Astrophysics Data System (ADS)
Mudelsee, M.; Bermejo, M. A.; Bickert, T.; Chirila, D.; Fohlmeister, J.; Köhler, P.; Lohmann, G.; Olafsdottir, K.; Scholz, D.
2012-12-01
Trend describes time-dependence in the first moment of a stochastic process, and correlation measures the linear relation between two random variables. Accurately estimating the trend and correlation, including uncertainties, from climate time series data in the uni- and bivariate domain, respectively, allows first-order insights into the geophysical process that generated the data. Timescale errors, ubiquitious in paleoclimatology, where archives are sampled for proxy measurements and dated, poses a problem to the estimation. Statistical science and the various applied research fields, including geophysics, have almost completely ignored this problem due to its theoretical almost-intractability. However, computational adaptations or replacements of traditional error formulas have become technically feasible. This contribution gives a short overview of such an adaptation package, bootstrap resampling combined with parametric timescale simulation. We study linear regression, parametric change-point models and nonparametric smoothing for trend estimation. We introduce pairwise-moving block bootstrap resampling for correlation estimation. Both methods share robustness against autocorrelation and non-Gaussian distributional shape. We shortly touch computing-intensive calibration of bootstrap confidence intervals and consider options to parallelize the related computer code. Following examples serve not only to illustrate the methods but tell own climate stories: (1) the search for climate drivers of the Agulhas Current on recent timescales, (2) the comparison of three stalagmite-based proxy series of regional, western German climate over the later part of the Holocene, and (3) trends and transitions in benthic oxygen isotope time series from the Cenozoic. Financial support by Deutsche Forschungsgemeinschaft (FOR 668, FOR 1070, MU 1595/4-1) and the European Commission (MC ITN 238512, MC ITN 289447) is acknowledged.
A bootstrap estimation scheme for chemical compositional data with nondetects
Palarea-Albaladejo, J; Martín-Fernández, J.A; Olea, Ricardo A.
2014-01-01
The bootstrap method is commonly used to estimate the distribution of estimators and their associated uncertainty when explicit analytic expressions are not available or are difficult to obtain. It has been widely applied in environmental and geochemical studies, where the data generated often represent parts of whole, typically chemical concentrations. This kind of constrained data is generically called compositional data, and they require specialised statistical methods to properly account for their particular covariance structure. On the other hand, it is not unusual in practice that those data contain labels denoting nondetects, that is, concentrations falling below detection limits. Nondetects impede the implementation of the bootstrap and represent an additional source of uncertainty that must be taken into account. In this work, a bootstrap scheme is devised that handles nondetects by adding an imputation step within the resampling process and conveniently propagates their associated uncertainly. In doing so, it considers the constrained relationships between chemical concentrations originated from their compositional nature. Bootstrap estimates using a range of imputation methods, including new stochastic proposals, are compared across scenarios of increasing difficulty. They are formulated to meet compositional principles following the log-ratio approach, and an adjustment is introduced in the multivariate case to deal with nonclosed samples. Results suggest that nondetect bootstrap based on model-based imputation is generally preferable. A robust approach based on isometric log-ratio transformations appears to be particularly suited in this context. Computer routines in the R statistical programming language are provided.
Generalized Bootstrap Method for Assessment of Uncertainty in Semivariogram Inference
Olea, R.A.; Pardo-Iguzquiza, E.
2011-01-01
The semivariogram and its related function, the covariance, play a central role in classical geostatistics for modeling the average continuity of spatially correlated attributes. Whereas all methods are formulated in terms of the true semivariogram, in practice what can be used are estimated semivariograms and models based on samples. A generalized form of the bootstrap method to properly model spatially correlated data is used to advance knowledge about the reliability of empirical semivariograms and semivariogram models based on a single sample. Among several methods available to generate spatially correlated resamples, we selected a method based on the LU decomposition and used several examples to illustrate the approach. The first one is a synthetic, isotropic, exhaustive sample following a normal distribution, the second example is also a synthetic but following a non-Gaussian random field, and a third empirical sample consists of actual raingauge measurements. Results show wider confidence intervals than those found previously by others with inadequate application of the bootstrap. Also, even for the Gaussian example, distributions for estimated semivariogram values and model parameters are positively skewed. In this sense, bootstrap percentile confidence intervals, which are not centered around the empirical semivariogram and do not require distributional assumptions for its construction, provide an achieved coverage similar to the nominal coverage. The latter cannot be achieved by symmetrical confidence intervals based on the standard error, regardless if the standard error is estimated from a parametric equation or from bootstrap. ?? 2010 International Association for Mathematical Geosciences.
Forecasting drought risks for a water supply storage system using bootstrap position analysis
Tasker, Gary; Dunne, Paul
1997-01-01
Forecasting the likelihood of drought conditions is an integral part of managing a water supply storage and delivery system. Position analysis uses a large number of possible flow sequences as inputs to a simulation of a water supply storage and delivery system. For a given set of operating rules and water use requirements, water managers can use such a model to forecast the likelihood of specified outcomes such as reservoir levels falling below a specified level or streamflows falling below statutory passing flows a few months ahead conditioned on the current reservoir levels and streamflows. The large number of possible flow sequences are generated using a stochastic streamflow model with a random resampling of innovations. The advantages of this resampling scheme, called bootstrap position analysis, are that it does not rely on the unverifiable assumption of normality and it allows incorporation of long-range weather forecasts into the analysis.
NASA Astrophysics Data System (ADS)
Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan
2018-04-01
Due to the limited of historical precipitation records, agglomerative hierarchical clustering algorithms widely used to extrapolate information from gauged to ungauged precipitation catchments in yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, identifying the optimum number of homogeneous precipitation catchments accurately based on the dendrogram resulted using agglomerative hierarchical algorithms are very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify the homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling, while uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation is consolidated using K-sample Anderson Darling non-parametric test. The analysis result shows the proposed regionalized algorithm performed more better compared to the proposed agglomerative hierarchical clustering algorithm in previous studies.
Frequency Analysis Using Bootstrap Method and SIR Algorithm for Prevention of Natural Disasters
NASA Astrophysics Data System (ADS)
Kim, T.; Kim, Y. S.
2017-12-01
The frequency analysis of hydrometeorological data is one of the most important factors in response to natural disaster damage, and design standards for a disaster prevention facilities. In case of frequency analysis of hydrometeorological data, it assumes that observation data have statistical stationarity, and a parametric method considering the parameter of probability distribution is applied. For a parametric method, it is necessary to sufficiently collect reliable data; however, snowfall observations are needed to compensate for insufficient data in Korea, because of reducing the number of days for snowfall observations and mean maximum daily snowfall depth due to climate change. In this study, we conducted the frequency analysis for snowfall using the Bootstrap method and SIR algorithm which are the resampling methods that can overcome the problems of insufficient data. For the 58 meteorological stations distributed evenly in Korea, the probability of snowfall depth was estimated by non-parametric frequency analysis using the maximum daily snowfall depth data. The results show that probabilistic daily snowfall depth by frequency analysis is decreased at most stations, and most stations representing the rate of change were found to be consistent in both parametric and non-parametric frequency analysis. This study shows that the resampling methods can do the frequency analysis of the snowfall depth that has insufficient observed samples, which can be applied to interpretation of other natural disasters such as summer typhoons with seasonal characteristics. Acknowledgment.This research was supported by a grant(MPSS-NH-2015-79) from Disaster Prediction and Mitigation Technology Development Program funded by Korean Ministry of Public Safety and Security(MPSS).
Efficient bootstrap estimates for tail statistics
NASA Astrophysics Data System (ADS)
Breivik, Øyvind; Aarnes, Ole Johan
2017-03-01
Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
Determination of Time Dependent Virus Inactivation Rates
NASA Astrophysics Data System (ADS)
Chrysikopoulos, C. V.; Vogler, E. T.
2003-12-01
A methodology is developed for estimating temporally variable virus inactivation rate coefficients from experimental virus inactivation data. The methodology consists of a technique for slope estimation of normalized virus inactivation data in conjunction with a resampling parameter estimation procedure. The slope estimation technique is based on a relatively flexible geostatistical method known as universal kriging. Drift coefficients are obtained by nonlinear fitting of bootstrap samples and the corresponding confidence intervals are obtained by bootstrap percentiles. The proposed methodology yields more accurate time dependent virus inactivation rate coefficients than those estimated by fitting virus inactivation data to a first-order inactivation model. The methodology is successfully applied to a set of poliovirus batch inactivation data. Furthermore, the importance of accurate inactivation rate coefficient determination on virus transport in water saturated porous media is demonstrated with model simulations.
Uncertainty Quantification in High Throughput Screening ...
Using uncertainty quantification, we aim to improve the quality of modeling data from high throughput screening assays for use in risk assessment. ToxCast is a large-scale screening program that analyzes thousands of chemicals using over 800 assays representing hundreds of biochemical and cellular processes, including endocrine disruption, cytotoxicity, and zebrafish development. Over 2.6 million concentration response curves are fit to models to extract parameters related to potency and efficacy. Models built on ToxCast results are being used to rank and prioritize the toxicological risk of tested chemicals and to predict the toxicity of tens of thousands of chemicals not yet tested in vivo. However, the data size also presents challenges. When fitting the data, the choice of models, model selection strategy, and hit call criteria must reflect the need for computational efficiency and robustness, requiring hard and somewhat arbitrary cutoffs. When coupled with unavoidable noise in the experimental concentration response data, these hard cutoffs cause uncertainty in model parameters and the hit call itself. The uncertainty will then propagate through all of the models built on the data. Left unquantified, this uncertainty makes it difficult to fully interpret the data for risk assessment. We used bootstrap resampling methods to quantify the uncertainty in fitting models to the concentration response data. Bootstrap resampling determines confidence intervals for
Yang, Xianjin; Chen, Xiao; Carrigan, Charles R.; ...
2014-06-03
A parametric bootstrap approach is presented for uncertainty quantification (UQ) of CO₂ saturation derived from electrical resistance tomography (ERT) data collected at the Cranfield, Mississippi (USA) carbon sequestration site. There are many sources of uncertainty in ERT-derived CO₂ saturation, but we focus on how the ERT observation errors propagate to the estimated CO₂ saturation in a nonlinear inversion process. Our UQ approach consists of three steps. We first estimated the observational errors from a large number of reciprocal ERT measurements. The second step was to invert the pre-injection baseline data and the resulting resistivity tomograph was used as the priormore » information for nonlinear inversion of time-lapse data. We assigned a 3% random noise to the baseline model. Finally, we used a parametric bootstrap method to obtain bootstrap CO₂ saturation samples by deterministically solving a nonlinear inverse problem many times with resampled data and resampled baseline models. Then the mean and standard deviation of CO₂ saturation were calculated from the bootstrap samples. We found that the maximum standard deviation of CO₂ saturation was around 6% with a corresponding maximum saturation of 30% for a data set collected 100 days after injection began. There was no apparent spatial correlation between the mean and standard deviation of CO₂ saturation but the standard deviation values increased with time as the saturation increased. The uncertainty in CO₂ saturation also depends on the ERT reciprocal error threshold used to identify and remove noisy data and inversion constraints such as temporal roughness. Five hundred realizations requiring 3.5 h on a single 12-core node were needed for the nonlinear Monte Carlo inversion to arrive at stationary variances while the Markov Chain Monte Carlo (MCMC) stochastic inverse approach may expend days for a global search. This indicates that UQ of 2D or 3D ERT inverse problems can be performed on a laptop or desktop PC.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra
Low-cost flight-based hyperspectral imaging systems have the potential to provide important information for ecosystem and environmental studies as well as aide in land management. To realize this potential, methods must be developed to provide large-area surface reflectance data allowing for temporal data sets at the mesoscale. This paper describes a bootstrap method of producing a large-area, radiometrically referenced hyperspectral data set using the Landsat surface reflectance (LaSRC) data product as a reference target. The bootstrap method uses standard hyperspectral processing techniques that are extended to remove uneven illumination conditions between flight passes, allowing for radiometrically self-consistent data after mosaicking. Throughmore » selective spectral and spatial resampling, LaSRC data are used as a radiometric reference target. Advantages of the bootstrap method include the need for minimal site access, no ancillary instrumentation, and automated data processing. Data from two hyperspectral flights over the same managed agricultural and unmanaged range land covering approximately 5.8 km 2 acquired on June 21, 2014 and June 24, 2015 are presented. As a result, data from a flight over agricultural land collected on June 6, 2016 are compared with concurrently collected ground-based reflectance spectra as a means of validation.« less
McCann, Cooper; Repasky, Kevin S.; Morin, Mikindra; ...
2017-07-25
Low-cost flight-based hyperspectral imaging systems have the potential to provide important information for ecosystem and environmental studies as well as aide in land management. To realize this potential, methods must be developed to provide large-area surface reflectance data allowing for temporal data sets at the mesoscale. This paper describes a bootstrap method of producing a large-area, radiometrically referenced hyperspectral data set using the Landsat surface reflectance (LaSRC) data product as a reference target. The bootstrap method uses standard hyperspectral processing techniques that are extended to remove uneven illumination conditions between flight passes, allowing for radiometrically self-consistent data after mosaicking. Throughmore » selective spectral and spatial resampling, LaSRC data are used as a radiometric reference target. Advantages of the bootstrap method include the need for minimal site access, no ancillary instrumentation, and automated data processing. Data from two hyperspectral flights over the same managed agricultural and unmanaged range land covering approximately 5.8 km 2 acquired on June 21, 2014 and June 24, 2015 are presented. As a result, data from a flight over agricultural land collected on June 6, 2016 are compared with concurrently collected ground-based reflectance spectra as a means of validation.« less
Kolmogorov-Smirnov test for spatially correlated data
Olea, R.A.; Pawlowsky-Glahn, V.
2009-01-01
The Kolmogorov-Smirnov test is a convenient method for investigating whether two underlying univariate probability distributions can be regarded as undistinguishable from each other or whether an underlying probability distribution differs from a hypothesized distribution. Application of the test requires that the sample be unbiased and the outcomes be independent and identically distributed, conditions that are violated in several degrees by spatially continuous attributes, such as topographical elevation. A generalized form of the bootstrap method is used here for the purpose of modeling the distribution of the statistic D of the Kolmogorov-Smirnov test. The innovation is in the resampling, which in the traditional formulation of bootstrap is done by drawing from the empirical sample with replacement presuming independence. The generalization consists of preparing resamplings with the same spatial correlation as the empirical sample. This is accomplished by reading the value of unconditional stochastic realizations at the sampling locations, realizations that are generated by simulated annealing. The new approach was tested by two empirical samples taken from an exhaustive sample closely following a lognormal distribution. One sample was a regular, unbiased sample while the other one was a clustered, preferential sample that had to be preprocessed. Our results show that the p-value for the spatially correlated case is always larger that the p-value of the statistic in the absence of spatial correlation, which is in agreement with the fact that the information content of an uncorrelated sample is larger than the one for a spatially correlated sample of the same size. ?? Springer-Verlag 2008.
Bishara, Anthony J; Hittner, James B
2012-09-01
It is well known that when data are nonnormally distributed, a test of the significance of Pearson's r may inflate Type I error rates and reduce power. Statistics textbooks and the simulation literature provide several alternatives to Pearson's correlation. However, the relative performance of these alternatives has been unclear. Two simulation studies were conducted to compare 12 methods, including Pearson, Spearman's rank-order, transformation, and resampling approaches. With most sample sizes (n ≥ 20), Type I and Type II error rates were minimized by transforming the data to a normal shape prior to assessing the Pearson correlation. Among transformation approaches, a general purpose rank-based inverse normal transformation (i.e., transformation to rankit scores) was most beneficial. However, when samples were both small (n ≤ 10) and extremely nonnormal, the permutation test often outperformed other alternatives, including various bootstrap tests.
Bias-Corrected Estimation of Noncentrality Parameters of Covariance Structure Models
ERIC Educational Resources Information Center
Raykov, Tenko
2005-01-01
A bias-corrected estimator of noncentrality parameters of covariance structure models is discussed. The approach represents an application of the bootstrap methodology for purposes of bias correction, and utilizes the relation between average of resample conventional noncentrality parameter estimates and their sample counterpart. The…
Resampling to Address the Winner's Curse in Genetic Association Analysis of Time to Event
Poirier, Julia G.; Faye, Laura L.; Dimitromanolakis, Apostolos; Paterson, Andrew D.; Sun, Lei
2015-01-01
ABSTRACT The “winner's curse” is a subtle and difficult problem in interpretation of genetic association, in which association estimates from large‐scale gene detection studies are larger in magnitude than those from subsequent replication studies. This is practically important because use of a biased estimate from the original study will yield an underestimate of sample size requirements for replication, leaving the investigators with an underpowered study. Motivated by investigation of the genetics of type 1 diabetes complications in a longitudinal cohort of participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Genetics Study, we apply a bootstrap resampling method in analysis of time to nephropathy under a Cox proportional hazards model, examining 1,213 single‐nucleotide polymorphisms (SNPs) in 201 candidate genes custom genotyped in 1,361 white probands. Among 15 top‐ranked SNPs, bias reduction in log hazard ratio estimates ranges from 43.1% to 80.5%. In simulation studies based on the observed DCCT/EDIC genotype data, genome‐wide bootstrap estimates for false‐positive SNPs and for true‐positive SNPs with low‐to‐moderate power are closer to the true values than uncorrected naïve estimates, but tend to overcorrect SNPs with high power. This bias‐reduction technique is generally applicable for complex trait studies including quantitative, binary, and time‐to‐event traits. PMID:26411674
Wang, Jung-Han; Abdel-Aty, Mohamed; Wang, Ling
2017-07-01
There have been plenty of studies intended to use different methods, for example, empirical Bayes before-after methods, to get accurate estimation of CMFs. All of them have different assumptions toward crash count if there was no treatment. Additionally, another major assumption is that multiple sites share the same true CMF. Under this assumption, the CMF at an individual intersection is randomly drawn from a normally distributed population of CMFs at all intersections. Since CMFs are non-zero values, the population of all CMFs might not follow normal distributions, and even if it does, the true mean of CMFs at some intersections may be different from that at others. Therefore, a bootstrap method based on before-after empirical Bayes theory was proposed to estimate CMFs, but it did not make distributional assumptions. This bootstrap procedure has the added benefit of producing a measure of CMF stability. Furthermore, based on the bootstrapped CMF, a new CMF precision rating method was proposed to evaluate the reliability of CMFs. This study chose 29 urban four-legged intersections as treated sites, and their controls were changed from stop-controlled to signal-controlled. Meanwhile, 124 urban four-legged stop-controlled intersections were selected as reference sites. At first, different safety performance functions (SPFs) were applied to five crash categories, and it was found that each crash category had different optimal SPF form. Then, the CMFs of these five crash categories were estimated using the bootstrap empirical Bayes method. The results of the bootstrapped method showed that signalization significantly decreased Angle+Left-Turn crashes, and its CMF had the highest precision. While, the CMF for Rear-End crashes was unreliable. For KABCO, KABC, and KAB crashes, their CMFs were proved to be reliable for the majority of intersections, but the estimated effect of signalization may be not accurate at some sites. Copyright © 2017 Elsevier Ltd. All rights reserved.
Applying Bootstrap Resampling to Compute Confidence Intervals for Various Statistics with R
ERIC Educational Resources Information Center
Dogan, C. Deha
2017-01-01
Background: Most of the studies in academic journals use p values to represent statistical significance. However, this is not a good indicator of practical significance. Although confidence intervals provide information about the precision of point estimation, they are, unfortunately, rarely used. The infrequent use of confidence intervals might…
Confidence Intervals for Effect Sizes: Applying Bootstrap Resampling
ERIC Educational Resources Information Center
Banjanovic, Erin S.; Osborne, Jason W.
2016-01-01
Confidence intervals for effect sizes (CIES) provide readers with an estimate of the strength of a reported statistic as well as the relative precision of the point estimate. These statistics offer more information and context than null hypothesis statistic testing. Although confidence intervals have been recommended by scholars for many years,…
Bootstrapping under constraint for the assessment of group behavior in human contact networks
NASA Astrophysics Data System (ADS)
Tremblay, Nicolas; Barrat, Alain; Forest, Cary; Nornberg, Mark; Pinton, Jean-François; Borgnat, Pierre
2013-11-01
The increasing availability of time- and space-resolved data describing human activities and interactions gives insights into both static and dynamic properties of human behavior. In practice, nevertheless, real-world data sets can often be considered as only one realization of a particular event. This highlights a key issue in social network analysis: the statistical significance of estimated properties. In this context, we focus here on the assessment of quantitative features of specific subset of nodes in empirical networks. We present a method of statistical resampling based on bootstrapping groups of nodes under constraints within the empirical network. The method enables us to define acceptance intervals for various null hypotheses concerning relevant properties of the subset of nodes under consideration in order to characterize by a statistical test its behavior as “normal” or not. We apply this method to a high-resolution data set describing the face-to-face proximity of individuals during two colocated scientific conferences. As a case study, we show how to probe whether colocating the two conferences succeeded in bringing together the two corresponding groups of scientists.
Chaibub Neto, Elias
2015-01-01
In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson’s sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling. PMID:26125965
permGPU: Using graphics processing units in RNA microarray association studies.
Shterev, Ivo D; Jung, Sin-Ho; George, Stephen L; Owzar, Kouros
2010-06-16
Many analyses of microarray association studies involve permutation, bootstrap resampling and cross-validation, that are ideally formulated as embarrassingly parallel computing problems. Given that these analyses are computationally intensive, scalable approaches that can take advantage of multi-core processor systems need to be developed. We have developed a CUDA based implementation, permGPU, that employs graphics processing units in microarray association studies. We illustrate the performance and applicability of permGPU within the context of permutation resampling for a number of test statistics. An extensive simulation study demonstrates a dramatic increase in performance when using permGPU on an NVIDIA GTX 280 card compared to an optimized C/C++ solution running on a conventional Linux server. permGPU is available as an open-source stand-alone application and as an extension package for the R statistical environment. It provides a dramatic increase in performance for permutation resampling analysis in the context of microarray association studies. The current version offers six test statistics for carrying out permutation resampling analyses for binary, quantitative and censored time-to-event traits.
Mist net effort required to inventory a forest bat species assemblage.
Theodore J. Weller; Danny C. Lee
2007-01-01
Little quantitative information exists about the survey effort necessary to inventory temperate bat species assemblages. We used a bootstrap resampling lgorithm to estimate the number of mist net surveys required to capture individuals from 9 species at both study area and site levels using data collected in a forested watershed in northwestern California, USA, during...
Grain Size and Parameter Recovery with TIMSS and the General Diagnostic Model
ERIC Educational Resources Information Center
Skaggs, Gary; Wilkins, Jesse L. M.; Hein, Serge F.
2016-01-01
The purpose of this study was to explore the degree of grain size of the attributes and the sample sizes that can support accurate parameter recovery with the General Diagnostic Model (GDM) for a large-scale international assessment. In this resampling study, bootstrap samples were obtained from the 2003 Grade 8 TIMSS in Mathematics at varying…
Bennett, Iain; Paracha, Noman; Abrams, Keith; Ray, Joshua
2018-01-01
Rank Preserving Structural Failure Time models are one of the most commonly used statistical methods to adjust for treatment switching in oncology clinical trials. The method is often applied in a decision analytic model without appropriately accounting for additional uncertainty when determining the allocation of health care resources. The aim of the study is to describe novel approaches to adequately account for uncertainty when using a Rank Preserving Structural Failure Time model in a decision analytic model. Using two examples, we tested and compared the performance of the novel Test-based method with the resampling bootstrap method and with the conventional approach of no adjustment. In the first example, we simulated life expectancy using a simple decision analytic model based on a hypothetical oncology trial with treatment switching. In the second example, we applied the adjustment method on published data when no individual patient data were available. Mean estimates of overall and incremental life expectancy were similar across methods. However, the bootstrapped and test-based estimates consistently produced greater estimates of uncertainty compared with the estimate without any adjustment applied. Similar results were observed when using the test based approach on a published data showing that failing to adjust for uncertainty led to smaller confidence intervals. Both the bootstrapping and test-based approaches provide a solution to appropriately incorporate uncertainty, with the benefit that the latter can implemented by researchers in the absence of individual patient data. Copyright © 2018 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Moreno, Claudia E.; Guevara, Roger; Sánchez-Rojas, Gerardo; Téllez, Dianeis; Verdú, José R.
2008-01-01
Environmental assessment at the community level in highly diverse ecosystems is limited by taxonomic constraints and statistical methods requiring true replicates. Our objective was to show how diverse systems can be studied at the community level using higher taxa as biodiversity surrogates, and re-sampling methods to allow comparisons. To illustrate this we compared the abundance, richness, evenness and diversity of the litter fauna in a pine-oak forest in central Mexico among seasons, sites and collecting methods. We also assessed changes in the abundance of trophic guilds and evaluated the relationships between community parameters and litter attributes. With the direct search method we observed differences in the rate of taxa accumulation between sites. Bootstrap analysis showed that abundance varied significantly between seasons and sampling methods, but not between sites. In contrast, diversity and evenness were significantly higher at the managed than at the non-managed site. Tree regression models show that abundance varied mainly between seasons, whereas taxa richness was affected by litter attributes (composition and moisture content). The abundance of trophic guilds varied among methods and seasons, but overall we found that parasitoids, predators and detrivores decreased under management. Therefore, although our results suggest that management has positive effects on the richness and diversity of litter fauna, the analysis of trophic guilds revealed a contrasting story. Our results indicate that functional groups and re-sampling methods may be used as tools for describing community patterns in highly diverse systems. Also, the higher taxa surrogacy could be seen as a preliminary approach when it is not possible to identify the specimens at a low taxonomic level in a reasonable period of time and in a context of limited financial resources, but further studies are needed to test whether the results are specific to a system or whether they are general with regards to land management.
ERIC Educational Resources Information Center
Zhang, Dongbo; Koda, Keiko
2012-01-01
Within the Structural Equation Modeling framework, this study tested the direct and indirect effects of morphological awareness and lexical inferencing ability on L2 vocabulary knowledge and reading comprehension among advanced Chinese EFL readers in a university in China. Using both regular z-test and the bootstrapping (data-based resampling)…
Fast Computation of the Two-Point Correlation Function in the Age of Big Data
NASA Astrophysics Data System (ADS)
Pellegrino, Andrew; Timlin, John
2018-01-01
We present a new code which quickly computes the two-point correlation function for large sets of astronomical data. This code combines the ease of use of Python with the speed of parallel shared libraries written in C. We include the capability to compute the auto- and cross-correlation statistics, and allow the user to calculate the three-dimensional and angular correlation functions. Additionally, the code automatically divides the user-provided sky masks into contiguous subsamples of similar size, using the HEALPix pixelization scheme, for the purpose of resampling. Errors are computed using jackknife and bootstrap resampling in a way that adds negligible extra runtime, even with many subsamples. We demonstrate comparable speed with other clustering codes, and code accuracy compared to known and analytic results.
NASA Astrophysics Data System (ADS)
Traversaro, Francisco; O. Redelico, Francisco
2018-04-01
In nonlinear dynamics, and to a lesser extent in other fields, a widely used measure of complexity is the Permutation Entropy. But there is still no known method to determine the accuracy of this measure. There has been little research on the statistical properties of this quantity that characterize time series. The literature describes some resampling methods of quantities used in nonlinear dynamics - as the largest Lyapunov exponent - but these seems to fail. In this contribution, we propose a parametric bootstrap methodology using a symbolic representation of the time series to obtain the distribution of the Permutation Entropy estimator. We perform several time series simulations given by well-known stochastic processes: the 1/fα noise family, and show in each case that the proposed accuracy measure is as efficient as the one obtained by the frequentist approach of repeating the experiment. The complexity of brain electrical activity, measured by the Permutation Entropy, has been extensively used in epilepsy research for detection in dynamical changes in electroencephalogram (EEG) signal with no consideration of the variability of this complexity measure. An application of the parametric bootstrap methodology is used to compare normal and pre-ictal EEG signals.
White, H; Racine, J
2001-01-01
We propose tests for individual and joint irrelevance of network inputs. Such tests can be used to determine whether an input or group of inputs "belong" in a particular model, thus permitting valid statistical inference based on estimated feedforward neural-network models. The approaches employ well-known statistical resampling techniques. We conduct a small Monte Carlo experiment showing that our tests have reasonable level and power behavior, and we apply our methods to examine whether there are predictable regularities in foreign exchange rates. We find that exchange rates do appear to contain information that is exploitable for enhanced point prediction, but the nature of the predictive relations evolves through time.
Comparison of variance estimators for meta-analysis of instrumental variable estimates
Schmidt, AF; Hingorani, AD; Jefferis, BJ; White, J; Groenwold, RHH; Dudbridge, F
2016-01-01
Abstract Background: Mendelian randomization studies perform instrumental variable (IV) analysis using genetic IVs. Results of individual Mendelian randomization studies can be pooled through meta-analysis. We explored how different variance estimators influence the meta-analysed IV estimate. Methods: Two versions of the delta method (IV before or after pooling), four bootstrap estimators, a jack-knife estimator and a heteroscedasticity-consistent (HC) variance estimator were compared using simulation. Two types of meta-analyses were compared, a two-stage meta-analysis pooling results, and a one-stage meta-analysis pooling datasets. Results: Using a two-stage meta-analysis, coverage of the point estimate using bootstrapped estimators deviated from nominal levels at weak instrument settings and/or outcome probabilities ≤ 0.10. The jack-knife estimator was the least biased resampling method, the HC estimator often failed at outcome probabilities ≤ 0.50 and overall the delta method estimators were the least biased. In the presence of between-study heterogeneity, the delta method before meta-analysis performed best. Using a one-stage meta-analysis all methods performed equally well and better than two-stage meta-analysis of greater or equal size. Conclusions: In the presence of between-study heterogeneity, two-stage meta-analyses should preferentially use the delta method before meta-analysis. Weak instrument bias can be reduced by performing a one-stage meta-analysis. PMID:27591262
Terrestrial laser scanning used to detect asymmetries in boat hulls
NASA Astrophysics Data System (ADS)
Roca-Pardiñas, Javier; López-Alvarez, Francisco; Ordóñez, Celestino; Menéndez, Agustín; Bernardo-Sánchez, Antonio
2012-01-01
We describe a methodology for identifying asymmetries in boat hull sections reconstructed from point clouds captured using a terrestrial laser scanner (TLS). A surface was first fit to the point cloud using a nonparametric regression method that permitted the construction of a continuous smooth surface. Asymmetries in cross-sections of the surface were identified using a bootstrap resampling technique that took into account uncertainty in the coordinates of the scanned points. Each reconstructed section was analyzed to check, for a given level of significance, that it was within the confidence interval for the theoretical symmetrical section. The method was applied to the study of asymmetries in a medium-sized yacht. Identified were differences of up to 5 cm between the real and theoretical sections in some parts of the hull.
Tests for informative cluster size using a novel balanced bootstrap scheme.
Nevalainen, Jaakko; Oja, Hannu; Datta, Somnath
2017-07-20
Clustered data are often encountered in biomedical studies, and to date, a number of approaches have been proposed to analyze such data. However, the phenomenon of informative cluster size (ICS) is a challenging problem, and its presence has an impact on the choice of a correct analysis methodology. For example, Dutta and Datta (2015, Biometrics) presented a number of marginal distributions that could be tested. Depending on the nature and degree of informativeness of the cluster size, these marginal distributions may differ, as do the choices of the appropriate test. In particular, they applied their new test to a periodontal data set where the plausibility of the informativeness was mentioned, but no formal test for the same was conducted. We propose bootstrap tests for testing the presence of ICS. A balanced bootstrap method is developed to successfully estimate the null distribution by merging the re-sampled observations with closely matching counterparts. Relying on the assumption of exchangeability within clusters, the proposed procedure performs well in simulations even with a small number of clusters, at different distributions and against different alternative hypotheses, thus making it an omnibus test. We also explain how to extend the ICS test to a regression setting and thereby enhancing its practical utility. The methodologies are illustrated using the periodontal data set mentioned earlier. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
New Methods for Estimating Seasonal Potential Climate Predictability
NASA Astrophysics Data System (ADS)
Feng, Xia
This study develops two new statistical approaches to assess the seasonal potential predictability of the observed climate variables. One is the univariate analysis of covariance (ANOCOVA) model, a combination of autoregressive (AR) model and analysis of variance (ANOVA). It has the advantage of taking into account the uncertainty of the estimated parameter due to sampling errors in statistical test, which is often neglected in AR based methods, and accounting for daily autocorrelation that is not considered in traditional ANOVA. In the ANOCOVA model, the seasonal signals arising from external forcing are determined to be identical or not to assess any interannual variability that may exist is potentially predictable. The bootstrap is an attractive alternative method that requires no hypothesis model and is available no matter how mathematically complicated the parameter estimator. This method builds up the empirical distribution of the interannual variance from the resamplings drawn with replacement from the given sample, in which the only predictability in seasonal means arises from the weather noise. These two methods are applied to temperature and water cycle components including precipitation and evaporation, to measure the extent to which the interannual variance of seasonal means exceeds the unpredictable weather noise compared with the previous methods, including Leith-Shukla-Gutzler (LSG), Madden, and Katz. The potential predictability of temperature from ANOCOVA model, bootstrap, LSG and Madden exhibits a pronounced tropical-extratropical contrast with much larger predictability in the tropics dominated by El Nino/Southern Oscillation (ENSO) than in higher latitudes where strong internal variability lowers predictability. Bootstrap tends to display highest predictability of the four methods, ANOCOVA lies in the middle, while LSG and Madden appear to generate lower predictability. Seasonal precipitation from ANOCOVA, bootstrap, and Katz, resembling that for temperature, is more predictable over the tropical regions, and less predictable in extropics. Bootstrap and ANOCOVA are in good agreement with each other, both methods generating larger predictability than Katz. The seasonal predictability of evaporation over land bears considerably similarity with that of temperature using ANOCOVA, bootstrap, LSG and Madden. The remote SST forcing and soil moisture reveal substantial seasonality in their relations with the potentially predictable seasonal signals. For selected regions, either SST or soil moisture or both shows significant relationships with predictable signals, hence providing indirect insight on slowly varying boundary processes involved to enable useful seasonal climate predication. A multivariate analysis of covariance (MANOCOVA) model is established to identify distinctive predictable patterns, which are uncorrelated with each other. Generally speaking, the seasonal predictability from multivariate model is consistent with that from ANOCOVA. Besides unveiling the spatial variability of predictability, MANOCOVA model also reveals the temporal variability of each predictable pattern, which could be linked to the periodic oscillations.
Analysis of spreadable cheese by Raman spectroscopy and chemometric tools.
Oliveira, Kamila de Sá; Callegaro, Layce de Souza; Stephani, Rodrigo; Almeida, Mariana Ramos; de Oliveira, Luiz Fernando Cappa
2016-03-01
In this work, FT-Raman spectroscopy was explored to evaluate spreadable cheese samples. A partial least squares discriminant analysis was employed to identify the spreadable cheese samples containing starch. To build the models, two types of samples were used: commercial samples and samples manufactured in local industries. The method of supervised classification PLS-DA was employed to classify the samples as adulterated or without starch. Multivariate regression was performed using the partial least squares method to quantify the starch in the spreadable cheese. The limit of detection obtained for the model was 0.34% (w/w) and the limit of quantification was 1.14% (w/w). The reliability of the models was evaluated by determining the confidence interval, which was calculated using the bootstrap re-sampling technique. The results show that the classification models can be used to complement classical analysis and as screening methods. Copyright © 2015 Elsevier Ltd. All rights reserved.
Simulation and statistics: Like rhythm and song
NASA Astrophysics Data System (ADS)
Othman, Abdul Rahman
2013-04-01
Simulation has been introduced to solve problems in the form of systems. By using this technique the following two problems can be overcome. First, a problem that has an analytical solution but the cost of running an experiment to solve is high in terms of money and lives. Second, a problem exists but has no analytical solution. In the field of statistical inference the second problem is often encountered. With the advent of high-speed computing devices, a statistician can now use resampling techniques such as the bootstrap and permutations to form pseudo sampling distribution that will lead to the solution of the problem that cannot be solved analytically. This paper discusses how a Monte Carlo simulation was and still being used to verify the analytical solution in inference. This paper also discusses the resampling techniques as simulation techniques. The misunderstandings about these two techniques are examined. The successful usages of both techniques are also explained.
Explanation of Two Anomalous Results in Statistical Mediation Analysis.
Fritz, Matthew S; Taylor, Aaron B; Mackinnon, David P
2012-01-01
Previous studies of different methods of testing mediation models have consistently found two anomalous results. The first result is elevated Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap tests not found in nonresampling tests or in resampling tests that did not include a bias correction. This is of special concern as the bias-corrected bootstrap is often recommended and used due to its higher statistical power compared with other tests. The second result is statistical power reaching an asymptote far below 1.0 and in some conditions even declining slightly as the size of the relationship between X and M , a , increased. Two computer simulations were conducted to examine these findings in greater detail. Results from the first simulation found that the increased Type I error rates for the bias-corrected and accelerated bias-corrected bootstrap are a function of an interaction between the size of the individual paths making up the mediated effect and the sample size, such that elevated Type I error rates occur when the sample size is small and the effect size of the nonzero path is medium or larger. Results from the second simulation found that stagnation and decreases in statistical power as a function of the effect size of the a path occurred primarily when the path between M and Y , b , was small. Two empirical mediation examples are provided using data from a steroid prevention and health promotion program aimed at high school football players (Athletes Training and Learning to Avoid Steroids; Goldberg et al., 1996), one to illustrate a possible Type I error for the bias-corrected bootstrap test and a second to illustrate a loss in power related to the size of a . Implications of these findings are discussed.
Variable Selection in the Presence of Missing Data: Imputation-based Methods.
Zhao, Yize; Long, Qi
2017-01-01
Variable selection plays an essential role in regression analysis as it identifies important variables that associated with outcomes and is known to improve predictive accuracy of resulting models. Variable selection methods have been widely investigated for fully observed data. However, in the presence of missing data, methods for variable selection need to be carefully designed to account for missing data mechanisms and statistical techniques used for handling missing data. Since imputation is arguably the most popular method for handling missing data due to its ease of use, statistical methods for variable selection that are combined with imputation are of particular interest. These methods, valid used under the assumptions of missing at random (MAR) and missing completely at random (MCAR), largely fall into three general strategies. The first strategy applies existing variable selection methods to each imputed dataset and then combine variable selection results across all imputed datasets. The second strategy applies existing variable selection methods to stacked imputed datasets. The third variable selection strategy combines resampling techniques such as bootstrap with imputation. Despite recent advances, this area remains under-developed and offers fertile ground for further research.
Dyson, Greg; Frikke-Schmidt, Ruth; Nordestgaard, Børge G; Tybjaerg-Hansen, Anne; Sing, Charles F
2009-05-01
This article extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (Genet Epidemiol 31:515-527) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2,258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors.
Dyson, Greg; Frikke-Schmidt, Ruth; Nordestgaard, Børge G.; Tybjærg-Hansen, Anne; Sing, Charles F.
2009-01-01
This paper extends the Patient Rule-Induction Method (PRIM) for modeling cumulative incidence of disease developed by Dyson et al. (2007) to include the simultaneous consideration of non-additive combinations of predictor variables, a significance test of each combination, an adjustment for multiple testing and a confidence interval for the estimate of the cumulative incidence of disease in each partition. We employ the partitioning algorithm component of the Combinatorial Partitioning Method (CPM) to construct combinations of predictors, permutation testing to assess the significance of each combination, theoretical arguments for incorporating a multiple testing adjustment and bootstrap resampling to produce the confidence intervals. An illustration of this revised PRIM utilizing a sample of 2258 European male participants from the Copenhagen City Heart Study is presented that assesses the utility of genetic variants in predicting the presence of ischemic heart disease beyond the established risk factors. PMID:19025787
Clausen, J L; Georgian, T; Gardner, K H; Douglas, T A
2018-01-01
Research shows grab sampling is inadequate for evaluating military ranges contaminated with energetics because of their highly heterogeneous distribution. Similar studies assessing the heterogeneous distribution of metals at small-arms ranges (SAR) are lacking. To address this we evaluated whether grab sampling provides appropriate data for performing risk analysis at metal-contaminated SARs characterized with 30-48 grab samples. We evaluated the extractable metal content of Cu, Pb, Sb, and Zn of the field data using a Monte Carlo random resampling with replacement (bootstrapping) simulation approach. Results indicate the 95% confidence interval of the mean for Pb (432 mg/kg) at one site was 200-700 mg/kg with a data range of 5-4500 mg/kg. Considering the U.S. Environmental Protection Agency screening level for lead is 400 mg/kg, the necessity of cleanup at this site is unclear. Resampling based on populations of 7 and 15 samples, a sample size more realistic for the area yielded high false negative rates.
Uncertainties in the cluster-cluster correlation function
NASA Astrophysics Data System (ADS)
Ling, E. N.; Frenk, C. S.; Barrow, J. D.
1986-12-01
The bootstrap resampling technique is applied to estimate sampling errors and significance levels of the two-point correlation functions determined for a subset of the CfA redshift survey of galaxies and a redshift sample of 104 Abell clusters. The angular correlation function for a sample of 1664 Abell clusters is also calculated. The standard errors in xi(r) for the Abell data are found to be considerably larger than quoted 'Poisson errors'. The best estimate for the ratio of the correlation length of Abell clusters (richness class R greater than or equal to 1, distance class D less than or equal to 4) to that of CfA galaxies is 4.2 + 1.4 or - 1.0 (68 percentile error). The enhancement of cluster clustering over galaxy clustering is statistically significant in the presence of resampling errors. The uncertainties found do not include the effects of possible systematic biases in the galaxy and cluster catalogs and could be regarded as lower bounds on the true uncertainty range.
Prediction of resource volumes at untested locations using simple local prediction models
Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.
2006-01-01
This paper shows how local spatial nonparametric prediction models can be applied to estimate volumes of recoverable gas resources at individual undrilled sites, at multiple sites on a regional scale, and to compute confidence bounds for regional volumes based on the distribution of those estimates. An approach that combines cross-validation, the jackknife, and bootstrap procedures is used to accomplish this task. Simulation experiments show that cross-validation can be applied beneficially to select an appropriate prediction model. The cross-validation procedure worked well for a wide range of different states of nature and levels of information. Jackknife procedures are used to compute individual prediction estimation errors at undrilled locations. The jackknife replicates also are used with a bootstrap resampling procedure to compute confidence bounds for the total volume. The method was applied to data (partitioned into a training set and target set) from the Devonian Antrim Shale continuous-type gas play in the Michigan Basin in Otsego County, Michigan. The analysis showed that the model estimate of total recoverable volumes at prediction sites is within 4 percent of the total observed volume. The model predictions also provide frequency distributions of the cell volumes at the production unit scale. Such distributions are the basis for subsequent economic analyses. ?? Springer Science+Business Media, LLC 2007.
Nomogram Prediction of Overall Survival After Curative Irradiation for Uterine Cervical Cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seo, YoungSeok; Yoo, Seong Yul; Kim, Mi-Sook
Purpose: The purpose of this study was to develop a nomogram capable of predicting the probability of 5-year survival after radical radiotherapy (RT) without chemotherapy for uterine cervical cancer. Methods and Materials: We retrospectively analyzed 549 patients that underwent radical RT for uterine cervical cancer between March 1994 and April 2002 at our institution. Multivariate analysis using Cox proportional hazards regression was performed and this Cox model was used as the basis for the devised nomogram. The model was internally validated for discrimination and calibration by bootstrap resampling. Results: By multivariate regression analysis, the model showed that age, hemoglobin levelmore » before RT, Federation Internationale de Gynecologie Obstetrique (FIGO) stage, maximal tumor diameter, lymph node status, and RT dose at Point A significantly predicted overall survival. The survival prediction model demonstrated good calibration and discrimination. The bootstrap-corrected concordance index was 0.67. The predictive ability of the nomogram proved to be superior to FIGO stage (p = 0.01). Conclusions: The devised nomogram offers a significantly better level of discrimination than the FIGO staging system. In particular, it improves predictions of survival probability and could be useful for counseling patients, choosing treatment modalities and schedules, and designing clinical trials. However, before this nomogram is used clinically, it should be externally validated.« less
[The analysis of threshold effect using Empower Stats software].
Lin, Lin; Chen, Chang-zhong; Yu, Xiao-dan
2013-11-01
In many studies about biomedical research factors influence on the outcome variable, it has no influence or has a positive effect within a certain range. Exceeding a certain threshold value, the size of the effect and/or orientation will change, which called threshold effect. Whether there are threshold effects in the analysis of factors (x) on the outcome variable (y), it can be observed through a smooth curve fitting to see whether there is a piecewise linear relationship. And then using segmented regression model, LRT test and Bootstrap resampling method to analyze the threshold effect. Empower Stats software developed by American X & Y Solutions Inc has a threshold effect analysis module. You can input the threshold value at a given threshold segmentation simulated data. You may not input the threshold, but determined the optimal threshold analog data by the software automatically, and calculated the threshold confidence intervals.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Engemann, Kristine; Enquist, Brian J; Sandel, Brody; Boyle, Brad; Jørgensen, Peter M; Morueta-Holme, Naia; Peet, Robert K; Violle, Cyrille; Svenning, Jens-Christian
2015-01-01
Macro-scale species richness studies often use museum specimens as their main source of information. However, such datasets are often strongly biased due to variation in sampling effort in space and time. These biases may strongly affect diversity estimates and may, thereby, obstruct solid inference on the underlying diversity drivers, as well as mislead conservation prioritization. In recent years, this has resulted in an increased focus on developing methods to correct for sampling bias. In this study, we use sample-size-correcting methods to examine patterns of tropical plant diversity in Ecuador, one of the most species-rich and climatically heterogeneous biodiversity hotspots. Species richness estimates were calculated based on 205,735 georeferenced specimens of 15,788 species using the Margalef diversity index, the Chao estimator, the second-order Jackknife and Bootstrapping resampling methods, and Hill numbers and rarefaction. Species richness was heavily correlated with sampling effort, and only rarefaction was able to remove this effect, and we recommend this method for estimation of species richness with “big data” collections. PMID:25692000
Engemann, Kristine; Enquist, Brian J; Sandel, Brody; Boyle, Brad; Jørgensen, Peter M; Morueta-Holme, Naia; Peet, Robert K; Violle, Cyrille; Svenning, Jens-Christian
2015-02-01
Macro-scale species richness studies often use museum specimens as their main source of information. However, such datasets are often strongly biased due to variation in sampling effort in space and time. These biases may strongly affect diversity estimates and may, thereby, obstruct solid inference on the underlying diversity drivers, as well as mislead conservation prioritization. In recent years, this has resulted in an increased focus on developing methods to correct for sampling bias. In this study, we use sample-size-correcting methods to examine patterns of tropical plant diversity in Ecuador, one of the most species-rich and climatically heterogeneous biodiversity hotspots. Species richness estimates were calculated based on 205,735 georeferenced specimens of 15,788 species using the Margalef diversity index, the Chao estimator, the second-order Jackknife and Bootstrapping resampling methods, and Hill numbers and rarefaction. Species richness was heavily correlated with sampling effort, and only rarefaction was able to remove this effect, and we recommend this method for estimation of species richness with "big data" collections.
Predicting survival of men with recurrent prostate cancer after radical prostatectomy.
Dell'Oglio, Paolo; Suardi, Nazareno; Boorjian, Stephen A; Fossati, Nicola; Gandaglia, Giorgio; Tian, Zhe; Moschini, Marco; Capitanio, Umberto; Karakiewicz, Pierre I; Montorsi, Francesco; Karnes, R Jeffrey; Briganti, Alberto
2016-02-01
To develop and externally validate a novel nomogram aimed at predicting cancer-specific mortality (CSM) after biochemical recurrence (BCR) among prostate cancer (PCa) patients treated with radical prostatectomy (RP) with or without adjuvant external beam radiotherapy (aRT) and/or hormonal therapy (aHT). The development cohort included 689 consecutive PCa patients treated with RP between 1987 and 2011 with subsequent BCR, defined as two subsequent prostate-specific antigen values >0.2 ng/ml. Multivariable competing-risks regression analyses tested the predictors of CSM after BCR for the purpose of 5-year CSM nomogram development. Validation (2000 bootstrap resamples) was internally tested. External validation was performed into a population of 6734 PCa patients with BCR after treatment with RP at the Mayo Clinic from 1987 to 2011. The predictive accuracy (PA) was quantified using the receiver operating characteristic-derived area under the curve and the calibration plot method. The 5-year CSM-free survival rate was 83.6% (confidence interval [CI]: 79.6-87.2). In multivariable analyses, pathologic stage T3b or more (hazard ratio [HR]: 7.42; p = 0.008), pathologic Gleason score 8-10 (HR: 2.19; p = 0.003), lymph node invasion (HR: 3.57; p = 0.001), time to BCR (HR: 0.99; p = 0.03) and age at BCR (HR: 1.04; p = 0.04), were each significantly associated with the risk of CSM after BCR. The bootstrap-corrected PA was 87.4% (bootstrap 95% CI: 82.0-91.7%). External validation of our nomogram showed a good PA at 83.2%. We developed and externally validated the first nomogram predicting 5-year CSM applicable to contemporary patients with BCR after RP with or without adjuvant treatment. Copyright © 2015 Elsevier Ltd. All rights reserved.
Shannon, Casey P; Chen, Virginia; Takhar, Mandeep; Hollander, Zsuzsanna; Balshaw, Robert; McManus, Bruce M; Tebbutt, Scott J; Sin, Don D; Ng, Raymond T
2016-11-14
Gene network inference (GNI) algorithms can be used to identify sets of coordinately expressed genes, termed network modules from whole transcriptome gene expression data. The identification of such modules has become a popular approach to systems biology, with important applications in translational research. Although diverse computational and statistical approaches have been devised to identify such modules, their performance behavior is still not fully understood, particularly in complex human tissues. Given human heterogeneity, one important question is how the outputs of these computational methods are sensitive to the input sample set, or stability. A related question is how this sensitivity depends on the size of the sample set. We describe here the SABRE (Similarity Across Bootstrap RE-sampling) procedure for assessing the stability of gene network modules using a re-sampling strategy, introduce a novel criterion for identifying stable modules, and demonstrate the utility of this approach in a clinically-relevant cohort, using two different gene network module discovery algorithms. The stability of modules increased as sample size increased and stable modules were more likely to be replicated in larger sets of samples. Random modules derived from permutated gene expression data were consistently unstable, as assessed by SABRE, and provide a useful baseline value for our proposed stability criterion. Gene module sets identified by different algorithms varied with respect to their stability, as assessed by SABRE. Finally, stable modules were more readily annotated in various curated gene set databases. The SABRE procedure and proposed stability criterion may provide guidance when designing systems biology studies in complex human disease and tissues.
Hung, Peter S-P; Chen, David Q; Davis, Karen D; Zhong, Jidan; Hodaie, Mojgan
2017-01-01
Trigeminal neuralgia (TN) is a chronic neuropathic facial pain disorder that commonly responds to surgery. A proportion of patients, however, do not benefit and suffer ongoing pain. There are currently no imaging tools that permit the prediction of treatment response. To address this paucity, we used diffusion tensor imaging (DTI) to determine whether pre-surgical trigeminal nerve microstructural diffusivities can prognosticate response to TN treatment. In 31 TN patients and 16 healthy controls, multi-tensor tractography was used to extract DTI-derived metrics-axial (AD), radial (RD), mean diffusivity (MD), and fractional anisotropy (FA)-from the cisternal segment, root entry zone and pontine segment of trigeminal nerves for false discovery rate-corrected Student's t -tests. Ipsilateral diffusivities were bootstrap resampled to visualize group-level diffusivity thresholds of long-term response. To obtain an individual-level statistical classifier of surgical response, we conducted discriminant function analysis (DFA) with the type of surgery chosen alongside ipsilateral measurements and ipsilateral/contralateral ratios of AD and RD from all regions of interest as prediction variables. Abnormal diffusivity in the trigeminal pontine fibers, demonstrated by increased AD, highlighted non-responders (n = 14) compared to controls. Bootstrap resampling revealed three ipsilateral diffusivity thresholds of response-pontine AD, MD, cisternal FA-separating 85% of non-responders from responders. DFA produced an 83.9% (71.0% using leave-one-out-cross-validation) accurate prognosticator of response that successfully identified 12/14 non-responders. Our study demonstrates that pre-surgical DTI metrics can serve as a highly predictive, individualized tool to prognosticate surgical response. We further highlight abnormal pontine segment diffusivities as key features of treatment non-response and confirm the axiom that central pain does not commonly benefit from peripheral treatments.
Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard
2010-03-31
Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS) which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
Comment on: 'A Poisson resampling method for simulating reduced counts in nuclear medicine images'.
de Nijs, Robin
2015-07-21
In order to be able to calculate half-count images from already acquired data, White and Lawson published their method based on Poisson resampling. They verified their method experimentally by measurements with a Co-57 flood source. In this comment their results are reproduced and confirmed by a direct numerical simulation in Matlab. Not only Poisson resampling, but also two direct redrawing methods were investigated. Redrawing methods were based on a Poisson and a Gaussian distribution. Mean, standard deviation, skewness and excess kurtosis half-count/full-count ratios were determined for all methods, and compared to the theoretical values for a Poisson distribution. Statistical parameters showed the same behavior as in the original note and showed the superiority of the Poisson resampling method. Rounding off before saving of the half count image had a severe impact on counting statistics for counts below 100. Only Poisson resampling was not affected by this, while Gaussian redrawing was less affected by it than Poisson redrawing. Poisson resampling is the method of choice, when simulating half-count (or less) images from full-count images. It simulates correctly the statistical properties, also in the case of rounding off of the images.
Labonté, Josiane; Roy, Jean-Philippe; Dubuc, Jocelyn; Buczinski, Sébastien
2015-06-01
Cardiac troponin I (cTnI) has been shown to be an accurate predictor of myocardial injury in cattle. The point-of-care i-STAT 1 immunoassay can be used to quantify blood cTnI in cattle. However, the cTnI reference interval in whole blood of healthy early lactating dairy cows remains unknown. To determine a blood cTnI reference interval in healthy early lactating Holstein dairy cows using the analyzer i-STAT 1. Forty healthy lactating dairy Holstein cows (0-60 days in milk) were conveniently selected from four commercial dairy farms. Each selected cow was examined by a veterinarian and transthoracic echocardiography was performed. A cow-side blood cTnI dosage was measured at the same time. A bootstrap statistical analysis method using unrestricted resampling was used to determine a reference interval for blood cTnI values. Forty healthy cows were recruited in the study. Median blood cTnI was 0.02 ng/mL (minimum: 0.00, maximum: 0.05). Based on the bootstrap analysis method with 40 cases, the 95th percentile of cTnI values in healthy cows was 0.036 ng/mL (90% CI: 0.02-0.05 ng/mL). A reference interval for blood cTnI values in healthy lactating cows was determined. Further research is needed to determine whether cTnI blood values could be used to diagnose and provide a prognosis for cardiac and noncardiac diseases in lactating dairy cows. Copyright © 2015 Elsevier B.V. All rights reserved.
Parks, Nathan A.; Gannon, Matthew A.; Long, Stephanie M.; Young, Madeleine E.
2016-01-01
Analysis of event-related potential (ERP) data includes several steps to ensure that ERPs meet an appropriate level of signal quality. One such step, subject exclusion, rejects subject data if ERP waveforms fail to meet an appropriate level of signal quality. Subject exclusion is an important quality control step in the ERP analysis pipeline as it ensures that statistical inference is based only upon those subjects exhibiting clear evoked brain responses. This critical quality control step is most often performed simply through visual inspection of subject-level ERPs by investigators. Such an approach is qualitative, subjective, and susceptible to investigator bias, as there are no standards as to what constitutes an ERP of sufficient signal quality. Here, we describe a standardized and objective method for quantifying waveform quality in individual subjects and establishing criteria for subject exclusion. The approach uses bootstrap resampling of ERP waveforms (from a pool of all available trials) to compute a signal-to-noise ratio confidence interval (SNR-CI) for individual subject waveforms. The lower bound of this SNR-CI (SNRLB) yields an effective and objective measure of signal quality as it ensures that ERP waveforms statistically exceed a desired signal-to-noise criterion. SNRLB provides a quantifiable metric of individual subject ERP quality and eliminates the need for subjective evaluation of waveform quality by the investigator. We detail the SNR-CI methodology, establish the efficacy of employing this approach with Monte Carlo simulations, and demonstrate its utility in practice when applied to ERP datasets. PMID:26903849
A Statistical Analysis of Brain Morphology Using Wild Bootstrapping
Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.
2008-01-01
Methods for the analysis of brain morphology, including voxel-based morphology and surface-based morphometries, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method, called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computationally simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
A non-parametric peak calling algorithm for DamID-Seq.
Li, Renhua; Hempel, Leonie U; Jiang, Tingbo
2015-01-01
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of double sex (DSX)-an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peaking calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peaking calling procedure compromises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) Calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peaks width.
Collell, Guillem; Prelec, Drazen; Patil, Kaustubh R
2018-01-31
Class imbalance presents a major hurdle in the application of classification methods. A commonly taken approach is to learn ensembles of classifiers using rebalanced data. Examples include bootstrap averaging (bagging) combined with either undersampling or oversampling of the minority class examples. However, rebalancing methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases. Furthermore, these methods often require specifying the performance measure of interest a priori, i.e., before learning. An alternative is to employ the threshold moving technique, which applies a threshold to the continuous output of a model, offering the possibility to adapt to a performance measure a posteriori , i.e., a plug-in method. Surprisingly, little attention has been paid to this combination of a bagging ensemble and threshold-moving. In this paper, we study this combination and demonstrate its competitiveness. Contrary to the other resampling methods, we preserve the natural class distribution of the data resulting in well-calibrated posterior probabilities. Additionally, we extend the proposed method to handle multiclass data. We validated our method on binary and multiclass benchmark data sets by using both, decision trees and neural networks as base classifiers. We perform analyses that provide insights into the proposed method.
NASA Astrophysics Data System (ADS)
Pierini, J. O.; Restrepo, J. C.; Aguirre, J.; Bustamante, A. M.; Velásquez, G. J.
2017-04-01
A measure of the variability in seasonal extreme streamflow was estimated for the Colombian Caribbean coast, using monthly time series of freshwater discharge from ten watersheds. The aim was to detect modifications in the streamflow monthly distribution, seasonal trends, variance and extreme monthly values. A 20-year length time moving window, with 1-year successive shiftments, was applied to the monthly series to analyze the seasonal variability of streamflow. The seasonal-windowed data were statistically fitted through the Gamma distribution function. Scale and shape parameters were computed using the Maximum Likelihood Estimation (MLE) and the bootstrap method for 1000 resample. A trend analysis was performed for each windowed-serie, allowing to detect the window of maximum absolute values for trends. Significant temporal shifts in seasonal streamflow distribution and quantiles (QT), were obtained for different frequencies. Wet and dry extremes periods increased significantly in the last decades. Such increase did not occur simultaneously through the region. Some locations exhibited continuous increases only at minimum QT.
Quantifying the risk of extreme aviation accidents
NASA Astrophysics Data System (ADS)
Das, Kumer Pial; Dey, Asim Kumer
2016-12-01
Air travel is considered a safe means of transportation. But when aviation accidents do occur they often result in fatalities. Fortunately, the most extreme accidents occur rarely. However, 2014 was the deadliest year in the past decade causing 111 plane crashes, and among them worst four crashes cause 298, 239, 162 and 116 deaths. In this study, we want to assess the risk of the catastrophic aviation accidents by studying historical aviation accidents. Applying a generalized Pareto model we predict the maximum fatalities from an aviation accident in future. The fitted model is compared with some of its competitive models. The uncertainty in the inferences are quantified using simulated aviation accident series, generated by bootstrap resampling and Monte Carlo simulations.
Mambet Doue, Constance; Roussiau, Nicolas
2016-12-01
This research investigates the indirect effects of religiosity (practice and belief) on therapeutic compliance in 81 HIV-positive patients who are migrants from sub-Saharan Africa (23 men and 58 women). Using analyses of mediation and standard multiple regression, including a resampling procedure by bootstrapping, the role of these mediators (magical-religious beliefs and nonuse of toxic substances) was tested. The results show that, through magical-religious beliefs, religiosity has a negative indirect effect, while with the nonuse of toxic substances, religious practice has a positive indirect effect. Beyond religiosity, the role of mediators is highlighted in the interaction with therapeutic compliance.
Image re-sampling detection through a novel interpolation kernel.
Hilal, Alaa
2018-06-01
Image re-sampling involved in re-size and rotation transformations is an essential element block in a typical digital image alteration. Fortunately, traces left from such processes are detectable, proving that the image has gone a re-sampling transformation. Within this context, we present in this paper two original contributions. First, we propose a new re-sampling interpolation kernel. It depends on five independent parameters that controls its amplitude, angular frequency, standard deviation, and duration. Then, we demonstrate its capacity to imitate the same behavior of the most frequent interpolation kernels used in digital image re-sampling applications. Secondly, the proposed model is used to characterize and detect the correlation coefficients involved in re-sampling transformations. The involved process includes a minimization of an error function using the gradient method. The proposed method is assessed over a large database of 11,000 re-sampled images. Additionally, it is implemented within an algorithm in order to assess images that had undergone complex transformations. Obtained results demonstrate better performance and reduced processing time when compared to a reference method validating the suitability of the proposed approaches. Copyright © 2018 Elsevier B.V. All rights reserved.
Kang, Sokbom; Lee, Jong-Min; Lee, Jae-Kwan; Kim, Jae-Weon; Cho, Chi-Heum; Kim, Seok-Mo; Park, Sang-Yoon; Park, Chan-Yong; Kim, Ki-Tae
2014-03-01
The purpose of this study is to develop a Web-based nomogram for predicting the individualized risk of para-aortic nodal metastasis in incompletely staged patients with endometrial cancer. From 8 institutions, the medical records of 397 patients who underwent pelvic and para-aortic lymphadenectomy as a surgical staging procedure were retrospectively reviewed. A multivariate logistic regression model was created and internally validated by rigorous bootstrap resampling methods. Finally, the model was transformed into a user-friendly Web-based nomogram (http://http://www.kgog.org/nomogram/empa001.html). The rate of para-aortic nodal metastasis was 14.4% (57/397 patients). Using a stepwise variable selection, 4 variables including deep myometrial invasion, non-endometrioid subtype, lymphovascular space invasion, and log-transformed CA-125 levels were finally adopted. After 1000 repetitions of bootstrapping, all of these 4 variables retained a significant association with para-aortic nodal metastasis in the multivariate analysis-deep myometrial invasion (P = 0.001), non-endometrioid histologic subtype (P = 0.034), lymphovascular space invasion (P = 0.003), and log-transformed serum CA-125 levels (P = 0.004). The model showed good discrimination (C statistics = 0.87; 95% confidence interval, 0.82-0.92) and accurate calibration (Hosmer-Lemeshow P = 0.74). This nomogram showed good performance in predicting para-aortic metastasis in patients with endometrial cancer. The tool may be useful in determining the extent of lymphadenectomy after incomplete surgery.
Understanding catastrophizing from a misdirected problem-solving perspective.
Flink, Ida K; Boersma, Katja; MacDonald, Shane; Linton, Steven J
2012-05-01
The aim is to explore pain catastrophizing from a problem-solving perspective. The links between catastrophizing, problem framing, and problem-solving behaviour are examined through two possible models of mediation as inferred by two contemporary and complementary theoretical models, the misdirected problem solving model (Eccleston & Crombez, 2007) and the fear-anxiety-avoidance model (Asmundson, Norton, & Vlaeyen, 2004). In this prospective study, a general population sample (n= 173) with perceived problems with spinal pain filled out questionnaires twice; catastrophizing and problem framing were assessed on the first occasion and health care seeking (as a proxy for medically oriented problem solving) was assessed 7 months later. Two different approaches were used to explore whether the data supported any of the proposed models of mediation. First, multiple regressions were used according to traditional recommendations for mediation analyses. Second, a bootstrapping method (n= 1000 bootstrap resamples) was used to explore the significance of the indirect effects in both possible models of mediation. The results verified the concepts included in the misdirected problem solving model. However, the direction of the relations was more in line with the fear-anxiety-avoidance model. More specifically, the mediation analyses provided support for viewing catastrophizing as a mediator of the relation between biomedical problem framing and medically oriented problem-solving behaviour. These findings provide support for viewing catastrophizing from a problem-solving perspective and imply a need to examine and address problem framing and catastrophizing in back pain patients. ©2011 The British Psychological Society.
Wavelet analysis in ecology and epidemiology: impact of statistical tests
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-01-01
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the ‘beta-surrogate’ method. PMID:24284892
Wavelet analysis in ecology and epidemiology: impact of statistical tests.
Cazelles, Bernard; Cazelles, Kévin; Chavez, Mario
2014-02-06
Wavelet analysis is now frequently used to extract information from ecological and epidemiological time series. Statistical hypothesis tests are conducted on associated wavelet quantities to assess the likelihood that they are due to a random process. Such random processes represent null models and are generally based on synthetic data that share some statistical characteristics with the original time series. This allows the comparison of null statistics with those obtained from original time series. When creating synthetic datasets, different techniques of resampling result in different characteristics shared by the synthetic time series. Therefore, it becomes crucial to consider the impact of the resampling method on the results. We have addressed this point by comparing seven different statistical testing methods applied with different real and simulated data. Our results show that statistical assessment of periodic patterns is strongly affected by the choice of the resampling method, so two different resampling techniques could lead to two different conclusions about the same time series. Moreover, our results clearly show the inadequacy of resampling series generated by white noise and red noise that are nevertheless the methods currently used in the wide majority of wavelets applications. Our results highlight that the characteristics of a time series, namely its Fourier spectrum and autocorrelation, are important to consider when choosing the resampling technique. Results suggest that data-driven resampling methods should be used such as the hidden Markov model algorithm and the 'beta-surrogate' method.
Forensic discrimination of copper wire using trace element concentrations.
Dettman, Joshua R; Cassabaum, Alyssa A; Saunders, Christopher P; Snyder, Deanna L; Buscaglia, JoAnn
2014-08-19
Copper may be recovered as evidence in high-profile cases such as thefts and improvised explosive device incidents; comparison of copper samples from the crime scene and those associated with the subject of an investigation can provide probative associative evidence and investigative support. A solution-based inductively coupled plasma mass spectrometry method for measuring trace element concentrations in high-purity copper was developed using standard reference materials. The method was evaluated for its ability to use trace element profiles to statistically discriminate between copper samples considering the precision of the measurement and manufacturing processes. The discriminating power was estimated by comparing samples chosen on the basis of the copper refining and production process to represent the within-source (samples expected to be similar) and between-source (samples expected to be different) variability using multivariate parametric- and empirical-based data simulation models with bootstrap resampling. If the false exclusion rate is set to 5%, >90% of the copper samples can be correctly determined to originate from different sources using a parametric-based model and >87% with an empirical-based approach. These results demonstrate the potential utility of the developed method for the comparison of copper samples encountered as forensic evidence.
Modelling road accident blackspots data with the discrete generalized Pareto distribution.
Prieto, Faustino; Gómez-Déniz, Emilio; Sarabia, José María
2014-10-01
This study shows how road traffic networks events, in particular road accidents on blackspots, can be modelled with simple probabilistic distributions. We considered the number of crashes and the number of fatalities on Spanish blackspots in the period 2003-2007, from Spanish General Directorate of Traffic (DGT). We modelled those datasets, respectively, with the discrete generalized Pareto distribution (a discrete parametric model with three parameters) and with the discrete Lomax distribution (a discrete parametric model with two parameters, and particular case of the previous model). For that, we analyzed the basic properties of both parametric models: cumulative distribution, survival, probability mass, quantile and hazard functions, genesis and rth-order moments; applied two estimation methods of their parameters: the μ and (μ+1) frequency method and the maximum likelihood method; used two goodness-of-fit tests: Chi-square test and discrete Kolmogorov-Smirnov test based on bootstrap resampling; and compared them with the classical negative binomial distribution in terms of absolute probabilities and in models including covariates. We found that those probabilistic models can be useful to describe the road accident blackspots datasets analyzed. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
McCann, Cooper Patrick
Low-cost flight-based hyperspectral imaging systems have the potential to provide valuable information for ecosystem and environmental studies as well as aide in land management and land health monitoring. This thesis describes (1) a bootstrap method of producing mesoscale, radiometrically-referenced hyperspectral data using the Landsat surface reflectance (LaSRC) data product as a reference target, (2) biophysically relevant basis functions to model the reflectance spectra, (3) an unsupervised classification technique based on natural histogram splitting of these biophysically relevant parameters, and (4) local and multi-temporal anomaly detection. The bootstrap method extends standard processing techniques to remove uneven illumination conditions between flight passes, allowing the creation of radiometrically self-consistent data. Through selective spectral and spatial resampling, LaSRC data is used as a radiometric reference target. Advantages of the bootstrap method include the need for minimal site access, no ancillary instrumentation, and automated data processing. Data from a flight on 06/02/2016 is compared with concurrently collected ground based reflectance spectra as a means of validation achieving an average error of 2.74%. Fitting reflectance spectra using basis functions, based on biophysically relevant spectral features, allows both noise and data reductions while shifting information from spectral bands to biophysical features. Histogram splitting is used to determine a clustering based on natural splittings of these fit parameters. The Indian Pines reference data enabled comparisons of the efficacy of this technique to established techniques. The splitting technique is shown to be an improvement over the ISODATA clustering technique with an overall accuracy of 34.3/19.0% before merging and 40.9/39.2% after merging. This improvement is also seen as an improvement of kappa before/after merging of 24.8/30.5 for the histogram splitting technique compared to 15.8/28.5 for ISODATA. Three hyperspectral flights over the Kevin Dome area, covering 1843 ha, acquired 06/21/2014, 06/24/2015 and 06/26/2016 are examined with different methods of anomaly detection. Detection of anomalies within a single data set is examined to determine, on a local scale, areas that are significantly different from the surrounding area. Additionally, the detection and identification of persistent anomalies and non-persistent anomalies was investigated across multiple data sets.
Prosperi, Mattia C F; De Luca, Andrea; Di Giambenedetto, Simona; Bracciale, Laura; Fabbiani, Massimiliano; Cauda, Roberto; Salemi, Marco
2010-10-25
Phylogenetic methods produce hierarchies of molecular species, inferring knowledge about taxonomy and evolution. However, there is not yet a consensus methodology that provides a crisp partition of taxa, desirable when considering the problem of intra/inter-patient quasispecies classification or infection transmission event identification. We introduce the threshold bootstrap clustering (TBC), a new methodology for partitioning molecular sequences, that does not require a phylogenetic tree estimation. The TBC is an incremental partition algorithm, inspired by the stochastic Chinese restaurant process, and takes advantage of resampling techniques and models of sequence evolution. TBC uses as input a multiple alignment of molecular sequences and its output is a crisp partition of the taxa into an automatically determined number of clusters. By varying initial conditions, the algorithm can produce different partitions. We describe a procedure that selects a prime partition among a set of candidate ones and calculates a measure of cluster reliability. TBC was successfully tested for the identification of type-1 human immunodeficiency and hepatitis C virus subtypes, and compared with previously established methodologies. It was also evaluated in the problem of HIV-1 intra-patient quasispecies clustering, and for transmission cluster identification, using a set of sequences from patients with known transmission event histories. TBC has been shown to be effective for the subtyping of HIV and HCV, and for identifying intra-patient quasispecies. To some extent, the algorithm was able also to infer clusters corresponding to events of infection transmission. The computational complexity of TBC is quadratic in the number of taxa, lower than other established methods; in addition, TBC has been enhanced with a measure of cluster reliability. The TBC can be useful to characterise molecular quasipecies in a broad context.
Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D
2013-01-01
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across the Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rates controls between each approach are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offers higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
Seurinck, Sylvie; Deschepper, Ellen; Deboch, Bishaw; Verstraete, Willy; Siciliano, Steven
2006-03-01
Microbial source tracking (MST) methods need to be rapid, inexpensive and accurate. Unfortunately, many MST methods provide a wealth of information that is difficult to interpret by the regulators who use this information to make decisions. This paper describes the use of classification tree analysis to interpret the results of a MST method based on fatty acid methyl ester (FAME) profiles of Escherichia coli isolates, and to present results in a format readily interpretable by water quality managers. Raw sewage E. coli isolates and animal E. coli isolates from cow, dog, gull, and horse were isolated and their FAME profiles collected. Correct classification rates determined with leaveone-out cross-validation resulted in an overall low correct classification rate of 61%. A higher overall correct classification rate of 85% was obtained when the animal isolates were pooled together and compared to the raw sewage isolates. Bootstrap aggregation or adaptive resampling and combining of the FAME profile data increased correct classification rates substantially. Other MST methods may be better suited to differentiate between different fecal sources but classification tree analysis has enabled us to distinguish raw sewage from animal E. coli isolates, which previously had not been possible with other multivariate methods such as principal component analysis and cluster analysis.
Peace of Mind, Academic Motivation, and Academic Achievement in Filipino High School Students.
Datu, Jesus Alfonso D
2017-04-09
Recent literature has recognized the advantageous role of low-arousal positive affect such as feelings of peacefulness and internal harmony in collectivist cultures. However, limited research has explored the benefits of low-arousal affective states in the educational setting. The current study examined the link of peace of mind (PoM) to academic motivation (i.e., amotivation, controlled motivation, and autonomous motivation) and academic achievement among 525 Filipino high school students. Findings revealed that PoM was positively associated with academic achievement β = .16, p < .05, autonomous motivation β = .48, p < .001, and controlled motivation β = .25, p < .01. As expected, PoM was negatively related to amotivation β = -.19, p < .05, and autonomous motivation was positively associated with academic achievement β = .52, p < .01. Furthermore, the results of bias-corrected bootstrap analyses at 95% confidence interval based on 5,000 bootstrapped resamples demonstrated that peace of mind had an indirect influence on academic achievement through the mediating effects of autonomous motivation. In terms of the effect sizes, the findings showed that PoM explained about 1% to 18% of the variance in academic achievement and motivation. The theoretical and practical implications of the results are elucidated.
Testing for Granger Causality in the Frequency Domain: A Phase Resampling Method.
Liu, Siwei; Molenaar, Peter
2016-01-01
This article introduces phase resampling, an existing but rarely used surrogate data method for making statistical inferences of Granger causality in frequency domain time series analysis. Granger causality testing is essential for establishing causal relations among variables in multivariate dynamic processes. However, testing for Granger causality in the frequency domain is challenging due to the nonlinear relation between frequency domain measures (e.g., partial directed coherence, generalized partial directed coherence) and time domain data. Through a simulation study, we demonstrate that phase resampling is a general and robust method for making statistical inferences even with short time series. With Gaussian data, phase resampling yields satisfactory type I and type II error rates in all but one condition we examine: when a small effect size is combined with an insufficient number of data points. Violations of normality lead to slightly higher error rates but are mostly within acceptable ranges. We illustrate the utility of phase resampling with two empirical examples involving multivariate electroencephalography (EEG) and skin conductance data.
Resampling: A Marriage of Computers and Statistics. ERIC/TM Digest.
ERIC Educational Resources Information Center
Rudner, Lawrence M.; Shafer, Mary Morello
Advances in computer technology are making it possible for educational researchers to use simpler statistical methods to address a wide range of questions with smaller data sets and fewer, and less restrictive, assumptions. This digest introduces computationally intensive statistics, collectively called resampling techniques. Resampling is a…
2010-01-01
Background Aneurysmal subarachnoid haemorrhage (aSAH) is a devastating event with a frequently disabling outcome. Our aim was to develop a prognostic model to predict an ordinal clinical outcome at two months in patients with aSAH. Methods We studied patients enrolled in the International Subarachnoid Aneurysm Trial (ISAT), a randomized multicentre trial to compare coiling and clipping in aSAH patients. Several models were explored to estimate a patient's outcome according to the modified Rankin Scale (mRS) at two months after aSAH. Our final model was validated internally with bootstrapping techniques. Results The study population comprised of 2,128 patients of whom 159 patients died within 2 months (8%). Multivariable proportional odds analysis identified World Federation of Neurosurgical Societies (WFNS) grade as the most important predictor, followed by age, sex, lumen size of the aneurysm, Fisher grade, vasospasm on angiography, and treatment modality. The model discriminated moderately between those with poor and good mRS scores (c statistic = 0.65), with minor optimism according to bootstrap re-sampling (optimism corrected c statistic = 0.64). Conclusion We presented a calibrated and internally validated ordinal prognostic model to predict two month mRS in aSAH patients who survived the early stage up till a treatment decision. Although generalizability of the model is limited due to the selected population in which it was developed, this model could eventually be used to support clinical decision making after external validation. Trial Registration International Standard Randomised Controlled Trial, Number ISRCTN49866681 PMID:20920243
Simulating ensembles of source water quality using a K-nearest neighbor resampling approach.
Towler, Erin; Rajagopalan, Balaji; Seidel, Chad; Summers, R Scott
2009-03-01
Climatological, geological, and water management factors can cause significant variability in surface water quality. As drinking water quality standards become more stringent, the ability to quantify the variability of source water quality becomes more important for decision-making and planning in water treatment for regulatory compliance. However, paucity of long-term water quality data makes it challenging to apply traditional simulation techniques. To overcome this limitation, we have developed and applied a robust nonparametric K-nearest neighbor (K-nn) bootstrap approach utilizing the United States Environmental Protection Agency's Information Collection Rule (ICR) data. In this technique, first an appropriate "feature vector" is formed from the best available explanatory variables. The nearest neighbors to the feature vector are identified from the ICR data and are resampled using a weight function. Repetition of this results in water quality ensembles, and consequently the distribution and the quantification of the variability. The main strengths of the approach are its flexibility, simplicity, and the ability to use a large amount of spatial data with limited temporal extent to provide water quality ensembles for any given location. We demonstrate this approach by applying it to simulate monthly ensembles of total organic carbon for two utilities in the U.S. with very different watersheds and to alkalinity and bromide at two other U.S. utilities.
Simmons, Mark P; Goloboff, Pablo A
2013-10-01
Empirical and simulated examples are used to demonstrate an artifact caused by undersampling optimal trees in data matrices that consist mostly or entirely of locally sampled (as opposed to globally, for most or all terminals) characters. The artifact is that unsupported clades consisting entirely of terminals scored for the same locally sampled partition may be resolved and assigned high resampling support-despite their being properly unsupported (i.e., not resolved in the strict consensus of all optimal trees). This artifact occurs despite application of random-addition sequences for stepwise terminal addition. The artifact is not necessarily obviated with thorough conventional branch swapping methods (even tree-bisection-reconnection) when just a single tree is held, as is sometimes implemented in parsimony bootstrap pseudoreplicates, and in every GARLI, PhyML, and RAxML pseudoreplicate and search for the most likely tree for the matrix as a whole. Hence GARLI, RAxML, and PhyML-based likelihood results require extra scrutiny, particularly when they provide high resolution and support for clades that are entirely unsupported by methods that perform more thorough searches, as in most parsimony analyses. Copyright © 2013 Elsevier Inc. All rights reserved.
Jiang, Wenyu; Simon, Richard
2007-12-20
This paper first provides a critical review on some existing methods for estimating the prediction error in classifying microarray data where the number of genes greatly exceeds the number of specimens. Special attention is given to the bootstrap-related methods. When the sample size n is small, we find that all the reviewed methods suffer from either substantial bias or variability. We introduce a repeated leave-one-out bootstrap (RLOOB) method that predicts for each specimen in the sample using bootstrap learning sets of size ln. We then propose an adjusted bootstrap (ABS) method that fits a learning curve to the RLOOB estimates calculated with different bootstrap learning set sizes. The ABS method is robust across the situations we investigate and provides a slightly conservative estimate for the prediction error. Even with small samples, it does not suffer from large upward bias as the leave-one-out bootstrap and the 0.632+ bootstrap, and it does not suffer from large variability as the leave-one-out cross-validation in microarray applications. Copyright (c) 2007 John Wiley & Sons, Ltd.
Variability of drought characteristics in Europe over the last 250 years
NASA Astrophysics Data System (ADS)
Hanel, Martin; Rakovec, Oldrich; Máca, Petr; Markonis, Yannis; Samaniego, Luis; Kumar, Rohini
2017-04-01
The mesoscale hydrological model (mHM) with spatial resolution 0.5deg is applied to simulate water balance across large part of continental Europe (excluding Scandinavia and Russia) for the period 1766-2015. The model is driven by available European gridded monthly temperature and precipitation reconstructions (Casty et al, 2007), which are disaggregated into daily time step using k-nearest neighbour resampling (Lall and Sharma, 1996). To quantify the uncertainty due to temporal disaggregation, several replicates of precipitation and temperature fields for the whole period are considered. In parallel, model parameter uncertainty is addressed by an ensemble of parameter realizations provided by Rakovec et al (2016). Deficit periods with respect to total runoff and soil moisture are identified at each grid cell using the variable threshold method. We assess the severity and intensity of drought, spatial extent of area under drought as well as the length of deficit periods. In addition, we also determine the occurrence of multi-year droughts during the period and evaluate the extremity of recent droughts in Europe (i.e., 2003, 2015) in the context of the whole multi-decadal record. References: Casty, C., Raible, C.C., Stocker, T.F., Luterbacher, J. and H. Wanner (2007), A European pattern climatology 1766-2000, Climate Dynamics, 29(7), DOI:10.1007/s00382-007-0257-6. Lall, U., and A. Sharma (1996), A Nearest neighbor bootstrap for resampling hydrologic time series, Water Resour. Res., 32(3), 679-693, DOI:10.1029/95WR02966. Rakovec, O., Kumar, R., Attinger, S. and Samaniego, L. (2016), Improving the realism of hydrologic model functioning through multivariate parameter estimation, Water Resour. Res., 52, DOI:10.1002/2016WR019430
Estimation of urban runoff and water quality using remote sensing and artificial intelligence.
Ha, S R; Park, S Y; Park, D H
2003-01-01
Water quality and quantity of runoff are strongly dependent on the landuse and landcover (LULC) criteria. In this study, we developed a more improved parameter estimation procedure for the environmental model using remote sensing (RS) and artificial intelligence (AI) techniques. Landsat TM multi-band (7bands) and Korea Multi-Purpose Satellite (KOMPSAT) panchromatic data were selected for input data processing. We employed two kinds of artificial intelligence techniques, RBF-NN (radial-basis-function neural network) and ANN (artificial neural network), to classify LULC of the study area. A bootstrap resampling method, a statistical technique, was employed to generate the confidence intervals and distribution of the unit load. SWMM was used to simulate the urban runoff and water quality and applied to the study watershed. The condition of urban flow and non-point contaminations was simulated with rainfall-runoff and measured water quality data. The estimated total runoff, peak time, and pollutant generation varied considerably according to the classification accuracy and percentile unit load applied. The proposed procedure would efficiently be applied to water quality and runoff simulation in a rapidly changing urban area.
The seven deadly sins of DNA barcoding.
Collins, R A; Cruickshank, R H
2013-11-01
Despite the broad benefits that DNA barcoding can bring to a diverse range of biological disciplines, a number of shortcomings still exist in terms of the experimental design of studies incorporating this approach. One underlying reason for this lies in the confusion that often exists between species discovery and specimen identification, and this is reflected in the way that hypotheses are generated and tested. Although these aims can be associated, they are quite distinct and require different methodological approaches, but their conflation has led to the frequently inappropriate use of commonly used analytical methods such as neighbour-joining trees, bootstrap resampling and fixed distance thresholds. Furthermore, the misidentification of voucher specimens can also have serious implications for end users of reference libraries such as the Barcode of Life Data Systems, and in this regard we advocate increased diligence in the a priori identification of specimens to be used for this purpose. This commentary provides an assessment of seven deficiencies that we identify as common in the DNA barcoding literature, and outline some potential improvements for its adaptation and adoption towards more reliable and accurate outcomes. © 2012 John Wiley & Sons Ltd.
Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.
2014-01-01
The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736
Assessment of Person Fit Using Resampling-Based Approaches
ERIC Educational Resources Information Center
Sinharay, Sandip
2016-01-01
De la Torre and Deng suggested a resampling-based approach for person-fit assessment (PFA). The approach involves the use of the [math equation unavailable] statistic, a corrected expected a posteriori estimate of the examinee ability, and the Monte Carlo (MC) resampling method. The Type I error rate of the approach was closer to the nominal level…
Austin, Peter C.; van Klaveren, David; Vergouwe, Yvonne; Nieboer, Daan; Lee, Douglas S.; Steyerberg, Ewout W.
2017-01-01
Objective Validation of clinical prediction models traditionally refers to the assessment of model performance in new patients. We studied different approaches to geographic and temporal validation in the setting of multicenter data from two time periods. Study Design and Setting We illustrated different analytic methods for validation using a sample of 14,857 patients hospitalized with heart failure at 90 hospitals in two distinct time periods. Bootstrap resampling was used to assess internal validity. Meta-analytic methods were used to assess geographic transportability. Each hospital was used once as a validation sample, with the remaining hospitals used for model derivation. Hospital-specific estimates of discrimination (c-statistic) and calibration (calibration intercepts and slopes) were pooled using random effects meta-analysis methods. I2 statistics and prediction interval width quantified geographic transportability. Temporal transportability was assessed using patients from the earlier period for model derivation and patients from the later period for model validation. Results Estimates of reproducibility, pooled hospital-specific performance, and temporal transportability were on average very similar, with c-statistics of 0.75. Between-hospital variation was moderate according to I2 statistics and prediction intervals for c-statistics. Conclusion This study illustrates how performance of prediction models can be assessed in settings with multicenter data at different time periods. PMID:27262237
Application of change-point problem to the detection of plant patches.
López, I; Gámez, M; Garay, J; Standovár, T; Varga, Z
2010-03-01
In ecology, if the considered area or space is large, the spatial distribution of individuals of a given plant species is never homogeneous; plants form different patches. The homogeneity change in space or in time (in particular, the related change-point problem) is an important research subject in mathematical statistics. In the paper, for a given data system along a straight line, two areas are considered, where the data of each area come from different discrete distributions, with unknown parameters. In the paper a method is presented for the estimation of the distribution change-point between both areas and an estimate is given for the distributions separated by the obtained change-point. The solution of this problem will be based on the maximum likelihood method. Furthermore, based on an adaptation of the well-known bootstrap resampling, a method for the estimation of the so-called change-interval is also given. The latter approach is very general, since it not only applies in the case of the maximum-likelihood estimation of the change-point, but it can be also used starting from any other change-point estimation known in the ecological literature. The proposed model is validated against typical ecological situations, providing at the same time a verification of the applied algorithms.
Modular reweighting software for statistical mechanical analysis of biased equilibrium data
NASA Astrophysics Data System (ADS)
Sindhikara, Daniel J.
2012-07-01
Here a simple, useful, modular approach and software suite designed for statistical reweighting and analysis of equilibrium ensembles is presented. Statistical reweighting is useful and sometimes necessary for analysis of equilibrium enhanced sampling methods, such as umbrella sampling or replica exchange, and also in experimental cases where biasing factors are explicitly known. Essentially, statistical reweighting allows extrapolation of data from one or more equilibrium ensembles to another. Here, the fundamental separable steps of statistical reweighting are broken up into modules - allowing for application to the general case and avoiding the black-box nature of some “all-inclusive” reweighting programs. Additionally, the programs included are, by-design, written with little dependencies. The compilers required are either pre-installed on most systems, or freely available for download with minimal trouble. Examples of the use of this suite applied to umbrella sampling and replica exchange molecular dynamics simulations will be shown along with advice on how to apply it in the general case. New version program summaryProgram title: Modular reweighting version 2 Catalogue identifier: AEJH_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJH_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 No. of lines in distributed program, including test data, etc.: 179 118 No. of bytes in distributed program, including test data, etc.: 8 518 178 Distribution format: tar.gz Programming language: C++, Python 2.6+, Perl 5+ Computer: Any Operating system: Any RAM: 50-500 MB Supplementary material: An updated version of the original manuscript (Comput. Phys. Commun. 182 (2011) 2227) is available Classification: 4.13 Catalogue identifier of previous version: AEJH_v1_0 Journal reference of previous version: Comput. Phys. Commun. 182 (2011) 2227 Does the new version supersede the previous version?: Yes Nature of problem: While equilibrium reweighting is ubiquitous, there are no public programs available to perform the reweighting in the general case. Further, specific programs often suffer from many library dependencies and numerical instability. Solution method: This package is written in a modular format that allows for easy applicability of reweighting in the general case. Modules are small, numerically stable, and require minimal libraries. Reasons for new version: Some minor bugs, some upgrades needed, error analysis added. analyzeweight.py/analyzeweight.py2 has been replaced by “multihist.py”. This new program performs all the functions of its predecessor while being versatile enough to handle other types of histograms and probability analysis. “bootstrap.py” was added. This script performs basic bootstrap resampling allowing for error analysis of data. “avg_dev_distribution.py” was added. This program computes the averages and standard deviations of multiple distributions, making error analysis (e.g. from bootstrap resampling) easier to visualize. WRE.cpp was slightly modified purely for cosmetic reasons. The manual was updated for clarity and to reflect version updates. Examples were removed from the manual in favor of online tutorials (packaged examples remain). Examples were updated to reflect the new format. An additional example is included to demonstrate error analysis. Running time: Preprocessing scripts 1-5 minutes, WHAM engine <1 minute, postprocess script ∼1-5 minutes.
The prevalence of terraced treescapes in analyses of phylogenetic data sets.
Dobrin, Barbara H; Zwickl, Derrick J; Sanderson, Michael J
2018-04-04
The pattern of data availability in a phylogenetic data set may lead to the formation of terraces, collections of equally optimal trees. Terraces can arise in tree space if trees are scored with parsimony or with partitioned, edge-unlinked maximum likelihood. Theory predicts that terraces can be large, but their prevalence in contemporary data sets has never been surveyed. We selected 26 data sets and phylogenetic trees reported in recent literature and investigated the terraces to which the trees would belong, under a common set of inference assumptions. We examined terrace size as a function of the sampling properties of the data sets, including taxon coverage density (the proportion of taxon-by-gene positions with any data present) and a measure of gene sampling "sufficiency". We evaluated each data set in relation to the theoretical minimum gene sampling depth needed to reduce terrace size to a single tree, and explored the impact of the terraces found in replicate trees in bootstrap methods. Terraces were identified in nearly all data sets with taxon coverage densities < 0.90. They were not found, however, in high-coverage-density (i.e., ≥ 0.94) transcriptomic and genomic data sets. The terraces could be very large, and size varied inversely with taxon coverage density and with gene sampling sufficiency. Few data sets achieved a theoretical minimum gene sampling depth needed to reduce terrace size to a single tree. Terraces found during bootstrap resampling reduced overall support. If certain inference assumptions apply, trees estimated from empirical data sets often belong to large terraces of equally optimal trees. Terrace size correlates to data set sampling properties. Data sets seldom include enough genes to reduce terrace size to one tree. When bootstrap replicate trees lie on a terrace, statistical support for phylogenetic hypotheses may be reduced. Although some of the published analyses surveyed were conducted with edge-linked inference models (which do not induce terraces), unlinked models have been used and advocated. The present study describes the potential impact of that inference assumption on phylogenetic inference in the context of the kinds of multigene data sets now widely assembled for large-scale tree construction.
Wahl, Simone; Boulesteix, Anne-Laure; Zierer, Astrid; Thorand, Barbara; van de Wiel, Mark A
2016-10-26
Missing values are a frequent issue in human studies. In many situations, multiple imputation (MI) is an appropriate missing data handling strategy, whereby missing values are imputed multiple times, the analysis is performed in every imputed data set, and the obtained estimates are pooled. If the aim is to estimate (added) predictive performance measures, such as (change in) the area under the receiver-operating characteristic curve (AUC), internal validation strategies become desirable in order to correct for optimism. It is not fully understood how internal validation should be combined with multiple imputation. In a comprehensive simulation study and in a real data set based on blood markers as predictors for mortality, we compare three combination strategies: Val-MI, internal validation followed by MI on the training and test parts separately, MI-Val, MI on the full data set followed by internal validation, and MI(-y)-Val, MI on the full data set omitting the outcome followed by internal validation. Different validation strategies, including bootstrap und cross-validation, different (added) performance measures, and various data characteristics are considered, and the strategies are evaluated with regard to bias and mean squared error of the obtained performance estimates. In addition, we elaborate on the number of resamples and imputations to be used, and adopt a strategy for confidence interval construction to incomplete data. Internal validation is essential in order to avoid optimism, with the bootstrap 0.632+ estimate representing a reliable method to correct for optimism. While estimates obtained by MI-Val are optimistically biased, those obtained by MI(-y)-Val tend to be pessimistic in the presence of a true underlying effect. Val-MI provides largely unbiased estimates, with a slight pessimistic bias with increasing true effect size, number of covariates and decreasing sample size. In Val-MI, accuracy of the estimate is more strongly improved by increasing the number of bootstrap draws rather than the number of imputations. With a simple integrated approach, valid confidence intervals for performance estimates can be obtained. When prognostic models are developed on incomplete data, Val-MI represents a valid strategy to obtain estimates of predictive performance measures.
Rubio-Tapia, Alberto; Malamut, Georgia; Verbeek, Wieke H.M.; van Wanrooij, Roy L.J.; Leffler, Daniel A.; Niveloni, Sonia I.; Arguelles-Grande, Carolina; Lahr, Brian D.; Zinsmeister, Alan R.; Murray, Joseph A.; Kelly, Ciaran P.; Bai, Julio C.; Green, Peter H.; Daum, Severin; Mulder, Chris J.J.; Cellier, Christophe
2016-01-01
Background Refractory coeliac disease is a severe complication of coeliac disease with heterogeneous outcome. Aim To create a prognostic model to estimate survival of patients with refractory coeliac disease. Methods We evaluated predictors of 5-year mortality using Cox proportional hazards regression on subjects from a multinational registry. Bootstrap re-sampling was used to internally validate the individual factors and overall model performance. The mean of the estimated regression coefficients from 400 bootstrap models was used to derive a risk score for 5-year mortality. Results The multinational cohort was composed of 232 patients diagnosed with refractory coeliac disease across 7 centers (range of 11–63 cases per center). The median age was 53 years and 150 (64%) were women. A total of 51 subjects died during 5-year follow-up (cumulative 5-year all-cause mortality = 30%). From a multiple variable Cox proportional hazards model, the following variables were significantly associated with 5-year mortality: age at refractory coeliac disease diagnosis (per 20 year increase, hazard ratio = 2.21; 95% confidence interval: 1.38, 3.55), abnormal intraepithelial lymphocytes (hazard ratio = 2.85; 95% confidence interval: 1.22, 6.62), and albumin (per 0.5 unit increase, hazard ratio = 0.72; 95% confidence interval: 0.61, 0.85). A simple weighted 3-factor risk score was created to estimate 5-year survival. Conclusions Using data from a multinational registry and previously-reported risk factors, we create a prognostic model to predict 5-year mortality among patients with refractory coeliac disease. This new model may help clinicians to guide treatment and follow-up. PMID:27485029
Tests of Independence for Ordinal Data Using Bootstrap.
ERIC Educational Resources Information Center
Chan, Wai; Yung, Yiu-Fai; Bentler, Peter M.; Tang, Man-Lai
1998-01-01
Two bootstrap tests are proposed to test the independence hypothesis in a two-way cross table. Monte Carlo studies are used to compare the traditional asymptotic test with these bootstrap methods, and the bootstrap methods are found superior in two ways: control of Type I error and statistical power. (SLD)
Fast, Exact Bootstrap Principal Component Analysis for p > 1 million
Fisher, Aaron; Caffo, Brian; Schwartz, Brian; Zipunnikov, Vadim
2015-01-01
Many have suggested a bootstrap procedure for estimating the sampling variability of principal component analysis (PCA) results. However, when the number of measurements per subject (p) is much larger than the number of subjects (n), calculating and storing the leading principal components from each bootstrap sample can be computationally infeasible. To address this, we outline methods for fast, exact calculation of bootstrap principal components, eigenvalues, and scores. Our methods leverage the fact that all bootstrap samples occupy the same n-dimensional subspace as the original sample. As a result, all bootstrap principal components are limited to the same n-dimensional subspace and can be efficiently represented by their low dimensional coordinates in that subspace. Several uncertainty metrics can be computed solely based on the bootstrap distribution of these low dimensional coordinates, without calculating or storing the p-dimensional bootstrap components. Fast bootstrap PCA is applied to a dataset of sleep electroencephalogram recordings (p = 900, n = 392), and to a dataset of brain magnetic resonance images (MRIs) (p ≈ 3 million, n = 352). For the MRI dataset, our method allows for standard errors for the first 3 principal components based on 1000 bootstrap samples to be calculated on a standard laptop in 47 minutes, as opposed to approximately 4 days with standard methods. PMID:27616801
Introduction to Permutation and Resampling-Based Hypothesis Tests
ERIC Educational Resources Information Center
LaFleur, Bonnie J.; Greevy, Robert A.
2009-01-01
A resampling-based method of inference--permutation tests--is often used when distributional assumptions are questionable or unmet. Not only are these methods useful for obvious departures from parametric assumptions (e.g., normality) and small sample sizes, but they are also more robust than their parametric counterparts in the presences of…
Coefficient Alpha Bootstrap Confidence Interval under Nonnormality
ERIC Educational Resources Information Center
Padilla, Miguel A.; Divers, Jasmin; Newton, Matthew
2012-01-01
Three different bootstrap methods for estimating confidence intervals (CIs) for coefficient alpha were investigated. In addition, the bootstrap methods were compared with the most promising coefficient alpha CI estimation methods reported in the literature. The CI methods were assessed through a Monte Carlo simulation utilizing conditions…
Application of the Bootstrap Methods in Factor Analysis.
ERIC Educational Resources Information Center
Ichikawa, Masanori; Konishi, Sadanori
1995-01-01
A Monte Carlo experiment was conducted to investigate the performance of bootstrap methods in normal theory maximum likelihood factor analysis when the distributional assumption was satisfied or unsatisfied. Problems arising with the use of bootstrap methods are highlighted. (SLD)
Coker, Kendell L.; Ikpe, Uduakobong N.; Brooks, Jeannie S.; Page, Brian; Sobell, Mark B.
2014-01-01
This study examined the relationship between traumatic stress, social problem solving, and moral disengagement among African American inner-city high school students. Participants consisted of 45 (25 males and 20 females) African American students enrolled in grades 10 through 12. Mediation was assessed by testing for the indirect effect using the confidence interval derived from 10,000 bootstrapped resamples. The results revealed that social problem-solving skills have an indirect effect on the relationship between traumatic stress and moral disengagement. The findings suggest that African American youth that are negatively impacted by trauma evidence deficits in their social problem solving skills and are likely to be at an increased risk to morally disengage. Implications for culturally sensitive and trauma-based intervention programs are also provided. PMID:25071874
Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D
2015-02-20
The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Ickert, R. B.; Mundil, R.
2012-12-01
Dateable minerals (especially zircon U-Pb) that crystallized at high temperatures but have been redeposited, pose both unique opportunities and challenges for geochronology. Although they have the potential to provide useful information on the depositional age of their host rocks, their relationship to the host is not always well constrained. For example, primary volcanic deposits will often have a lag time (time between eruption and deposition) that is smaller than can be resolved using radiometric techniques, and the age of eruption and of deposition will be coincident within uncertainty. Alternatively, ordinary clastic sedimentary rocks will usually have a long and variable lag time, even for the youngest minerals. Intermediate cases, for example moderately reworked volcanogenic material, will have a short, but unknown lag time. A compounding problem with U-Pb zircon is that the residence time of crystals in their host magma chamber (time between crystallization and eruption) can be high and is variable, even within the products of a single eruption. In cases where the lag and/or residence time suspected to be large relative to the precision of the date, a common objective is to determine the minimum age of a sample of dates, in order to constrain the maximum age of the deposition of the host rock. However, both the extraction of that age as well as assignment of a meaningful uncertainty is not straightforward. A number of ad hoc techniques have been employed in the literature, which may be appropriate for particular data sets or specific problems, but may yield biased or misleading results. Ludwig (2012) has developed an objective, statistically justified method for the determination of the distribution of the minimum age, but it has not been widely adopted. Here we extend this algorithm with a bootstrap (which can show the effect - if any - of the sampling distribution itself). This method has a number of desirable characteristics: It can incorporate all data points while being resistant to outliers, it utilizes the measurement uncertainties, and it does not require the assumption that any given cluster of data represents a single geological event. In brief, the technique generates a synthetic distribution from the input data by resampling with replacement (a bootstrap). Each resample is a random selection from a Gaussian distribution defined by the mean and uncertainty of the data point. For this distribution, the minimum value is calculated. This procedure is repeated many times (>1000) and a distribution of minimum values is generated, from which a confidence interval can be constructed. We demonstrate the application of this technique using natural and synthetic datasets, show the advantages and limitations, and relate it to other methods. We emphasize that this estimate remains strictly a minimum age - as with any other estimate that does not explicitly incorporate lag or residence time, it will not reflect a depositional age if the lag/residence time is larger than the uncertainty of the estimate. We recommend that this or similar techniques be considered by geochronologists. Ludwig, K.R., 2012. Isoplot 3.75, A geochronological toolkit for Microsoft Excel; Berkeley Geochronology Center Special Publication no. 5
Trends and variability in the hydrological regime of the Mackenzie River Basin
NASA Astrophysics Data System (ADS)
Abdul Aziz, Omar I.; Burn, Donald H.
2006-03-01
Trends and variability in the hydrological regime were analyzed for the Mackenzie River Basin in northern Canada. The procedure utilized the Mann-Kendall non-parametric test to detect trends, the Trend Free Pre-Whitening (TFPW) approach for correcting time-series data for autocorrelation and a bootstrap resampling method to account for the cross-correlation structure of the data. A total of 19 hydrological and six meteorological variables were selected for the study. Analysis was conducted on hydrological data from a network of 54 hydrometric stations and meteorological data from a network of 10 stations. The results indicated that several hydrological variables exhibit a greater number of significant trends than are expected to occur by chance. Noteworthy were strong increasing trends over the winter month flows of December to April as well as in the annual minimum flow and weak decreasing trends in the early summer and late fall flows as well as in the annual mean flow. An earlier onset of the spring freshet is noted over the basin. The results are expected to assist water resources managers and policy makers in making better planning decisions in the Mackenzie River Basin.
Wang, Yu-Jie; Dou, Kai; Tang, Zhi-Wen
2017-01-01
Organizational citizenship behavior (OCB) is important to the development of an organization. Research into factors that foster OCB and the underlying processes are therefore substantially crucial. The current study aimed to test the association between trait self-control and OCB and the mediating role of consideration for future consequence. Four hundred and ninety-four Chinese employees (275 men, 219 women) took part in the study. Participants completed a battery of self-report measures online that assessed trait self-control, tendencies of consideration of future consequence, and organizational citizenship behavior. Path analysis was conducted and bootstrapping technique (N = 5000), a resampling method that is asymptotically more accurate than the standard intervals using sample variance and assumptions of normality, was used to judge the significance of the mediation. Results of path analysis showed that trait self-control was positively related to OCB. More importantly, the "trait self-control-OCB" link was mediated by consideration of future consequence-future, but not by consideration of future consequence-immediate. Employees with high trait self-control engage in more organizational citizenship behavior and this link can be partly explained by consideration of future consequence-future.
Resampling and Distribution of the Product Methods for Testing Indirect Effects in Complex Models
ERIC Educational Resources Information Center
Williams, Jason; MacKinnon, David P.
2008-01-01
Recent advances in testing mediation have found that certain resampling methods and tests based on the mathematical distribution of 2 normal random variables substantially outperform the traditional "z" test. However, these studies have primarily focused only on models with a single mediator and 2 component paths. To address this limitation, a…
Baron, Charles A.; Awan, Musaddiq J.; Mohamed, Abdallah S. R.; Akel, Imad; Rosenthal, David I.; Gunn, G. Brandon; Garden, Adam S.; Dyer, Brandon A.; Court, Laurence; Sevak, Parag R; Kocak-Uzel, Esengul; Fuller, Clifton D.
2016-01-01
Larynx may alternatively serve as a target or organ-at-risk (OAR) in head and neck cancer (HNC) image-guided radiotherapy (IGRT). The objective of this study was to estimate IGRT parameters required for larynx positional error independent of isocentric alignment and suggest population–based compensatory margins. Ten HNC patients receiving radiotherapy (RT) with daily CT-on-rails imaging were assessed. Seven landmark points were placed on each daily scan. Taking the most superior anterior point of the C5 vertebra as a reference isocenter for each scan, residual displacement vectors to the other 6 points were calculated post-isocentric alignment. Subsequently, using the first scan as a reference, the magnitude of vector differences for all 6 points for all scans over the course of treatment were calculated. Residual systematic and random error, and the necessary compensatory CTV-to-PTV and OAR-to-PRV margins were calculated, using both observational cohort data and a bootstrap-resampled population estimator. The grand mean displacements for all anatomical points was 5.07mm, with mean systematic error of 1.1mm and mean random setup error of 2.63mm, while bootstrapped POIs grand mean displacement was 5.09mm, with mean systematic error of 1.23mm and mean random setup error of 2.61mm. Required margin for CTV-PTV expansion was 4.6mm for all cohort points, while the bootstrap estimator of the equivalent margin was 4.9mm. The calculated OAR-to-PRV expansion for the observed residual set-up error was 2.7mm, and bootstrap estimated expansion of 2.9mm. We conclude that the interfractional larynx setup error is a significant source of RT set-up/delivery error in HNC both when the larynx is considered as a CTV or OAR. We estimate the need for a uniform expansion of 5mm to compensate for set up error if the larynx is a target or 3mm if the larynx is an OAR when using a non-laryngeal bony isocenter. PMID:25679151
System health monitoring using multiple-model adaptive estimation techniques
NASA Astrophysics Data System (ADS)
Sifford, Stanley Ryan
Monitoring system health for fault detection and diagnosis by tracking system parameters concurrently with state estimates is approached using a new multiple-model adaptive estimation (MMAE) method. This novel method is called GRid-based Adaptive Parameter Estimation (GRAPE). GRAPE expands existing MMAE methods by using new techniques to sample the parameter space. GRAPE expands on MMAE with the hypothesis that sample models can be applied and resampled without relying on a predefined set of models. GRAPE is initially implemented in a linear framework using Kalman filter models. A more generalized GRAPE formulation is presented using extended Kalman filter (EKF) models to represent nonlinear systems. GRAPE can handle both time invariant and time varying systems as it is designed to track parameter changes. Two techniques are presented to generate parameter samples for the parallel filter models. The first approach is called selected grid-based stratification (SGBS). SGBS divides the parameter space into equally spaced strata. The second approach uses Latin Hypercube Sampling (LHS) to determine the parameter locations and minimize the total number of required models. LHS is particularly useful when the parameter dimensions grow. Adding more parameters does not require the model count to increase for LHS. Each resample is independent of the prior sample set other than the location of the parameter estimate. SGBS and LHS can be used for both the initial sample and subsequent resamples. Furthermore, resamples are not required to use the same technique. Both techniques are demonstrated for both linear and nonlinear frameworks. The GRAPE framework further formalizes the parameter tracking process through a general approach for nonlinear systems. These additional methods allow GRAPE to either narrow the focus to converged values within a parameter range or expand the range in the appropriate direction to track the parameters outside the current parameter range boundary. Customizable rules define the specific resample behavior when the GRAPE parameter estimates converge. Convergence itself is determined from the derivatives of the parameter estimates using a simple moving average window to filter out noise. The system can be tuned to match the desired performance goals by making adjustments to parameters such as the sample size, convergence criteria, resample criteria, initial sampling method, resampling method, confidence in prior sample covariances, sample delay, and others.
ERIC Educational Resources Information Center
Cui, Zhongmin; Kolen, Michael J.
2008-01-01
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Yang, Yang; DeGruttola, Victor
2016-01-01
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients. PMID:22740584
Yang, Yang; DeGruttola, Victor
2012-06-22
Traditional resampling-based tests for homogeneity in covariance matrices across multiple groups resample residuals, that is, data centered by group means. These residuals do not share the same second moments when the null hypothesis is false, which makes them difficult to use in the setting of multiple testing. An alternative approach is to resample standardized residuals, data centered by group sample means and standardized by group sample covariance matrices. This approach, however, has been observed to inflate type I error when sample size is small or data are generated from heavy-tailed distributions. We propose to improve this approach by using robust estimation for the first and second moments. We discuss two statistics: the Bartlett statistic and a statistic based on eigen-decomposition of sample covariance matrices. Both statistics can be expressed in terms of standardized errors under the null hypothesis. These methods are extended to test homogeneity in correlation matrices. Using simulation studies, we demonstrate that the robust resampling approach provides comparable or superior performance, relative to traditional approaches, for single testing and reasonable performance for multiple testing. The proposed methods are applied to data collected in an HIV vaccine trial to investigate possible determinants, including vaccine status, vaccine-induced immune response level and viral genotype, of unusual correlation pattern between HIV viral load and CD4 count in newly infected patients.
Goldstein, Darlene R
2006-10-01
Studies of gene expression using high-density short oligonucleotide arrays have become a standard in a variety of biological contexts. Of the expression measures that have been proposed to quantify expression in these arrays, multi-chip-based measures have been shown to perform well. As gene expression studies increase in size, however, utilizing multi-chip expression measures is more challenging in terms of computing memory requirements and time. A strategic alternative to exact multi-chip quantification on a full large chip set is to approximate expression values based on subsets of chips. This paper introduces an extrapolation method, Extrapolation Averaging (EA), and a resampling method, Partition Resampling (PR), to approximate expression in large studies. An examination of properties indicates that subset-based methods can perform well compared with exact expression quantification. The focus is on short oligonucleotide chips, but the same ideas apply equally well to any array type for which expression is quantified using an entire set of arrays, rather than for only a single array at a time. Software implementing Partition Resampling and Extrapolation Averaging is under development as an R package for the BioConductor project.
Spatial Point Pattern Analysis of Neurons Using Ripley's K-Function in 3D
Jafari-Mamaghani, Mehrdad; Andersson, Mikael; Krieger, Patrik
2010-01-01
The aim of this paper is to apply a non-parametric statistical tool, Ripley's K-function, to analyze the 3-dimensional distribution of pyramidal neurons. Ripley's K-function is a widely used tool in spatial point pattern analysis. There are several approaches in 2D domains in which this function is executed and analyzed. Drawing consistent inferences on the underlying 3D point pattern distributions in various applications is of great importance as the acquisition of 3D biological data now poses lesser of a challenge due to technological progress. As of now, most of the applications of Ripley's K-function in 3D domains do not focus on the phenomenon of edge correction, which is discussed thoroughly in this paper. The main goal is to extend the theoretical and practical utilization of Ripley's K-function and corresponding tests based on bootstrap resampling from 2D to 3D domains. PMID:20577588
Performance analysis of deciduous morphology for detecting biological siblings.
Paul, Kathleen S; Stojanowski, Christopher M
2015-08-01
Family-centered burial practices influence cemetery structure and can represent social group composition in both modern and ancient contexts. In ancient sites dental phenotypic data are often used as proxies for underlying genotypes to identify potential biological relatives. Here, we test the performance of deciduous dental morphological traits for differentiating sibling pairs from unrelated individuals from the same population. We collected 46 deciduous morphological traits for 69 sibling pairs from the Burlington Growth Centre's long term Family Study. Deciduous crown features were recorded following published standards. After variable winnowing, inter-individual Euclidean distances were generated using 20 morphological traits. To determine whether sibling pairs are more phenotypically similar than expected by chance we used bootstrap resampling of distances to generate P values. Multidimensional scaling (MDS) plots were used to evaluate the degree of clustering among sibling pairs. Results indicate an average distance between siblings of 0.252, which is significantly less than 9,999 replicated averages of 69 resampled pseudo-distances generated from: 1) a sample of non-relative pairs (P < 0.001), and 2) a sample of relative and non-relative pairs (P < 0.001). MDS plots indicate moderate to strong clustering among siblings; families occupied 3.83% of the multidimensional space on average (versus 63.10% for the total sample). Deciduous crown morphology performed well in identifying related sibling pairs. However, there was considerable variation in the extent to which different families exhibited similarly low levels of phenotypic divergence. © 2015 Wiley Periodicals, Inc.
Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.
Wang, Zuozhen
2018-01-01
Bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively smaller sample. In this paper, sample size estimation to compare two parallel-design arms for continuous data by bootstrap procedure are presented for various test types (inequality, non-inferiority, superiority, and equivalence), respectively. Meanwhile, sample size calculation by mathematical formulas (normal distribution assumption) for the identical data are also carried out. Consequently, power difference between the two calculation methods is acceptably small for all the test types. It shows that the bootstrap procedure is a credible technique for sample size estimation. After that, we compared the powers determined using the two methods based on data that violate the normal distribution assumption. To accommodate the feature of the data, the nonparametric statistical method of Wilcoxon test was applied to compare the two groups in the data during the process of bootstrap power estimation. As a result, the power estimated by normal distribution-based formula is far larger than that by bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable that the bootstrap method be applied for sample size calculation at the beginning, and that the same statistical method as used in the subsequent statistical analysis is employed for each bootstrap sample during the course of bootstrap sample size estimation, provided there is historical true data available that can be well representative of the population to which the proposed trial is planning to extrapolate.
Poder, Thomas G; Kouakou, Christian R C; Bouchard, Pierre-Alexandre; Tremblay, Véronique; Blais, Sébastien; Maltais, François; Lellouche, François
2018-01-01
Objective Conduct a cost-effectiveness analysis of FreeO2 technology versus manual oxygen-titration technology for patients with chronic obstructive pulmonary disease (COPD) hospitalised for acute exacerbations. Setting Tertiary acute care hospital in Quebec, Canada. Participants 47 patients with COPD hospitalised for acute exacerbations. Intervention An automated oxygen-titration and oxygen-weaning technology. Methods and outcomes The costs for hospitalisation and follow-up for 180 days were calculated using a microcosting approach and included the cost of FreeO2 technology. Incremental cost-effectiveness ratios (ICERs) were calculated using bootstrap resampling with 5000 replications. The main effect variable was the percentage of time spent at the target oxygen saturation (SpO2). The other two effect variables were the time spent in hyperoxia (target SpO2+5%) and in severe hypoxaemia (SpO2 <85%). The resamplings were based on data from a randomised controlled trial with 47 patients with COPD hospitalised for acute exacerbations. Results FreeO2 generated savings of 20.7% of the per-patient costs at 180 days (ie, −$C2959.71). This decrease is nevertheless not significant at the 95% threshold (P=0.13), but the effect variables all improved (P<0.001). The improvement in the time spent at the target SpO2 was 56.3%. The ICERs indicate that FreeO2 technology is more cost-effective than manual oxygen titration with a savings of −$C96.91 per percentage point of time spent at the target SpO2 (95% CI −301.26 to 116.96). Conclusion FreeO2 technology could significantly enhance the efficiency of the health system by reducing per-patient costs at 180 days. A study with a larger patient sample needs to be carried out to confirm these preliminary results. Trial registration number NCT01393015; Post-results. PMID:29362258
Bootstrap confidence levels for phylogenetic trees.
Efron, B; Halloran, E; Holmes, S
1996-07-09
Evolutionary trees are often estimated from DNA or RNA sequence data. How much confidence should we have in the estimated trees? In 1985, Felsenstein [Felsenstein, J. (1985) Evolution 39, 783-791] suggested the use of the bootstrap to answer this question. Felsenstein's method, which in concept is a straightforward application of the bootstrap, is widely used, but has been criticized as biased in the genetics literature. This paper concerns the use of the bootstrap in the tree problem. We show that Felsenstein's method is not biased, but that it can be corrected to better agree with standard ideas of confidence levels and hypothesis testing. These corrections can be made by using the more elaborate bootstrap method presented here, at the expense of considerably more computation.
Katayama, R; Sakai, S; Sakaguchi, T; Maeda, T; Takada, K; Hayabuchi, N; Morishita, J
2008-07-20
PURPOSE/AIM OF THE EXHIBIT: The purpose of this exhibit is: 1. To explain "resampling", an image data processing, performed by the digital radiographic system based on flat panel detector (FPD). 2. To show the influence of "resampling" on the basic imaging properties. 3. To present accurate measurement methods of the basic imaging properties of the FPD system. 1. The relationship between the matrix sizes of the output image and the image data acquired on FPD that automatically changes depending on a selected image size (FOV). 2. The explanation of the image data processing of "resampling". 3. The evaluation results of the basic imaging properties of the FPD system using two types of DICOM image to which "resampling" was performed: characteristic curves, presampled MTFs, noise power spectra, detective quantum efficiencies. CONCLUSION/SUMMARY: The major points of the exhibit are as follows: 1. The influence of "resampling" should not be disregarded in the evaluation of the basic imaging properties of the flat panel detector system. 2. It is necessary for the basic imaging properties to be measured by using DICOM image to which no "resampling" is performed.
Anisotropic scene geometry resampling with occlusion filling for 3DTV applications
NASA Astrophysics Data System (ADS)
Kim, Jangheon; Sikora, Thomas
2006-02-01
Image and video-based rendering technologies are receiving growing attention due to their photo-realistic rendering capability in free-viewpoint. However, two major limitations are ghosting and blurring due to their sampling-based mechanism. The scene geometry which supports to select accurate sampling positions is proposed using global method (i.e. approximate depth plane) and local method (i.e. disparity estimation). This paper focuses on the local method since it can yield more accurate rendering quality without large number of cameras. The local scene geometry has two difficulties which are the geometrical density and the uncovered area including hidden information. They are the serious drawback to reconstruct an arbitrary viewpoint without aliasing artifacts. To solve the problems, we propose anisotropic diffusive resampling method based on tensor theory. Isotropic low-pass filtering accomplishes anti-aliasing in scene geometry and anisotropic diffusion prevents filtering from blurring the visual structures. Apertures in coarse samples are estimated following diffusion on the pre-filtered space, the nonlinear weighting of gradient directions suppresses the amount of diffusion. Aliasing artifacts from low density are efficiently removed by isotropic filtering and the edge blurring can be solved by the anisotropic method at one process. Due to difference size of sampling gap, the resampling condition is defined considering causality between filter-scale and edge. Using partial differential equation (PDE) employing Gaussian scale-space, we iteratively achieve the coarse-to-fine resampling. In a large scale, apertures and uncovered holes can be overcoming because only strong and meaningful boundaries are selected on the resolution. The coarse-level resampling with a large scale is iteratively refined to get detail scene structure. Simulation results show the marked improvements of rendering quality.
NASA Astrophysics Data System (ADS)
Zhang, Yonggen; Schaap, Marcel G.
2017-04-01
Pedotransfer functions (PTFs) have been widely used to predict soil hydraulic parameters in favor of expensive laboratory or field measurements. Rosetta (Schaap et al., 2001, denoted as Rosetta1) is one of many PTFs and is based on artificial neural network (ANN) analysis coupled with the bootstrap re-sampling method which allows the estimation of van Genuchten water retention parameters (van Genuchten, 1980, abbreviated here as VG), saturated hydraulic conductivity (Ks), and their uncertainties. In this study, we present an improved set of hierarchical pedotransfer functions (Rosetta3) that unify the water retention and Ks submodels into one. Parameter uncertainty of the fit of the VG curve to the original retention data is used in the ANN calibration procedure to reduce bias of parameters predicted by the new PTF. One thousand bootstrap replicas were used to calibrate the new models compared to 60 or 100 in Rosetta1, thus allowing the uni-variate and bi-variate probability distributions of predicted parameters to be quantified in greater detail. We determined the optimal weights for VG parameters and Ks, the optimal number of hidden nodes in ANN, and the number of bootstrap replicas required for statistically stable estimates. Results show that matric potential-dependent bias was reduced significantly while root mean square error (RMSE) for water content were reduced modestly; RMSE for Ks was increased by 0.9% (H3w) to 3.3% (H5w) in the new models on log scale of Ks compared with the Rosetta1 model. It was found that estimated distributions of parameters were mildly non-Gaussian and could instead be described rather well with heavy-tailed α-stable distributions. On the other hand, arithmetic means had only a small estimation bias for most textures when compared with the mean-like "shift" parameter of the α-stable distributions. Arithmetic means and (co-)variances are therefore still recommended as summary statistics of the estimated distributions. However, it may be necessary to parameterize the distributions in different ways if the new estimates are used in stochastic analyses of vadose zone flow and transport. Rosetta1 and Posetta3 were implemented in the python programming language, and the source code as well as additional documentation is available at: http://www.cals.arizona.edu/research/rosettav3.html.
Computing camera heading: A study
NASA Astrophysics Data System (ADS)
Zhang, John Jiaxiang
2000-08-01
An accurate estimate of the motion of a camera is a crucial first step for the 3D reconstruction of sites, objects, and buildings from video. Solutions to the camera heading problem can be readily applied to many areas, such as robotic navigation, surgical operation, video special effects, multimedia, and lately even in internet commerce. From image sequences of a real world scene, the problem is to calculate the directions of the camera translations. The presence of rotations makes this problem very hard. This is because rotations and translations can have similar effects on the images, and are thus hard to tell apart. However, the visual angles between the projection rays of point pairs are unaffected by rotations, and their changes over time contain sufficient information to determine the direction of camera translation. We developed a new formulation of the visual angle disparity approach, first introduced by Tomasi, to the camera heading problem. Our new derivation makes theoretical analysis possible. Most notably, a theorem is obtained that locates all possible singularities of the residual function for the underlying optimization problem. This allows identifying all computation trouble spots beforehand, and to design reliable and accurate computational optimization methods. A bootstrap-jackknife resampling method simultaneously reduces complexity and tolerates outliers well. Experiments with image sequences show accurate results when compared with the true camera motion as measured with mechanical devices.
Evaluating the efficiency of environmental monitoring programs
Levine, Carrie R.; Yanai, Ruth D.; Lampman, Gregory G.; Burns, Douglas A.; Driscoll, Charles T.; Lawrence, Gregory B.; Lynch, Jason; Schoch, Nina
2014-01-01
Statistical uncertainty analyses can be used to improve the efficiency of environmental monitoring, allowing sampling designs to maximize information gained relative to resources required for data collection and analysis. In this paper, we illustrate four methods of data analysis appropriate to four types of environmental monitoring designs. To analyze a long-term record from a single site, we applied a general linear model to weekly stream chemistry data at Biscuit Brook, NY, to simulate the effects of reducing sampling effort and to evaluate statistical confidence in the detection of change over time. To illustrate a detectable difference analysis, we analyzed a one-time survey of mercury concentrations in loon tissues in lakes in the Adirondack Park, NY, demonstrating the effects of sampling intensity on statistical power and the selection of a resampling interval. To illustrate a bootstrapping method, we analyzed the plot-level sampling intensity of forest inventory at the Hubbard Brook Experimental Forest, NH, to quantify the sampling regime needed to achieve a desired confidence interval. Finally, to analyze time-series data from multiple sites, we assessed the number of lakes and the number of samples per year needed to monitor change over time in Adirondack lake chemistry using a repeated-measures mixed-effects model. Evaluations of time series and synoptic long-term monitoring data can help determine whether sampling should be re-allocated in space or time to optimize the use of financial and human resources.
Shen, Chung-Wei; Chen, Yi-Hau
2018-03-13
We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Beckers, J.; Weerts, A.; Tijdeman, E.; Welles, E.; McManamon, A.
2013-12-01
To provide reliable and accurate seasonal streamflow forecasts for water resources management several operational hydrologic agencies and hydropower companies around the world use the Extended Streamflow Prediction (ESP) procedure. The ESP in its original implementation does not accommodate for any additional information that the forecaster may have about expected deviations from climatology in the near future. Several attempts have been conducted to improve the skill of the ESP forecast, especially for areas which are affected by teleconnetions (e,g. ENSO, PDO) via selection (Hamlet and Lettenmaier, 1999) or weighting schemes (Werner et al., 2004; Wood and Lettenmaier, 2006; Najafi et al., 2012). A disadvantage of such schemes is that they lead to a reduction of the signal to noise ratio of the probabilistic forecast. To overcome this, we propose a resampling method conditional on climate indices to generate meteorological time series to be used in the ESP. The method can be used to generate a large number of meteorological ensemble members in order to improve the statistical properties of the ensemble. The effectiveness of the method was demonstrated in a real-time operational hydrologic seasonal forecasts system for the Columbia River basin operated by the Bonneville Power Administration. The forecast skill of the k-nn resampler was tested against the original ESP for three basins at the long-range seasonal time scale. The BSS and CRPSS were used to compare the results to those of the original ESP method. Positive forecast skill scores were found for the resampler method conditioned on different indices for the prediction of spring peak flows in the Dworshak and Hungry Horse basin. For the Libby Dam basin however, no improvement of skill was found. The proposed resampling method is a promising practical approach that can add skill to ESP forecasts at the seasonal time scale. Further improvement is possible by fine tuning the method and selecting the most informative climate indices for the region of interest.
ERIC Educational Resources Information Center
Spinella, Sarah
2011-01-01
As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
Identifying reliable independent components via split-half comparisons
Groppe, David M.; Makeig, Scott; Kutas, Marta
2011-01-01
Independent component analysis (ICA) is a family of unsupervised learning algorithms that have proven useful for the analysis of the electroencephalogram (EEG) and magnetoencephalogram (MEG). ICA decomposes an EEG/MEG data set into a basis of maximally temporally independent components (ICs) that are learned from the data. As with any statistic, a concern with using ICA is the degree to which the estimated ICs are reliable. An IC may not be reliable if ICA was trained on insufficient data, if ICA training was stopped prematurely or at a local minimum (for some algorithms), or if multiple global minima were present. Consequently, evidence of ICA reliability is critical for the credibility of ICA results. In this paper, we present a new algorithm for assessing the reliability of ICs based on applying ICA separately to split-halves of a data set. This algorithm improves upon existing methods in that it considers both IC scalp topographies and activations, uses a probabilistically interpretable threshold for accepting ICs as reliable, and requires applying ICA only three times per data set. As evidence of the method’s validity, we show that the method can perform comparably to more time intensive bootstrap resampling and depends in a reasonable manner on the amount of training data. Finally, using the method we illustrate the importance of checking the reliability of ICs by demonstrating that IC reliability is dramatically increased by removing the mean EEG at each channel for each epoch of data rather than the mean EEG in a prestimulus baseline. PMID:19162199
Miller, Donald L.; Kwon, Deukwoo; Bonavia, Grant H.
2009-01-01
Purpose: To propose initial values for patient reference levels for fluoroscopically guided procedures in the United States. Materials and Methods: This secondary analysis of data from the Radiation Doses in Interventional Radiology Procedures (RAD-IR) study was conducted under a protocol approved by the institutional review board and was HIPAA compliant. Dose distributions (percentiles) were calculated for each type of procedure in the RAD-IR study where there were data from at least 30 cases. Confidence intervals for the dose distributions were determined by using bootstrap resampling. Weight banding and size correction methods for normalizing dose to patient body habitus were tested. Results: The different methods for normalizing patient radiation dose according to patient weight gave results that were not significantly different (P > .05). The 75th percentile patient radiation doses normalized with weight banding were not significantly different from those that were uncorrected for body habitus. Proposed initial reference levels for various interventional procedures are provided for reference air kerma, kerma-area product, fluoroscopy time, and number of images. Conclusion: Sufficient data exist to permit an initial proposal of values for reference levels for interventional radiologic procedures in the United States. For ease of use, reference levels without correction for body habitus are recommended. A national registry of radiation-dose data for interventional radiologic procedures is a necessary next step to refine these reference levels. © RSNA, 2009 Supplemental material: http://radiology.rsna.org/lookup/suppl/doi:10.1148/radiol.2533090354/-/DC1 PMID:19789226
Confidence intervals in Flow Forecasting by using artificial neural networks
NASA Astrophysics Data System (ADS)
Panagoulia, Dionysia; Tsekouras, George
2014-05-01
One of the major inadequacies in implementation of Artificial Neural Networks (ANNs) for flow forecasting is the development of confidence intervals, because the relevant estimation cannot be implemented directly, contrasted to the classical forecasting methods. The variation in the ANN output is a measure of uncertainty in the model predictions based on the training data set. Different methods for uncertainty analysis, such as bootstrap, Bayesian, Monte Carlo, have already proposed for hydrologic and geophysical models, while methods for confidence intervals, such as error output, re-sampling, multi-linear regression adapted to ANN have been used for power load forecasting [1-2]. The aim of this paper is to present the re-sampling method for ANN prediction models and to develop this for flow forecasting of the next day. The re-sampling method is based on the ascending sorting of the errors between real and predicted values for all input vectors. The cumulative sample distribution function of the prediction errors is calculated and the confidence intervals are estimated by keeping the intermediate value, rejecting the extreme values according to the desired confidence levels, and holding the intervals symmetrical in probability. For application of the confidence intervals issue, input vectors are used from the Mesochora catchment in western-central Greece. The ANN's training algorithm is the stochastic training back-propagation process with decreasing functions of learning rate and momentum term, for which an optimization process is conducted regarding the crucial parameters values, such as the number of neurons, the kind of activation functions, the initial values and time parameters of learning rate and momentum term etc. Input variables are historical data of previous days, such as flows, nonlinearly weather related temperatures and nonlinearly weather related rainfalls based on correlation analysis between the under prediction flow and each implicit input variable of different ANN structures [3]. The performance of each ANN structure is evaluated by the voting analysis based on eleven criteria, which are the root mean square error (RMSE), the correlation index (R), the mean absolute percentage error (MAPE), the mean percentage error (MPE), the mean percentage error (ME), the percentage volume in errors (VE), the percentage error in peak (MF), the normalized mean bias error (NMBE), the normalized root mean bias error (NRMSE), the Nash-Sutcliffe model efficiency coefficient (E) and the modified Nash-Sutcliffe model efficiency coefficient (E1). The next day flow for the test set is calculated using the best ANN structure's model. Consequently, the confidence intervals of various confidence levels for training, evaluation and test sets are compared in order to explore the generalisation dynamics of confidence intervals from training and evaluation sets. [1] H.S. Hippert, C.E. Pedreira, R.C. Souza, "Neural networks for short-term load forecasting: A review and evaluation," IEEE Trans. on Power Systems, vol. 16, no. 1, 2001, pp. 44-55. [2] G. J. Tsekouras, N.E. Mastorakis, F.D. Kanellos, V.T. Kontargyri, C.D. Tsirekis, I.S. Karanasiou, Ch.N. Elias, A.D. Salis, P.A. Kontaxis, A.A. Gialketsi: "Short term load forecasting in Greek interconnected power system using ANN: Confidence Interval using a novel re-sampling technique with corrective Factor", WSEAS International Conference on Circuits, Systems, Electronics, Control & Signal Processing, (CSECS '10), Vouliagmeni, Athens, Greece, December 29-31, 2010. [3] D. Panagoulia, I. Trichakis, G. J. Tsekouras: "Flow Forecasting via Artificial Neural Networks - A Study for Input Variables conditioned on atmospheric circulation", European Geosciences Union, General Assembly 2012 (NH1.1 / AS1.16 - Extreme meteorological and hydrological events induced by severe weather and climate change), Vienna, Austria, 22-27 April 2012.
A Primer on Bootstrap Factor Analysis as Applied to Health Studies Research
ERIC Educational Resources Information Center
Lu, Wenhua; Miao, Jingang; McKyer, E. Lisako J.
2014-01-01
Objectives: To demonstrate how the bootstrap method could be conducted in exploratory factor analysis (EFA) with a syntax written in SPSS. Methods: The data obtained from the Texas Childhood Obesity Prevention Policy Evaluation project (T-COPPE project) were used for illustration. A 5-step procedure to conduct bootstrap factor analysis (BFA) was…
NASA Astrophysics Data System (ADS)
Yuan, Shenfang; Chen, Jian; Yang, Weibo; Qiu, Lei
2017-08-01
Fatigue crack growth prognosis is important for prolonging service time, improving safety, and reducing maintenance cost in many safety-critical systems, such as in aircraft, wind turbines, bridges, and nuclear plants. Combining fatigue crack growth models with the particle filter (PF) method has proved promising to deal with the uncertainties during fatigue crack growth and reach a more accurate prognosis. However, research on prognosis methods integrating on-line crack monitoring with the PF method is still lacking, as well as experimental verifications. Besides, the PF methods adopted so far are almost all sequential importance resampling-based PFs, which usually encounter sample impoverishment problems, and hence performs poorly. To solve these problems, in this paper, the piezoelectric transducers (PZTs)-based active Lamb wave method is adopted for on-line crack monitoring. The deterministic resampling PF (DRPF) is proposed to be used in fatigue crack growth prognosis, which can overcome the sample impoverishment problem. The proposed method is verified through fatigue tests of attachment lugs, which are a kind of important joint component in aerospace systems.
A Sequential Ensemble Prediction System at Convection Permitting Scales
NASA Astrophysics Data System (ADS)
Milan, M.; Simmer, C.
2012-04-01
A Sequential Assimilation Method (SAM) following some aspects of particle filtering with resampling, also called SIR (Sequential Importance Resampling), is introduced and applied in the framework of an Ensemble Prediction System (EPS) for weather forecasting on convection permitting scales, with focus to precipitation forecast. At this scale and beyond, the atmosphere increasingly exhibits chaotic behaviour and non linear state space evolution due to convectively driven processes. One way to take full account of non linear state developments are particle filter methods, their basic idea is the representation of the model probability density function by a number of ensemble members weighted by their likelihood with the observations. In particular particle filter with resampling abandons ensemble members (particles) with low weights restoring the original number of particles adding multiple copies of the members with high weights. In our SIR-like implementation we substitute the likelihood way to define weights and introduce a metric which quantifies the "distance" between the observed atmospheric state and the states simulated by the ensemble members. We also introduce a methodology to counteract filter degeneracy, i.e. the collapse of the simulated state space. To this goal we propose a combination of resampling taking account of simulated state space clustering and nudging. By keeping cluster representatives during resampling and filtering, the method maintains the potential for non linear system state development. We assume that a particle cluster with initially low likelihood may evolve in a state space with higher likelihood in a subsequent filter time thus mimicking non linear system state developments (e.g. sudden convection initiation) and remedies timing errors for convection due to model errors and/or imperfect initial condition. We apply a simplified version of the resampling, the particles with highest weights in each cluster are duplicated; for the model evolution for each particle pair one particle evolves using the forward model; the second particle, however, is nudged to the radar and satellite observation during its evolution based on the forward model.
Automated modal parameter estimation using correlation analysis and bootstrap sampling
NASA Astrophysics Data System (ADS)
Yaghoubi, Vahid; Vakilzadeh, Majid K.; Abrahamsson, Thomas J. S.
2018-02-01
The estimation of modal parameters from a set of noisy measured data is a highly judgmental task, with user expertise playing a significant role in distinguishing between estimated physical and noise modes of a test-piece. Various methods have been developed to automate this procedure. The common approach is to identify models with different orders and cluster similar modes together. However, most proposed methods based on this approach suffer from high-dimensional optimization problems in either the estimation or clustering step. To overcome this problem, this study presents an algorithm for autonomous modal parameter estimation in which the only required optimization is performed in a three-dimensional space. To this end, a subspace-based identification method is employed for the estimation and a non-iterative correlation-based method is used for the clustering. This clustering is at the heart of the paper. The keys to success are correlation metrics that are able to treat the problems of spatial eigenvector aliasing and nonunique eigenvectors of coalescent modes simultaneously. The algorithm commences by the identification of an excessively high-order model from frequency response function test data. The high number of modes of this model provides bases for two subspaces: one for likely physical modes of the tested system and one for its complement dubbed the subspace of noise modes. By employing the bootstrap resampling technique, several subsets are generated from the same basic dataset and for each of them a model is identified to form a set of models. Then, by correlation analysis with the two aforementioned subspaces, highly correlated modes of these models which appear repeatedly are clustered together and the noise modes are collected in a so-called Trashbox cluster. Stray noise modes attracted to the mode clusters are trimmed away in a second step by correlation analysis. The final step of the algorithm is a fuzzy c-means clustering procedure applied to a three-dimensional feature space to assign a degree of physicalness to each cluster. The proposed algorithm is applied to two case studies: one with synthetic data and one with real test data obtained from a hammer impact test. The results indicate that the algorithm successfully clusters similar modes and gives a reasonable quantification of the extent to which each cluster is physical.
NASA Astrophysics Data System (ADS)
Yepez, Santiago; Laraque, Alain; Martinez, Jean-Michel; De Sa, Jose; Carrera, Juan Manuel; Castellanos, Bartolo; Gallay, Marjorie; Lopez, Jose L.
2018-01-01
In this study, 81 Landsat-8 scenes acquired from 2013 to 2015 were used to estimate the suspended sediment concentration (SSC) in the Orinoco River at its main hydrological station at Ciudad Bolivar, Venezuela. This gauging station monitors an upstream area corresponding to 89% of the total catchment area where the mean discharge is of 33,000 m3·s-1. SSC spatial and temporal variabilities were analyzed in relation to the hydrological cycle and to local geomorphological characteristics of the river mainstream. Three types of atmospheric correction models were evaluated to correct the Landsat-8 images: DOS, FLAASH, and L8SR. Surface reflectance was compared with monthly water sampling to calibrate a SSC retrieval model using a bootstrapping resampling. A regression model based on surface reflectance at the Near-Infrared wavelengths showed the best performance: R2 = 0.92 (N = 27) for the whole range of SSC (18 to 203 mg·l-1) measured at this station during the studied period. The method offers a simple new approach to estimate the SSC along the lower Orinoco River and demonstrates the feasibility and reliability of remote sensing images to map the spatiotemporal variability in sediment transport over large rivers.
Toma, Tudor; Bosman, Robert-Jan; Siebes, Arno; Peek, Niels; Abu-Hanna, Ameen
2010-08-01
An important problem in the Intensive Care is how to predict on a given day of stay the eventual hospital mortality for a specific patient. A recent approach to solve this problem suggested the use of frequent temporal sequences (FTSs) as predictors. Methods following this approach were evaluated in the past by inducing a model from a training set and validating the prognostic performance on an independent test set. Although this evaluative approach addresses the validity of the specific models induced in an experiment, it falls short of evaluating the inductive method itself. To achieve this, one must account for the inherent sources of variation in the experimental design. The main aim of this work is to demonstrate a procedure based on bootstrapping, specifically the .632 bootstrap procedure, for evaluating inductive methods that discover patterns, such as FTSs. A second aim is to apply this approach to find out whether a recently suggested inductive method that discovers FTSs of organ functioning status is superior over a traditional method that does not use temporal sequences when compared on each successive day of stay at the Intensive Care Unit. The use of bootstrapping with logistic regression using pre-specified covariates is known in the statistical literature. Using inductive methods of prognostic models based on temporal sequence discovery within the bootstrap procedure is however novel at least in predictive models in the Intensive Care. Our results of applying the bootstrap-based evaluative procedure demonstrate the superiority of the FTS-based inductive method over the traditional method in terms of discrimination as well as accuracy. In addition we illustrate the insights gained by the analyst into the discovered FTSs from the bootstrap samples. Copyright 2010 Elsevier Inc. All rights reserved.
Maximum a posteriori resampling of noisy, spatially correlated data
NASA Astrophysics Data System (ADS)
Goff, John A.; Jenkins, Chris; Calder, Brian
2006-08-01
In any geologic application, noisy data are sources of consternation for researchers, inhibiting interpretability and marring images with unsightly and unrealistic artifacts. Filtering is the typical solution to dealing with noisy data. However, filtering commonly suffers from ad hoc (i.e., uncalibrated, ungoverned) application. We present here an alternative to filtering: a newly developed method for correcting noise in data by finding the "best" value given available information. The motivating rationale is that data points that are close to each other in space cannot differ by "too much," where "too much" is governed by the field covariance. Data with large uncertainties will frequently violate this condition and therefore ought to be corrected, or "resampled." Our solution for resampling is determined by the maximum of the a posteriori density function defined by the intersection of (1) the data error probability density function (pdf) and (2) the conditional pdf, determined by the geostatistical kriging algorithm applied to proximal data values. A maximum a posteriori solution can be computed sequentially going through all the data, but the solution depends on the order in which the data are examined. We approximate the global a posteriori solution by randomizing this order and taking the average. A test with a synthetic data set sampled from a known field demonstrates quantitatively and qualitatively the improvement provided by the maximum a posteriori resampling algorithm. The method is also applied to three marine geology/geophysics data examples, demonstrating the viability of the method for diverse applications: (1) three generations of bathymetric data on the New Jersey shelf with disparate data uncertainties; (2) mean grain size data from the Adriatic Sea, which is a combination of both analytic (low uncertainty) and word-based (higher uncertainty) sources; and (3) side-scan backscatter data from the Martha's Vineyard Coastal Observatory which are, as is typical for such data, affected by speckle noise. Compared to filtering, maximum a posteriori resampling provides an objective and optimal method for reducing noise, and better preservation of the statistical properties of the sampled field. The primary disadvantage is that maximum a posteriori resampling is a computationally expensive procedure.
Han, Guanghui; Liu, Xiabi; Zheng, Guangyuan; Wang, Murong; Huang, Shan
2018-06-06
Ground-glass opacity (GGO) is a common CT imaging sign on high-resolution CT, which means the lesion is more likely to be malignant compared to common solid lung nodules. The automatic recognition of GGO CT imaging signs is of great importance for early diagnosis and possible cure of lung cancers. The present GGO recognition methods employ traditional low-level features and system performance improves slowly. Considering the high-performance of CNN model in computer vision field, we proposed an automatic recognition method of 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNN models in this paper. Our hybrid resampling is performed on multi-views and multi-receptive fields, which reduces the risk of missing small or large GGOs by adopting representative sampling panels and processing GGOs with multiple scales simultaneously. The layer-wise fine-tuning strategy has the ability to obtain the optimal fine-tuning model. Multi-CNN models fusion strategy obtains better performance than any single trained model. We evaluated our method on the GGO nodule samples in publicly available LIDC-IDRI dataset of chest CT scans. The experimental results show that our method yields excellent results with 96.64% sensitivity, 71.43% specificity, and 0.83 F1 score. Our method is a promising approach to apply deep learning method to computer-aided analysis of specific CT imaging signs with insufficient labeled images. Graphical abstract We proposed an automatic recognition method of 3D GGO CT imaging signs through the fusion of hybrid resampling and layer-wise fine-tuning CNN models in this paper. Our hybrid resampling reduces the risk of missing small or large GGOs by adopting representative sampling panels and processing GGOs with multiple scales simultaneously. The layer-wise fine-tuning strategy has ability to obtain the optimal fine-tuning model. Our method is a promising approach to apply deep learning method to computer-aided analysis of specific CT imaging signs with insufficient labeled images.
Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection
NASA Technical Reports Server (NTRS)
Kumar, Sricharan; Srivistava, Ashok N.
2012-01-01
Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The Bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for generating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
Nonparametric bootstrap analysis with applications to demographic effects in demand functions.
Gozalo, P L
1997-12-01
"A new bootstrap proposal, labeled smooth conditional moment (SCM) bootstrap, is introduced for independent but not necessarily identically distributed data, where the classical bootstrap procedure fails.... A good example of the benefits of using nonparametric and bootstrap methods is the area of empirical demand analysis. In particular, we will be concerned with their application to the study of two important topics: what are the most relevant effects of household demographic variables on demand behavior, and to what extent present parametric specifications capture these effects." excerpt
Oblinsky, Daniel G; Vanschouwen, Bryan M B; Gordon, Heather L; Rothstein, Stuart M
2009-12-14
Given the principal component analysis (PCA) of a molecular dynamics (MD) conformational trajectory for a model protein, we perform orthogonal Procrustean rotation to "best fit" the PCA squared-loading matrix to that of a target matrix computed for a related but different molecular system. The sum of squared deviations of the elements of the rotated matrix from those of the target, known as the error of fit (EOF), provides a quantitative measure of the dissimilarity between the two conformational samples. To estimate precision of the EOF, we perform bootstrap resampling of the molecular conformations within the trajectories, generating a distribution of EOF values for the system and target. The average EOF per variable is determined and visualized to ascertain where, locally, system and target sample properties differ. We illustrate this approach by analyzing MD trajectories for the wild-type and four selected mutants of the beta1 domain of protein G.
NASA Astrophysics Data System (ADS)
Oblinsky, Daniel G.; VanSchouwen, Bryan M. B.; Gordon, Heather L.; Rothstein, Stuart M.
2009-12-01
Given the principal component analysis (PCA) of a molecular dynamics (MD) conformational trajectory for a model protein, we perform orthogonal Procrustean rotation to "best fit" the PCA squared-loading matrix to that of a target matrix computed for a related but different molecular system. The sum of squared deviations of the elements of the rotated matrix from those of the target, known as the error of fit (EOF), provides a quantitative measure of the dissimilarity between the two conformational samples. To estimate precision of the EOF, we perform bootstrap resampling of the molecular conformations within the trajectories, generating a distribution of EOF values for the system and target. The average EOF per variable is determined and visualized to ascertain where, locally, system and target sample properties differ. We illustrate this approach by analyzing MD trajectories for the wild-type and four selected mutants of the β1 domain of protein G.
A neurogenetics approach to defining differential susceptibility to institutional care
Brett, Zoe H.; Sheridan, Margaret; Humphreys, Kate; Smyke, Anna; Gleason, Mary Margaret; Fox, Nathan; Zeanah, Charles; Nelson, Charles; Drury, Stacy
2014-01-01
An individual's neurodevelopmental and cognitive sequelae to negative early experiences may, in part, be explained by genetic susceptibility. We examined whether extreme differences in the early caregiving environment, defined as exposure to severe psychosocial deprivation associated with institutional care compared to normative rearing, interacted with a biologically informed genoset comprising BDNF (rs6265), COMT (rs4680), and SIRT1 (rs3758391) to predict distinct outcomes of neurodevelopment at age 8 (N = 193, 97 males and 96 females). Ethnicity was categorized as Romanian (71%), Roma (21%), unknown (7%), or other (1%). We identified a significant interaction between early caregiving environment (i.e., institutionalized versus never institutionalized children) and the a priori defined genoset for full-scale IQ, two spatial working memory tasks, and prefrontal cortex gray matter volume. Model validation was performed using a bootstrap resampling procedure. Although we hypothesized that the effect of this genoset would operate in a manner consistent with differential susceptibility, our results demonstrate a complex interaction where vantage susceptibility, diathesis stress, and differential susceptibility are implicated. PMID:25663728
Irwin, Kara C; Konnert, Candace; Wong, May; O'Neill, Thomas A
2014-04-01
Symptoms of posttraumatic stress disorder (PTSD) and pain are often comorbid among veterans. The purpose of this study was to investigate to what extent symptoms of anxiety, depression, and alcohol use mediated the relationship between PTSD symptoms and pain among 113 treated male Canadian veterans. Measures of PTSD, pain, anxiety symptoms, depression symptoms, and alcohol use were collected as part of the initial assessment. The bootstrapped resampling analyses were consistent with the hypothesis of mediation for anxiety and depression, but not alcohol use. The confidence intervals did not include zero and the indirect effect of PTSD on pain through anxiety was .04, CI [.03, .07]. The indirect effect of PTSD on pain through depression was .04, CI [.02, .07]. These findings suggest that PTSD and pain symptoms among veterans may be related through the underlying symptoms of anxiety and depression, thus emphasizing the importance of targeting anxiety and depression symptoms when treating comorbid PTSD and pain patients. © 2014 International Society for Traumatic Stress Studies.
Alcántara, Carmela; Giorgio Cosenzo, Luciana Andrea; Fan, Weijia; Doyle, David Matthew; Shaffer, Jonathan A
2017-05-01
Although Blacks sleep between 37 and 75min less per night than non-Hispanic Whites, research into what drives racial differences in sleep duration is limited. We examined the association of anxiety sensitivity, a cognitive vulnerability, and race (Blacks vs. White) with short sleep duration (<7h of sleep/night), and whether anxiety sensitivity mediated race differences in sleep duration in a nationally representative sample of adults with cardiovascular disease. Overall, 1289 adults (115 Black, 1174 White) with a self-reported physician/health professional diagnosis of ≥1 myocardial infarction completed an online survey. Weighted multivariable logistic regressions and mediation analyses with bootstrapping and case resampling were conducted. Anxiety sensitivity and Black vs. White race were associated with 4%-84% increased odds, respectively, of short sleep duration. Anxiety sensitivity mediated Black-White differences in sleep duration. Each anxiety sensitivity subscale was also a significant mediator. Implications for future intervention science to address sleep disparities are discussed. Copyright © 2016 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Kürbis, K.; Mudelsee, M.; Tetzlaff, G.; Brázdil, R.
2009-09-01
For the analysis of trends in weather extremes, we introduce a diagnostic index variable, the exceedance product, which combines intensity and frequency of extremes. We separate trends in higher moments from trends in mean or standard deviation and use bootstrap resampling to evaluate statistical significances. The application of the concept of the exceedance product to daily meteorological time series from Potsdam (1893 to 2005) and Prague-Klementinum (1775 to 2004) reveals that extremely cold winters occurred only until the mid-20th century, whereas warm winters show upward trends. These changes were significant in higher moments of the temperature distribution. In contrast, trends in summer temperature extremes (e.g., the 2003 European heatwave) can be explained by linear changes in mean or standard deviation. While precipitation at Potsdam does not show pronounced trends, dew point does exhibit a change from maximum extremes during the 1960s to minimum extremes during the 1970s.
A neurogenetics approach to defining differential susceptibility to institutional care.
Brett, Zoe H; Sheridan, Margaret; Humphreys, Kate; Smyke, Anna; Gleason, Mary Margaret; Fox, Nathan; Zeanah, Charles; Nelson, Charles; Drury, Stacy
2015-03-01
An individual's neurodevelopmental and cognitive sequelae to negative early experiences may, in part, be explained by genetic susceptibility. We examined whether extreme differences in the early caregiving environment, defined as exposure to severe psychosocial deprivation associated with institutional care compared to normative rearing, interacted with a biologically informed genoset comprising BDNF (rs6265), COMT (rs4680), and SIRT1 (rs3758391) to predict distinct outcomes of neurodevelopment at age 8 ( N = 193, 97 males and 96 females). Ethnicity was categorized as Romanian (71%), Roma (21%), unknown (7%), or other (1%). We identified a significant interaction between early caregiving environment (i.e., institutionalized versus never institutionalized children) and the a priori defined genoset for full-scale IQ, two spatial working memory tasks, and prefrontal cortex gray matter volume. Model validation was performed using a bootstrap resampling procedure. Although we hypothesized that the effect of this genoset would operate in a manner consistent with differential susceptibility, our results demonstrate a complex interaction where vantage susceptibility, diathesis stress, and differential susceptibility are implicated.
The role of interpersonal sensitivity, social support, and quality of life in rural older adults.
Wedgeworth, Monika; LaRocca, Michael A; Chaplin, William F; Scogin, Forrest
The mental health of elderly individuals in rural areas is increasingly relevant as populations age and social structures change. While social support satisfaction is a well-established predictor of quality of life, interpersonal sensitivity symptoms may diminish this relation. The current study extends the findings of Scogin et al by investigating the relationship among interpersonal sensitivity, social support satisfaction, and quality of life among rural older adults and exploring the mediating role of social support in the relation between interpersonal sensitivity and quality of life (N = 128). Hierarchical regression revealed that interpersonal sensitivity and social support satisfaction predicted quality of life. In addition, bootstrapping resampling supported the role of social support satisfaction as a mediator between interpersonal sensitivity symptoms and quality of life. These results underscore the importance of nurses and allied health providers in assessing and attending to negative self-perceptions of clients, as well as the perceived quality of their social networks. Copyright © 2016 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Guo, Jun; Lu, Siliang; Zhai, Chao; He, Qingbo
2018-02-01
An automatic bearing fault diagnosis method is proposed for permanent magnet synchronous generators (PMSGs), which are widely installed in wind turbines subjected to low rotating speeds, speed fluctuations, and electrical device noise interferences. The mechanical rotating angle curve is first extracted from the phase current of a PMSG by sequentially applying a series of algorithms. The synchronous sampled vibration signal of the fault bearing is then resampled in the angular domain according to the obtained rotating phase information. Considering that the resampled vibration signal is still overwhelmed by heavy background noise, an adaptive stochastic resonance filter is applied to the resampled signal to enhance the fault indicator and facilitate bearing fault identification. Two types of fault bearings with different fault sizes in a PMSG test rig are subjected to experiments to test the effectiveness of the proposed method. The proposed method is fully automated and thus shows potential for convenient, highly efficient and in situ bearing fault diagnosis for wind turbines subjected to harsh environments.
Quasi-Epipolar Resampling of High Resolution Satellite Stereo Imagery for Semi Global Matching
NASA Astrophysics Data System (ADS)
Tatar, N.; Saadatseresht, M.; Arefi, H.; Hadavand, A.
2015-12-01
Semi-global matching is a well-known stereo matching algorithm in photogrammetric and computer vision society. Epipolar images are supposed as input of this algorithm. Epipolar geometry of linear array scanners is not a straight line as in case of frame camera. Traditional epipolar resampling algorithms demands for rational polynomial coefficients (RPCs), physical sensor model or ground control points. In this paper we propose a new solution for epipolar resampling method which works without the need for these information. In proposed method, automatic feature extraction algorithms are employed to generate corresponding features for registering stereo pairs. Also original images are divided into small tiles. In this way by omitting the need for extra information, the speed of matching algorithm increased and the need for high temporal memory decreased. Our experiments on GeoEye-1 stereo pair captured over Qom city in Iran demonstrates that the epipolar images are generated with sub-pixel accuracy.
NASA Astrophysics Data System (ADS)
Ross, Z. E.; Ben-Zion, Y.; Zhu, L.
2015-02-01
We analyse source tensor properties of seven Mw > 4.2 earthquakes in the complex trifurcation area of the San Jacinto Fault Zone, CA, with a focus on isotropic radiation that may be produced by rock damage in the source volumes. The earthquake mechanisms are derived with generalized `Cut and Paste' (gCAP) inversions of three-component waveforms typically recorded by >70 stations at regional distances. The gCAP method includes parameters ζ and χ representing, respectively, the relative strength of the isotropic and CLVD source terms. The possible errors in the isotropic and CLVD components due to station variability is quantified with bootstrap resampling for each event. The results indicate statistically significant explosive isotropic components for at least six of the events, corresponding to ˜0.4-8 per cent of the total potency/moment of the sources. In contrast, the CLVD components for most events are not found to be statistically significant. Trade-off and correlation between the isotropic and CLVD components are studied using synthetic tests with realistic station configurations. The associated uncertainties are found to be generally smaller than the observed isotropic components. Two different tests with velocity model perturbation are conducted to quantify the uncertainty due to inaccuracies in the Green's functions. Applications of the Mann-Whitney U test indicate statistically significant explosive isotropic terms for most events consistent with brittle damage production at the source.
Maeda, Jin; Suzuki, Tatsuya; Takayama, Kozo
2012-12-01
A large-scale design space was constructed using a Bayesian estimation method with a small-scale design of experiments (DoE) and small sets of large-scale manufacturing data without enforcing a large-scale DoE. The small-scale DoE was conducted using various Froude numbers (X(1)) and blending times (X(2)) in the lubricant blending process for theophylline tablets. The response surfaces, design space, and their reliability of the compression rate of the powder mixture (Y(1)), tablet hardness (Y(2)), and dissolution rate (Y(3)) on a small scale were calculated using multivariate spline interpolation, a bootstrap resampling technique, and self-organizing map clustering. The constant Froude number was applied as a scale-up rule. Three experiments under an optimal condition and two experiments under other conditions were performed on a large scale. The response surfaces on the small scale were corrected to those on a large scale by Bayesian estimation using the large-scale results. Large-scale experiments under three additional sets of conditions showed that the corrected design space was more reliable than that on the small scale, even if there was some discrepancy in the pharmaceutical quality between the manufacturing scales. This approach is useful for setting up a design space in pharmaceutical development when a DoE cannot be performed at a commercial large manufacturing scale.
Gridless, pattern-driven point cloud completion and extension
NASA Astrophysics Data System (ADS)
Gravey, Mathieu; Mariethoz, Gregoire
2016-04-01
While satellites offer Earth observation with a wide coverage, other remote sensing techniques such as terrestrial LiDAR can acquire very high-resolution data on an area that is limited in extension and often discontinuous due to shadow effects. Here we propose a numerical approach to merge these two types of information, thereby reconstructing high-resolution data on a continuous large area. It is based on a pattern matching process that completes the areas where only low-resolution data is available, using bootstrapped high-resolution patterns. Currently, the most common approach to pattern matching is to interpolate the point data on a grid. While this approach is computationally efficient, it presents major drawbacks for point clouds processing because a significant part of the information is lost in the point-to-grid resampling, and that a prohibitive amount of memory is needed to store large grids. To address these issues, we propose a gridless method that compares point clouds subsets without the need to use a grid. On-the-fly interpolation involves a heavy computational load, which is met by using a GPU high-optimized implementation and a hierarchical pattern searching strategy. The method is illustrated using data from the Val d'Arolla, Swiss Alps, where high-resolution terrestrial LiDAR data are fused with lower-resolution Landsat and WorldView-3 acquisitions, such that the density of points is homogeneized (data completion) and that it is extend to a larger area (data extension).
Grinde, Kelsey E.; Arbet, Jaron; Green, Alden; O'Connell, Michael; Valcarcel, Alessandra; Westra, Jason; Tintle, Nathan
2017-01-01
To date, gene-based rare variant testing approaches have focused on aggregating information across sets of variants to maximize statistical power in identifying genes showing significant association with diseases. Beyond identifying genes that are associated with diseases, the identification of causal variant(s) in those genes and estimation of their effect is crucial for planning replication studies and characterizing the genetic architecture of the locus. However, we illustrate that straightforward single-marker association statistics can suffer from substantial bias introduced by conditioning on gene-based test significance, due to the phenomenon often referred to as “winner's curse.” We illustrate the ramifications of this bias on variant effect size estimation and variant prioritization/ranking approaches, outline parameters of genetic architecture that affect this bias, and propose a bootstrap resampling method to correct for this bias. We find that our correction method significantly reduces the bias due to winner's curse (average two-fold decrease in bias, p < 2.2 × 10−6) and, consequently, substantially improves mean squared error and variant prioritization/ranking. The method is particularly helpful in adjustment for winner's curse effects when the initial gene-based test has low power and for relatively more common, non-causal variants. Adjustment for winner's curse is recommended for all post-hoc estimation and ranking of variants after a gene-based test. Further work is necessary to continue seeking ways to reduce bias and improve inference in post-hoc analysis of gene-based tests under a wide variety of genetic architectures. PMID:28959274
The index gage method to develop a flow duration curve from short-term streamflow records
NASA Astrophysics Data System (ADS)
Zhang, Zhenxing
2017-10-01
The flow duration curve (FDC) is one of the most commonly used graphical tools in hydrology and provides a comprehensive graphical view of streamflow variability at a particular site. For a gaged site, an FDC can be easily estimated with frequency analysis. When no streamflow records are available, regional FDCs are used to synthesize FDCs. However, studies on how to develop FDCs for sites with short-term records have been very limited. Deriving representative FDC when there are short-term hydrologic records is important. For instance, 43% of the 394 streamflow gages in Illinois have records of 20 years or fewer, and these short-term gages are often distributed in headwaters and contain valuable hydrologic information. In this study, the index gage method is proposed to develop FDCs using short-term hydrologic records via an information transfer technique from a nearby hydrologically similar index gage. There are three steps: (1) select an index gage; (2) determine changes of FDC; and (3) develop representative FDCs. The approach is tested using records from 92 U.S. Geological Survey streamflow gages in Illinois. A jackknife experiment is conducted to assess the performance. Bootstrap resampling is used to simulate various periods of records, i.e., 1, 2, 5, 10, 15, and 20 years of records. The results demonstrated that the index gage method is capable of developing a representative FDC using short-term records. Generally, the approach performance is improved when more hydrologic records are available, but the improvement appears to level off when the short-term gage has 10 years or more records.
A search for evidence of solar rotation in Super-Kamiokande solar neutrino dataset
NASA Astrophysics Data System (ADS)
Desai, Shantanu; Liu, Dawei W.
2016-09-01
We apply the generalized Lomb-Scargle (LS) periodogram, proposed by Zechmeister and Kurster, to the solar neutrino data from Super-Kamiokande (Super-K) using data from its first five years. For each peak in the LS periodogram, we evaluate the statistical significance in two different ways. The first method involves calculating the False Alarm Probability (FAP) using non-parametric bootstrap resampling, and the second method is by calculating the difference in Bayesian Information Criterion (BIC) between the null hypothesis, viz. the data contains only noise, compared to the hypothesis that the data contains a peak at a given frequency. Using these methods, we scan the frequency range between 7-14 cycles per year to look for any peaks caused by solar rotation, since this is the proposed explanation for the statistically significant peaks found by Sturrock and collaborators in the Super-K dataset. From our analysis, we do confirm that similar to Sturrock et al, the maximum peak occurs at a frequency of 9.42/year, corresponding to a period of 38.75 days. The FAP for this peak is about 1.5% and the difference in BIC (between pure white noise and this peak) is about 4.8. We note that the significance depends on the frequency band used to search for peaks and hence it is important to use a search band appropriate for solar rotation. However, The significance of this peak based on the value of BIC is marginal and more data is needed to confirm if the peak persists and is real.
Variable selection under multiple imputation using the bootstrap in a prognostic study
Heymans, Martijn W; van Buuren, Stef; Knol, Dirk L; van Mechelen, Willem; de Vet, Henrica CW
2007-01-01
Background Missing data is a challenging problem in many prognostic studies. Multiple imputation (MI) accounts for imputation uncertainty that allows for adequate statistical testing. We developed and tested a methodology combining MI with bootstrapping techniques for studying prognostic variable selection. Method In our prospective cohort study we merged data from three different randomized controlled trials (RCTs) to assess prognostic variables for chronicity of low back pain. Among the outcome and prognostic variables data were missing in the range of 0 and 48.1%. We used four methods to investigate the influence of respectively sampling and imputation variation: MI only, bootstrap only, and two methods that combine MI and bootstrapping. Variables were selected based on the inclusion frequency of each prognostic variable, i.e. the proportion of times that the variable appeared in the model. The discriminative and calibrative abilities of prognostic models developed by the four methods were assessed at different inclusion levels. Results We found that the effect of imputation variation on the inclusion frequency was larger than the effect of sampling variation. When MI and bootstrapping were combined at the range of 0% (full model) to 90% of variable selection, bootstrap corrected c-index values of 0.70 to 0.71 and slope values of 0.64 to 0.86 were found. Conclusion We recommend to account for both imputation and sampling variation in sets of missing data. The new procedure of combining MI with bootstrapping for variable selection, results in multivariable prognostic models with good performance and is therefore attractive to apply on data sets with missing values. PMID:17629912
Zhang, Bo; Liu, Wei; Zhang, Zhiwei; Qu, Yanping; Chen, Zhen; Albert, Paul S
2017-08-01
Joint modeling and within-cluster resampling are two approaches that are used for analyzing correlated data with informative cluster sizes. Motivated by a developmental toxicity study, we examined the performances and validity of these two approaches in testing covariate effects in generalized linear mixed-effects models. We show that the joint modeling approach is robust to the misspecification of cluster size models in terms of Type I and Type II errors when the corresponding covariates are not included in the random effects structure; otherwise, statistical tests may be affected. We also evaluate the performance of the within-cluster resampling procedure and thoroughly investigate the validity of it in modeling correlated data with informative cluster sizes. We show that within-cluster resampling is a valid alternative to joint modeling for cluster-specific covariates, but it is invalid for time-dependent covariates. The two methods are applied to a developmental toxicity study that investigated the effect of exposure to diethylene glycol dimethyl ether.
Efficacy of Guided iCBT for Depression and Mediation of Change by Cognitive Skill Acquisition.
Forand, Nicholas R; Barnett, Jeffrey G; Strunk, Daniel R; Hindiyeh, Mohammed U; Feinberg, Jason E; Keefe, John R
2018-03-01
Guided internet CBT (iCBT) is a promising treatment for depression; however, it is less well known through what mechanisms iCBT works. Two possible mediators of change are the acquisition of cognitive skills and increases in behavioral activation. We report results of an 8-week waitlist controlled trial of guided iCBT, and test whether early change in cognitive skills or behavioral activation mediated subsequent change in depression. The sample was 89 individuals randomized to guided iCBT (n = 59) or waitlist (n = 30). Participants were 75% female, 72% Caucasian, and 33 years old on average. The PHQ9 was the primary outcome measure. Mediators were the Competencies of Cognitive Therapy Scale-Self Report and the Behavioral Activation Scale for Depression-Short Form. Treatment was Beating the Blues plus manualized coaching. Outcomes were analyzed using linear mixed models, and mediation with a bootstrap resampling approach. The iCBT group was superior to waitlist, with large effect sizes at posttreatment (Hedges' g = 1.45). Dropout of iCBT was 29% versus 10% for waitlist. In the mediation analyses, the acquisition of cognitive skills mediated subsequent depression change (indirect effect = -.61, 95% bootstrapped biased corrected CI: -1.47, -0.09), but increases in behavioral activation did not. iCBT is an effective treatment for depression, but dropout rates remain high. Change in iCBT appears to be mediated by improvements in the use of cognitive skills, such as critically evaluating and restructuring negative thoughts. Copyright © 2017. Published by Elsevier Ltd.
Application of the Bootstrap Statistical Method in Deriving Vibroacoustic Specifications
NASA Technical Reports Server (NTRS)
Hughes, William O.; Paez, Thomas L.
2006-01-01
This paper discusses the Bootstrap Method for specification of vibroacoustic test specifications. Vibroacoustic test specifications are necessary to properly accept or qualify a spacecraft and its components for the expected acoustic, random vibration and shock environments seen on an expendable launch vehicle. Traditionally, NASA and the U.S. Air Force have employed methods of Normal Tolerance Limits to derive these test levels based upon the amount of data available, and the probability and confidence levels desired. The Normal Tolerance Limit method contains inherent assumptions about the distribution of the data. The Bootstrap is a distribution-free statistical subsampling method which uses the measured data themselves to establish estimates of statistical measures of random sources. This is achieved through the computation of large numbers of Bootstrap replicates of a data measure of interest and the use of these replicates to derive test levels consistent with the probability and confidence desired. The comparison of the results of these two methods is illustrated via an example utilizing actual spacecraft vibroacoustic data.
NASA Astrophysics Data System (ADS)
Collins, Jarrod A.; Heiselman, Jon S.; Weis, Jared A.; Clements, Logan W.; Simpson, Amber L.; Jarnagin, William R.; Miga, Michael I.
2017-03-01
In image-guided liver surgery (IGLS), sparse representations of the anterior organ surface may be collected intraoperatively to drive image-to-physical space registration. Soft tissue deformation represents a significant source of error for IGLS techniques. This work investigates the impact of surface data quality on current surface based IGLS registration methods. In this work, we characterize the robustness of our IGLS registration methods to noise in organ surface digitization. We study this within a novel human-to-phantom data framework that allows a rapid evaluation of clinically realistic data and noise patterns on a fully characterized hepatic deformation phantom. Additionally, we implement a surface data resampling strategy that is designed to decrease the impact of differences in surface acquisition. For this analysis, n=5 cases of clinical intraoperative data consisting of organ surface and salient feature digitizations from open liver resection were collected and analyzed within our human-to-phantom validation framework. As expected, results indicate that increasing levels of noise in surface acquisition cause registration fidelity to deteriorate. With respect to rigid registration using the raw and resampled data at clinically realistic levels of noise (i.e. a magnitude of 1.5 mm), resampling improved TRE by 21%. In terms of nonrigid registration, registrations using resampled data outperformed the raw data result by 14% at clinically realistic levels and were less susceptible to noise across the range of noise investigated. These results demonstrate the types of analyses our novel human-to-phantom validation framework can provide and indicate the considerable benefits of resampling strategies.
An optical systems analysis approach to image resampling
NASA Technical Reports Server (NTRS)
Lyon, Richard G.
1997-01-01
All types of image registration require some type of resampling, either during the registration or as a final step in the registration process. Thus the image(s) must be regridded into a spatially uniform, or angularly uniform, coordinate system with some pre-defined resolution. Frequently the ending resolution is not the resolution at which the data was observed with. The registration algorithm designer and end product user are presented with a multitude of possible resampling methods each of which modify the spatial frequency content of the data in some way. The purpose of this paper is threefold: (1) to show how an imaging system modifies the scene from an end to end optical systems analysis approach, (2) to develop a generalized resampling model, and (3) empirically apply the model to simulated radiometric scene data and tabulate the results. A Hanning windowed sinc interpolator method will be developed based upon the optical characterization of the system. It will be discussed in terms of the effects and limitations of sampling, aliasing, spectral leakage, and computational complexity. Simulated radiometric scene data will be used to demonstrate each of the algorithms. A high resolution scene will be "grown" using a fractal growth algorithm based on mid-point recursion techniques. The result scene data will be convolved with a point spread function representing the optical response. The resultant scene will be convolved with the detection systems response and subsampled to the desired resolution. The resultant data product will be subsequently resampled to the correct grid using the Hanning windowed sinc interpolator and the results and errors tabulated and discussed.
Surface Fitting for Quasi Scattered Data from Coordinate Measuring Systems.
Mao, Qing; Liu, Shugui; Wang, Sen; Ma, Xinhui
2018-01-13
Non-uniform rational B-spline (NURBS) surface fitting from data points is wildly used in the fields of computer aided design (CAD), medical imaging, cultural relic representation and object-shape detection. Usually, the measured data acquired from coordinate measuring systems is neither gridded nor completely scattered. The distribution of this kind of data is scattered in physical space, but the data points are stored in a way consistent with the order of measurement, so it is named quasi scattered data in this paper. Therefore they can be organized into rows easily but the number of points in each row is random. In order to overcome the difficulty of surface fitting from this kind of data, a new method based on resampling is proposed. It consists of three major steps: (1) NURBS curve fitting for each row, (2) resampling on the fitted curve and (3) surface fitting from the resampled data. Iterative projection optimization scheme is applied in the first and third step to yield advisable parameterization and reduce the time cost of projection. A resampling approach based on parameters, local peaks and contour curvature is proposed to overcome the problems of nodes redundancy and high time consumption in the fitting of this kind of scattered data. Numerical experiments are conducted with both simulation and practical data, and the results show that the proposed method is fast, effective and robust. What's more, by analyzing the fitting results acquired form data with different degrees of scatterness it can be demonstrated that the error introduced by resampling is negligible and therefore it is feasible.
Comparison of Methods for Estimating Low Flow Characteristics of Streams
Tasker, Gary D.
1987-01-01
Four methods for estimating the 7-day, 10-year and 7-day, 20-year low flows for streams are compared by the bootstrap method. The bootstrap method is a Monte Carlo technique in which random samples are drawn from an unspecified sampling distribution defined from observed data. The nonparametric nature of the bootstrap makes it suitable for comparing methods based on a flow series for which the true distribution is unknown. Results show that the two methods based on hypothetical distribution (Log-Pearson III and Weibull) had lower mean square errors than did the G. E. P. Box-D. R. Cox transformation method or the Log-W. C. Boughton method which is based on a fit of plotting positions.
A Bootstrap Generalization of Modified Parallel Analysis for IRT Dimensionality Assessment
ERIC Educational Resources Information Center
Finch, Holmes; Monahan, Patrick
2008-01-01
This article introduces a bootstrap generalization to the Modified Parallel Analysis (MPA) method of test dimensionality assessment using factor analysis. This methodology, based on the use of Marginal Maximum Likelihood nonlinear factor analysis, provides for the calculation of a test statistic based on a parametric bootstrap using the MPA…
Forensic identification of resampling operators: A semi non-intrusive approach.
Cao, Gang; Zhao, Yao; Ni, Rongrong
2012-03-10
Recently, several new resampling operators have been proposed and successfully invalidate the existing resampling detectors. However, the reliability of such anti-forensic techniques is unaware and needs to be investigated. In this paper, we focus on the forensic identification of digital image resampling operators including the traditional type and the anti-forensic type which hides the trace of traditional resampling. Various resampling algorithms involving geometric distortion (GD)-based, dual-path-based and postprocessing-based are investigated. The identification is achieved in the manner of semi non-intrusive, supposing the resampling software could be accessed. Given an input pattern of monotone signal, polarity aberration of GD-based resampled signal's first derivative is analyzed theoretically and measured by effective feature metric. Dual-path-based and postprocessing-based resampling can also be identified by feeding proper test patterns. Experimental results on various parameter settings demonstrate the effectiveness of the proposed approach. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Gaussian Process Interpolation for Uncertainty Estimation in Image Registration
Wachinger, Christian; Golland, Polina; Reuter, Martin; Wells, William
2014-01-01
Intensity-based image registration requires resampling images on a common grid to evaluate the similarity function. The uncertainty of interpolation varies across the image, depending on the location of resampled points relative to the base grid. We propose to perform Bayesian inference with Gaussian processes, where the covariance matrix of the Gaussian process posterior distribution estimates the uncertainty in interpolation. The Gaussian process replaces a single image with a distribution over images that we integrate into a generative model for registration. Marginalization over resampled images leads to a new similarity measure that includes the uncertainty of the interpolation. We demonstrate that our approach increases the registration accuracy and propose an efficient approximation scheme that enables seamless integration with existing registration methods. PMID:25333127
Bootstrapping Methods Applied for Simulating Laboratory Works
ERIC Educational Resources Information Center
Prodan, Augustin; Campean, Remus
2005-01-01
Purpose: The aim of this work is to implement bootstrapping methods into software tools, based on Java. Design/methodology/approach: This paper presents a category of software e-tools aimed at simulating laboratory works and experiments. Findings: Both students and teaching staff use traditional statistical methods to infer the truth from sample…
The empirical Gaia G-band extinction coefficient
NASA Astrophysics Data System (ADS)
Danielski, C.; Babusiaux, C.; Ruiz-Dern, L.; Sartoretti, P.; Arenou, F.
2018-06-01
Context. The first Gaia data release unlocked the access to photometric information for 1.1 billion sources in the G-band. Yet, given the high level of degeneracy between extinction and spectral energy distribution for large passbands such as the Gaia G-band, a correction for the interstellar reddening is needed in order to exploit Gaia data. Aims: The purpose of this manuscript is to provide the empirical estimation of the Gaia G-band extinction coefficient kG for both the red giants and main sequence stars in order to be able to exploit the first data release DR1. Methods: We selected two samples of single stars: one for the red giants and one for the main sequence. Both samples are the result of a cross-match between Gaia DR1 and 2MASS catalogues; they consist of high-quality photometry in the G-, J- and KS-bands. These samples were complemented by temperature and metallicity information retrieved from APOGEE DR13 and LAMOST DR2 surveys, respectively. We implemented a Markov chain Monte Carlo method where we used (G - KS)0 versus Teff and (J - KS)0 versus (G - KS)0, calibration relations to estimate the extinction coefficient kG and we quantify its corresponding confidence interval via bootstrap resampling. We tested our method on samples of red giants and main sequence stars, finding consistent solutions. Results: We present here the determination of the Gaia extinction coefficient through a completely empirical method. Furthermore we provide the scientific community with a formula for measuring the extinction coefficient as a function of stellar effective temperature, the intrinsic colour (G - KS)0, and absorption.
Benchmarking of a T-wave alternans detection method based on empirical mode decomposition.
Blanco-Velasco, Manuel; Goya-Esteban, Rebeca; Cruz-Roldán, Fernando; García-Alberola, Arcadi; Rojo-Álvarez, José Luis
2017-07-01
T-wave alternans (TWA) is a fluctuation of the ST-T complex occurring on an every-other-beat basis of the surface electrocardiogram (ECG). It has been shown to be an informative risk stratifier for sudden cardiac death, though the lack of gold standard to benchmark detection methods has promoted the use of synthetic signals. This work proposes a novel signal model to study the performance of a TWA detection. Additionally, the methodological validation of a denoising technique based on empirical mode decomposition (EMD), which is used here along with the spectral method, is also tackled. The proposed test bed system is based on the following guidelines: (1) use of open source databases to enable experimental replication; (2) use of real ECG signals and physiological noise; (3) inclusion of randomized TWA episodes. Both sensitivity (Se) and specificity (Sp) are separately analyzed. Also a nonparametric hypothesis test, based on Bootstrap resampling, is used to determine whether the presence of the EMD block actually improves the performance. The results show an outstanding specificity when the EMD block is used, even in very noisy conditions (0.96 compared to 0.72 for SNR = 8 dB), being always superior than that of the conventional SM alone. Regarding the sensitivity, using the EMD method also outperforms in noisy conditions (0.57 compared to 0.46 for SNR=8 dB), while it decreases in noiseless conditions. The proposed test setting designed to analyze the performance guarantees that the actual physiological variability of the cardiac system is reproduced. The use of the EMD-based block in noisy environment enables the identification of most patients with fatal arrhythmias. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Jiali; Han, Yuefeng; Stein, Michael L.
2016-02-10
The Weather Research and Forecast (WRF) model downscaling skill in extreme maximum daily temperature is evaluated by using the generalized extreme value (GEV) distribution. While the GEV distribution has been used extensively in climatology and meteorology for estimating probabilities of extreme events, accurately estimating GEV parameters based on data from a single pixel can be difficult, even with fairly long data records. This work proposes a simple method assuming that the shape parameter, the most difficult of the three parameters to estimate, does not vary over a relatively large region. This approach is applied to evaluate 31-year WRF-downscaled extreme maximummore » temperature through comparison with North American Regional Reanalysis (NARR) data. Uncertainty in GEV parameter estimates and the statistical significance in the differences of estimates between WRF and NARR are accounted for by conducting bootstrap resampling. Despite certain biases over parts of the United States, overall, WRF shows good agreement with NARR in the spatial pattern and magnitudes of GEV parameter estimates. Both WRF and NARR show a significant increase in extreme maximum temperature over the southern Great Plains and southeastern United States in January and over the western United States in July. The GEV model shows clear benefits from the regionally constant shape parameter assumption, for example, leading to estimates of the location and scale parameters of the model that show coherent spatial patterns.« less
Personalized cumulative UV tracking on mobiles & wearables.
Dey, S; Sahoo, S; Agrawal, H; Mondal, A; Bhowmik, T; Tiwari, V N
2017-07-01
Maintaining a balanced Ultra Violet (UV) exposure level is vital for a healthy living as the excess of UV dose can lead to critical diseases such as skin cancer while the absence can cause vitamin D deficiency which has recently been linked to onset of cardiac abnormalities. Here, we propose a personalized cumulative UV dose (CUVD) estimation system for smartwatch and smartphone devices having the following novelty factors; (a) sensor orientation invariant measurement of UV exposure using a bootstrap resampling technique, (b) estimation of UV exposure using only light intensity (lux) sensor (c) optimal UV exposure dose estimation. Our proposed method will eliminate the need for a dedicated UV sensor thus widen the user base of the proposed solution, render it unobtrusive by eliminating the critical requirement of orienting the device in a direction facing the sun. The system is implemented on android mobile platform and validated on 1200 minutes of lux and UV index (UVI) data collected across several days covering morning to evening time frames. The result shows very impressive final UVI estimation accuracy. We believe our proposed solution will enable the future wearable and smartphone users to obtain a seamless personalized UV exposure dose across a day paving a way for simple yet very useful recommendations such as right skin protective measure for reducing risk factors of long term UV exposure related diseases like skin cancer and, cardiac abnormality.
Delorme, Arnaud; Makeig, Scott
2004-03-15
We have developed a toolbox and graphic user interface, EEGLAB, running under the crossplatform MATLAB environment (The Mathworks, Inc.) for processing collections of single-trial and/or averaged EEG data of any number of channels. Available functions include EEG data, channel and event information importing, data visualization (scrolling, scalp map and dipole model plotting, plus multi-trial ERP-image plots), preprocessing (including artifact rejection, filtering, epoch selection, and averaging), independent component analysis (ICA) and time/frequency decompositions including channel and component cross-coherence supported by bootstrap statistical methods based on data resampling. EEGLAB functions are organized into three layers. Top-layer functions allow users to interact with the data through the graphic interface without needing to use MATLAB syntax. Menu options allow users to tune the behavior of EEGLAB to available memory. Middle-layer functions allow users to customize data processing using command history and interactive 'pop' functions. Experienced MATLAB users can use EEGLAB data structures and stand-alone signal processing functions to write custom and/or batch analysis scripts. Extensive function help and tutorial information are included. A 'plug-in' facility allows easy incorporation of new EEG modules into the main menu. EEGLAB is freely available (http://www.sccn.ucsd.edu/eeglab/) under the GNU public license for noncommercial use and open source development, together with sample data, user tutorial and extensive documentation.
Applying causal mediation analysis to personality disorder research.
Walters, Glenn D
2018-01-01
This article is designed to address fundamental issues in the application of causal mediation analysis to research on personality disorders. Causal mediation analysis is used to identify mechanisms of effect by testing variables as putative links between the independent and dependent variables. As such, it would appear to have relevance to personality disorder research. It is argued that proper implementation of causal mediation analysis requires that investigators take several factors into account. These factors are discussed under 5 headings: variable selection, model specification, significance evaluation, effect size estimation, and sensitivity testing. First, care must be taken when selecting the independent, dependent, mediator, and control variables for a mediation analysis. Some variables make better mediators than others and all variables should be based on reasonably reliable indicators. Second, the mediation model needs to be properly specified. This requires that the data for the analysis be prospectively or historically ordered and possess proper causal direction. Third, it is imperative that the significance of the identified pathways be established, preferably with a nonparametric bootstrap resampling approach. Fourth, effect size estimates should be computed or competing pathways compared. Finally, investigators employing the mediation method are advised to perform a sensitivity analysis. Additional topics covered in this article include parallel and serial multiple mediation designs, moderation, and the relationship between mediation and moderation. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Bakun, W.H.; Scotti, O.
2006-01-01
Intensity assignments for 33 calibration earthquakes were used to develop intensity attenuation models for the Alps, Armorican, Provence, Pyrenees and Rhine regions of France. Intensity decreases with ?? most rapidly in the French Alps, Provence and Pyrenees regions, and least rapidly in the Armorican and Rhine regions. The comparable Armorican and Rhine region attenuation models are aggregated into a French stable continental region model and the comparable Provence and Pyrenees region models are aggregated into a Southern France model. We analyse MSK intensity assignments using the technique of Bakun & Wentworth, which provides an objective method for estimating epicentral location and intensity magnitude MI. MI for the 1356 October 18 earthquake in the French stable continental region is 6.6 for a location near Basle, Switzerland, and moment magnitude M is 5.9-7.2 at the 95 per cent (??2??) confidence level. MI for the 1909 June 11 Trevaresse (Lambesc) earthquake near Marseilles in the Southern France region is 5.5, and M is 4.9-6.0 at the 95 per cent confidence level. Bootstrap resampling techniques are used to calculate objective, reproducible 67 per cent and 95 per cent confidence regions for the locations of historical earthquakes. These confidence regions for location provide an attractive alternative to the macroseismic epicentre and qualitative location uncertainties used heretofore. ?? 2006 The Authors Journal compilation ?? 2006 RAS.
Examining mediators of child sexual abuse and sexually transmitted infections.
Sutherland, Melissa A
2011-01-01
Interpersonal violence has increasingly been identified as a risk factor for sexually transmitted infections. Understanding the pathways between violence and sexually transmitted infections is essential to designing effective interventions. The aim of this study was to examine dissociative symptoms, alcohol use, and intimate partner physical violence and sexual coercion as mediators of child sexual abuse and lifetime sexually transmitted infection diagnosis among a sample of women. A convenience sample of 202 women was recruited from healthcare settings, with 189 complete cases for analysis. A multiple mediation model tested the proposed mediators of child sexual abuse and lifetime sexually transmitted infection diagnosis. Bootstrapping, a resampling method, was used to test for mediation. Key variables included child sexual abuse, dissociative symptoms, alcohol use, and intimate partner violence. Child sexual abuse was reported by 46% of the study participants (n = 93). Child sexual abuse was found to have an indirect effect on lifetime sexually transmitted infection diagnosis, with the effect occurring through dissociative symptoms (95% confidence interval = 0.0033-0.4714) and sexual coercion (95% confidence interval = 0.0359-0.7694). Alcohol use and physical violence were not found to be significant mediators. This study suggests that dissociation and intimate partner sexual coercion are important mediators of child sexual abuse and sexually transmitted infection diagnosis. Therefore, interventions that consider the roles of dissociative symptoms and interpersonal violence may be effective in preventing sexually transmitted infections among women.
Lawrence, Gregory B.; Fernandez, Ivan J.; Richter, Daniel D.; Ross, Donald S.; Hazlett, Paul W.; Bailey, Scott W.; Oiumet, Rock; Warby, Richard A.F.; Johnson, Arthur H.; Lin, Henry; Kaste, James M.; Lapenis, Andrew G.; Sullivan, Timothy J.
2013-01-01
Environmental change is monitored in North America through repeated measurements of weather, stream and river flow, air and water quality, and most recently, soil properties. Some skepticism remains, however, about whether repeated soil sampling can effectively distinguish between temporal and spatial variability, and efforts to document soil change in forest ecosystems through repeated measurements are largely nascent and uncoordinated. In eastern North America, repeated soil sampling has begun to provide valuable information on environmental problems such as air pollution. This review synthesizes the current state of the science to further the development and use of soil resampling as an integral method for recording and understanding environmental change in forested settings. The origins of soil resampling reach back to the 19th century in England and Russia. The concepts and methodologies involved in forest soil resampling are reviewed and evaluated through a discussion of how temporal and spatial variability can be addressed with a variety of sampling approaches. Key resampling studies demonstrate the type of results that can be obtained through differing approaches. Ongoing, large-scale issues such as recovery from acidification, long-term N deposition, C sequestration, effects of climate change, impacts from invasive species, and the increasing intensification of soil management all warrant the use of soil resampling as an essential tool for environmental monitoring and assessment. Furthermore, with better awareness of the value of soil resampling, studies can be designed with a long-term perspective so that information can be efficiently obtained well into the future to address problems that have not yet surfaced.
Baron, Charles A.; Awan, Musaddiq J.; Mohamed, Abdallah S.R.; Akel, Imad; Rosenthal, David I.; Gunn, G. Brandon; Garden, Adam S.; Dyer, Brandon A.; Court, Laurence; Sevak, Parag R.; Kocak‐Uzel, Esengul
2014-01-01
Larynx may alternatively serve as a target or organs at risk (OAR) in head and neck cancer (HNC) image‐guided radiotherapy (IGRT). The objective of this study was to estimate IGRT parameters required for larynx positional error independent of isocentric alignment and suggest population‐based compensatory margins. Ten HNC patients receiving radiotherapy (RT) with daily CT on‐rails imaging were assessed. Seven landmark points were placed on each daily scan. Taking the most superior‐anterior point of the C5 vertebra as a reference isocenter for each scan, residual displacement vectors to the other six points were calculated postisocentric alignment. Subsequently, using the first scan as a reference, the magnitude of vector differences for all six points for all scans over the course of treatment was calculated. Residual systematic and random error and the necessary compensatory CTV‐to‐PTV and OAR‐to‐PRV margins were calculated, using both observational cohort data and a bootstrap‐resampled population estimator. The grand mean displacements for all anatomical points was 5.07 mm, with mean systematic error of 1.1 mm and mean random setup error of 2.63 mm, while bootstrapped POIs grand mean displacement was 5.09 mm, with mean systematic error of 1.23 mm and mean random setup error of 2.61 mm. Required margin for CTV‐PTV expansion was 4.6 mm for all cohort points, while the bootstrap estimator of the equivalent margin was 4.9 mm. The calculated OAR‐to‐PRV expansion for the observed residual setup error was 2.7 mm and bootstrap estimated expansion of 2.9 mm. We conclude that the interfractional larynx setup error is a significant source of RT setup/delivery error in HNC, both when the larynx is considered as a CTV or OAR. We estimate the need for a uniform expansion of 5 mm to compensate for setup error if the larynx is a target, or 3 mm if the larynx is an OAR, when using a nonlaryngeal bony isocenter. PACS numbers: 87.55.D‐, 87.55.Qr
Point Set Denoising Using Bootstrap-Based Radial Basis Function.
Liew, Khang Jie; Ramli, Ahmad; Abd Majid, Ahmad
2016-01-01
This paper examines the application of a bootstrap test error estimation of radial basis functions, specifically thin-plate spline fitting, in surface smoothing. The presence of noisy data is a common issue of the point set model that is generated from 3D scanning devices, and hence, point set denoising is one of the main concerns in point set modelling. Bootstrap test error estimation, which is applied when searching for the smoothing parameters of radial basis functions, is revisited. The main contribution of this paper is a smoothing algorithm that relies on a bootstrap-based radial basis function. The proposed method incorporates a k-nearest neighbour search and then projects the point set to the approximated thin-plate spline surface. Therefore, the denoising process is achieved, and the features are well preserved. A comparison of the proposed method with other smoothing methods is also carried out in this study.
Four Bootstrap Confidence Intervals for the Binomial-Error Model.
ERIC Educational Resources Information Center
Lin, Miao-Hsiang; Hsiung, Chao A.
1992-01-01
Four bootstrap methods are identified for constructing confidence intervals for the binomial-error model. The extent to which similar results are obtained and the theoretical foundation of each method and its relevance and ranges of modeling the true score uncertainty are discussed. (SLD)
ERIC Educational Resources Information Center
Choi, Sae Il
2009-01-01
This study used simulation (a) to compare the kernel equating method to traditional equipercentile equating methods under the equivalent-groups (EG) design and the nonequivalent-groups with anchor test (NEAT) design and (b) to apply the parametric bootstrap method for estimating standard errors of equating. A two-parameter logistic item response…
Reference interval computation: which method (not) to choose?
Pavlov, Igor Y; Wilson, Andrew R; Delgado, Julio C
2012-07-11
When different methods are applied to reference interval (RI) calculation the results can sometimes be substantially different, especially for small reference groups. If there are no reliable RI data available, there is no way to confirm which method generates results closest to the true RI. We randomly drawn samples obtained from a public database for 33 markers. For each sample, RIs were calculated by bootstrapping, parametric, and Box-Cox transformed parametric methods. Results were compared to the values of the population RI. For approximately half of the 33 markers, results of all 3 methods were within 3% of the true reference value. For other markers, parametric results were either unavailable or deviated considerably from the true values. The transformed parametric method was more accurate than bootstrapping for sample size of 60, very close to bootstrapping for sample size 120, but in some cases unavailable. We recommend against using parametric calculations to determine RIs. The transformed parametric method utilizing Box-Cox transformation would be preferable way of RI calculation, if it satisfies normality test. If not, the bootstrapping is always available, and is almost as accurate and precise as the transformed parametric method. Copyright © 2012 Elsevier B.V. All rights reserved.
Confidence Interval Coverage for Cohen's Effect Size Statistic
ERIC Educational Resources Information Center
Algina, James; Keselman, H. J.; Penfield, Randall D.
2006-01-01
Kelley compared three methods for setting a confidence interval (CI) around Cohen's standardized mean difference statistic: the noncentral-"t"-based, percentile (PERC) bootstrap, and biased-corrected and accelerated (BCA) bootstrap methods under three conditions of nonnormality, eight cases of sample size, and six cases of population…
The conditional resampling model STARS: weaknesses of the modeling concept and development
NASA Astrophysics Data System (ADS)
Menz, Christoph
2016-04-01
The Statistical Analogue Resampling Scheme (STARS) is based on a modeling concept of Werner and Gerstengarbe (1997). The model uses a conditional resampling technique to create a simulation time series from daily observations. Unlike other time series generators (such as stochastic weather generators) STARS only needs a linear regression specification of a single variable as the target condition for the resampling. Since its first implementation the algorithm was further extended in order to allow for a spatially distributed trend signal, to preserve the seasonal cycle and the autocorrelation of the observation time series (Orlovsky, 2007; Orlovsky et al., 2008). This evolved version was successfully used in several climate impact studies. However a detaild evaluation of the simulations revealed two fundamental weaknesses of the utilized resampling technique. 1. The restriction of the resampling condition on a single individual variable can lead to a misinterpretation of the change signal of other variables when the model is applied to a mulvariate time series. (F. Wechsung and M. Wechsung, 2014). As one example, the short-term correlations between precipitation and temperature (cooling of the near-surface air layer after a rainfall event) can be misinterpreted as a climatic change signal in the simulation series. 2. The model restricts the linear regression specification to the annual mean time series, refusing the specification of seasonal varying trends. To overcome these fundamental weaknesses a redevelopment of the whole algorithm was done. The poster discusses the main weaknesses of the earlier model implementation and the methods applied to overcome these in the new version. Based on the new model idealized simulations were conducted to illustrate the enhancement.
ERIC Educational Resources Information Center
Larwin, Karen H.; Larwin, David A.
2011-01-01
Bootstrapping methods and random distribution methods are increasingly recommended as better approaches for teaching students about statistical inference in introductory-level statistics courses. The authors examined the effect of teaching undergraduate business statistics students using random distribution and bootstrapping simulations. It is the…
Bi, Xiwen; Yang, Hang; An, Xin; Wang, Fenghua; Jiang, Wenqi
2016-01-01
Background We propose a novel prognostic parameter for esophageal squamous cell carcinoma (ESCC)—hemoglobin/red cell distribution width (HB/RDW) ratio. Its clinical prognostic value and relationship with other clinicopathological characteristics were investigated in ESCC patients. Results The optimal cut-off value was 0.989 for the HB/RDW ratio. The HB/RDW ratio (P= 0.035), tumor depth (P = 0.020) and lymph node status (P<0.001) were identified to be an independent prognostic factors of OS by multivariate analysis, which was validated by bootstrap resampling. Patients with a low HB/RDW ratio had a 1.416 times greater risk of dying during follow-up compared with those with a high HB/RDW (95% CI = 1.024–1.958, P = 0.035). Materials and Methods We retrospectively analyzed 362 patients who underwent curative treatment at a single institution between January 2007 and December 2008. The chi-square test was used to evaluate relationships between the HB/RDW ratio and other clinicopathological variables; the Kaplan–Meier method was used to analyze the 5-year overall survival (OS); and the Cox proportional hazards models were used for univariate and multivariate analyses of variables related to OS. Conclusion A significant association was found between the HB/RDW ratio and clinical characteristics and survival outcomes in ESCC patients. Based on these findings, we believe that the HB/RDW ratio is a novel and promising prognostic parameter for ESCC patients. PMID:27223088
CTER-rapid estimation of CTF parameters with error assessment.
Penczek, Pawel A; Fang, Jia; Li, Xueming; Cheng, Yifan; Loerke, Justus; Spahn, Christian M T
2014-05-01
In structural electron microscopy, the accurate estimation of the Contrast Transfer Function (CTF) parameters, particularly defocus and astigmatism, is of utmost importance for both initial evaluation of micrograph quality and for subsequent structure determination. Due to increases in the rate of data collection on modern microscopes equipped with new generation cameras, it is also important that the CTF estimation can be done rapidly and with minimal user intervention. Finally, in order to minimize the necessity for manual screening of the micrographs by a user it is necessary to provide an assessment of the errors of fitted parameters values. In this work we introduce CTER, a CTF parameters estimation method distinguished by its computational efficiency. The efficiency of the method makes it suitable for high-throughput EM data collection, and enables the use of a statistical resampling technique, bootstrap, that yields standard deviations of estimated defocus and astigmatism amplitude and angle, thus facilitating the automation of the process of screening out inferior micrograph data. Furthermore, CTER also outputs the spatial frequency limit imposed by reciprocal space aliasing of the discrete form of the CTF and the finite window size. We demonstrate the efficiency and accuracy of CTER using a data set collected on a 300kV Tecnai Polara (FEI) using the K2 Summit DED camera in super-resolution counting mode. Using CTER we obtained a structure of the 80S ribosome whose large subunit had a resolution of 4.03Å without, and 3.85Å with, inclusion of astigmatism parameters. Copyright © 2014 Elsevier B.V. All rights reserved.
Exemplary Care as a Mediator of the Effects of Caregiver Subjective Appraisal and Emotional Outcomes
Harris, Grant M.; Durkin, Daniel W.; Allen, Rebecca S.; DeCoster, Jamie; Burgio, Louis D.
2011-01-01
Purpose: Exemplary care (EC) is a new construct encompassing care behaviors that warrants further study within stress process models of dementia caregiving. Previous research has examined EC within the context of cognitively intact older adult care recipients (CRs) and their caregivers (CGs). This study sought to expand our knowledge of quality of care by investigating EC within a diverse sample of dementia CGs. Design and Methods: We examined the relation between CG subjective appraisal (daily care bother, burden, and behavioral bother), EC, and CG emotional outcomes (depression and positive aspects of caregiving [PAC]). Specifically, EC was examined as a possible mediator of the effects of CG subjective appraisals on emotional outcomes. Using a bootstrapping method and an SPSS macro developed by Preacher and Hayes (2008 Asymptotic and resampling strategies for assessing and comparing indirect effects in multiple mediator models), we tested the indirect effect of EC on the relation between CG subjective appraisals and CG emotional outcomes. Results: Overall, EC partially mediates the relation between the subjective appraisal variables (daily care bother, burden, and behavioral bother) and PAC. Results for depression were similar except that EC did not mediate the relation between burden and depression. This pattern of results varied by race/ethnicity. Implications: Overall, CGs’ perception of providing EC to individuals with dementia partially explains the relation between subjective appraisal and symptoms of depression and PAC. Results of this study suggest that interventions may benefit from training CGs to engage in EC to improve their emotional outcomes and quality of care. PMID:21350038
Sturgeon, Jared D; Cox, John A; Mayo, Lauren L; Gunn, G Brandon; Zhang, Lifei; Balter, Peter A; Dong, Lei; Awan, Musaddiq; Kocak-Uzel, Esengul; Mohamed, Abdallah Sherif Radwan; Rosenthal, David I; Fuller, Clifton David
2015-10-01
Digitally reconstructed radiographs (DRRs) are routinely used as an a priori reference for setup correction in radiotherapy. The spatial resolution of DRRs may be improved to reduce setup error in fractionated radiotherapy treatment protocols. The influence of finer CT slice thickness reconstruction (STR) and resultant increased resolution DRRs on physician setup accuracy was prospectively evaluated. Four head and neck patient CT-simulation images were acquired and used to create DRR cohorts by varying STRs at 0.5, 1, 2, 2.5, and 3 mm. DRRs were displaced relative to a fixed isocenter using 0-5 mm random shifts in the three cardinal axes. Physician observers reviewed DRRs of varying STRs and displacements and then aligned reference and test DRRs replicating daily KV imaging workflow. A total of 1,064 images were reviewed by four blinded physicians. Observer errors were analyzed using nonparametric statistics (Friedman's test) to determine whether STR cohorts had detectably different displacement profiles. Post hoc bootstrap resampling was applied to evaluate potential generalizability. The observer-based trial revealed a statistically significant difference between cohort means for observer displacement vector error ([Formula: see text]) and for [Formula: see text]-axis [Formula: see text]. Bootstrap analysis suggests a 15% gain in isocenter translational setup error with reduction of STR from 3 mm to [Formula: see text]2 mm, though interobserver variance was a larger feature than STR-associated measurement variance. Higher resolution DRRs generated using finer CT scan STR resulted in improved observer performance at shift detection and could decrease operator-dependent geometric error. Ideally, CT STRs [Formula: see text]2 mm should be utilized for DRR generation in the head and neck.
Mohammadi, Mohammad Hossein; Molavi, Behnam; Mohammadi, Saeed; Nikbakht, Mohsen; Mohammadi, Ashraf Malek; Mostafaei, Shayan; Norooznezhad, Amir Hossein; Ghorbani Abdegah, Ali; Ghavamzadeh, Ardeshir
2017-04-01
The aim of the present study was to evaluate the effectiveness of using autologous platelet-rich plasma (PRP) gel for treatment of diabetic foot ulcer (DFU) during the first 4 weeks of the treatment. In this longitudinal and single-arm trial, 100 patients were randomly selected after meeting certain inclusion and exclusion criteria; of these 100 patients, 70 (70%) were enrolled in the trial. After the primary care actions such as wound debridement, the area of each wound was calculated and recorded. The PRP therapy (2mL/cm 2 of ulcers) was performed weekly until the healing time for each patient. We used one sample T-test for healing wounds and Bootstrap resampling approach for reporting confidence interval with 1000 Bootstrap samples. The p-value<0.05 were considered statistically significant. The mean (SD) of DFU duration was 19.71 weeks (4.94) for units sampling. The ratio of subjects who withdrew from the study was calculated to be 2 (2.8%). Average area of 71 ulcers in the mentioned number of cases was calculated to be 6.11cm 2 (SD: 4.37). Also, the mean, median (SD) of healing time was 8.7, 8 weeks (SD: 3.93) except for 2 mentioned cases. According to one sample T-test, wound area (cm 2 ), on average, significantly decreased to 51.9% (CI: 46.7-57.1) through the first four weeks of therapy. Furthermore, significant correlation (0.22) was not found between area of ulcers and healing duration (p-value>0.5). According to the results, PRP could be considered as a candidate treatment for non-healing DFUs as it may prevent future complications such as amputation or death in this pathological phenomenon. Copyright © 2016 Elsevier Ltd. All rights reserved.
Chu, Hui-May; Ette, Ene I
2005-09-02
his study was performed to develop a new nonparametric approach for the estimation of robust tissue-to-plasma ratio from extremely sparsely sampled paired data (ie, one sample each from plasma and tissue per subject). Tissue-to-plasma ratio was estimated from paired/unpaired experimental data using independent time points approach, area under the curve (AUC) values calculated with the naïve data averaging approach, and AUC values calculated using sampling based approaches (eg, the pseudoprofile-based bootstrap [PpbB] approach and the random sampling approach [our proposed approach]). The random sampling approach involves the use of a 2-phase algorithm. The convergence of the sampling/resampling approaches was investigated, as well as the robustness of the estimates produced by different approaches. To evaluate the latter, new data sets were generated by introducing outlier(s) into the real data set. One to 2 concentration values were inflated by 10% to 40% from their original values to produce the outliers. Tissue-to-plasma ratios computed using the independent time points approach varied between 0 and 50 across time points. The ratio obtained from AUC values acquired using the naive data averaging approach was not associated with any measure of uncertainty or variability. Calculating the ratio without regard to pairing yielded poorer estimates. The random sampling and pseudoprofile-based bootstrap approaches yielded tissue-to-plasma ratios with uncertainty and variability. However, the random sampling approach, because of the 2-phase nature of its algorithm, yielded more robust estimates and required fewer replications. Therefore, a 2-phase random sampling approach is proposed for the robust estimation of tissue-to-plasma ratio from extremely sparsely sampled data.
Rubio-Tapia, A; Malamut, G; Verbeek, W H M; van Wanrooij, R L J; Leffler, D A; Niveloni, S I; Arguelles-Grande, C; Lahr, B D; Zinsmeister, A R; Murray, J A; Kelly, C P; Bai, J C; Green, P H; Daum, S; Mulder, C J J; Cellier, C
2016-10-01
Refractory coeliac disease is a severe complication of coeliac disease with heterogeneous outcome. To create a prognostic model to estimate survival of patients with refractory coeliac disease. We evaluated predictors of 5-year mortality using Cox proportional hazards regression on subjects from a multinational registry. Bootstrap resampling was used to internally validate the individual factors and overall model performance. The mean of the estimated regression coefficients from 400 bootstrap models was used to derive a risk score for 5-year mortality. The multinational cohort was composed of 232 patients diagnosed with refractory coeliac disease across seven centres (range of 11-63 cases per centre). The median age was 53 years and 150 (64%) were women. A total of 51 subjects died during a 5-year follow-up (cumulative 5-year all-cause mortality = 30%). From a multiple variable Cox proportional hazards model, the following variables were significantly associated with 5-year mortality: age at refractory coeliac disease diagnosis (per 20 year increase, hazard ratio = 2.21; 95% confidence interval, CI: 1.38-3.55), abnormal intraepithelial lymphocytes (hazard ratio = 2.85; 95% CI: 1.22-6.62), and albumin (per 0.5 unit increase, hazard ratio = 0.72; 95% CI: 0.61-0.85). A simple weighted three-factor risk score was created to estimate 5-year survival. Using data from a multinational registry and previously reported risk factors, we create a prognostic model to predict 5-year mortality among patients with refractory coeliac disease. This new model may help clinicians to guide treatment and follow-up. © 2016 John Wiley & Sons Ltd.
A scale-invariant change detection method for land use/cover change research
NASA Astrophysics Data System (ADS)
Xing, Jin; Sieber, Renee; Caelli, Terrence
2018-07-01
Land Use/Cover Change (LUCC) detection relies increasingly on comparing remote sensing images with different spatial and spectral scales. Based on scale-invariant image analysis algorithms in computer vision, we propose a scale-invariant LUCC detection method to identify changes from scale heterogeneous images. This method is composed of an entropy-based spatial decomposition, two scale-invariant feature extraction methods, Maximally Stable Extremal Region (MSER) and Scale-Invariant Feature Transformation (SIFT) algorithms, a spatial regression voting method to integrate MSER and SIFT results, a Markov Random Field-based smoothing method, and a support vector machine classification method to assign LUCC labels. We test the scale invariance of our new method with a LUCC case study in Montreal, Canada, 2005-2012. We found that the scale-invariant LUCC detection method provides similar accuracy compared with the resampling-based approach but this method avoids the LUCC distortion incurred by resampling.
Elkomy, Mohammed H; Elmenshawe, Shahira F; Eid, Hussein M; Ali, Ahmed M A
2016-11-01
This work aimed at investigating the potential of solid lipid nanoparticles (SLN) as carriers for topical delivery of Ketoprofen (KP); evaluating a novel technique incorporating Artificial Neural Network (ANN) and clustered bootstrap for optimization of KP-loaded SLN (KP-SLN); and demonstrating a longitudinal dose response (LDR) modeling-based approach to compare the activity of topical non-steroidal anti-inflammatory drug formulations. KP-SLN was fabricated by a modified emulsion/solvent evaporation method. Box-Behnken design was implemented to study the influence of glycerylpalmitostearate-to-KP ratio, Tween 80, and lecithin concentrations on particle size, entrapment efficiency, and amount of drug permeated through rat skin in 24 hours. Following clustered bootstrap ANN optimization, the optimized KP-SLN was incorporated into an aqueous gel and evaluated for rheology, in vitro release, permeability, skin irritation and in vivo activity using carrageenan-induced rat paw edema model and LDR mathematical model to analyze the time course of anti-inflammatory effect at various application durations. Lipid-to-drug ratio of 7.85 [bootstrap 95%CI: 7.63-8.51], Tween 80 of 1.27% [bootstrap 95%CI: 0.601-2.40%], and Lecithin of 0.263% [bootstrap 95%CI: 0.263-0.328%] were predicted to produce optimal characteristics. Compared with profenid® gel, the optimized KP-SLN gel exhibited slower release, faster permeability, better texture properties, greater efficacy, and similar potency. SLNs are safe and effective permeation enhancers. ANN coupled with clustered bootstrap is a useful method for finding optimal solutions and estimating uncertainty associated with them. LDR models allow mechanistic understanding of comparative in vivo performances of different topical formulations, and help design efficient dermatological bioequivalence assessment methods.
NASA Astrophysics Data System (ADS)
Muller, Dirk; Monetti, Roberto A.; Bohm, Holger F.; Bauer, Jan; Rummeny, Ernst J.; Link, Thomas M.; Rath, Christoph W.
2004-05-01
The scaling index method (SIM) is a recently proposed non-linear technique to extract texture measures for the quantitative characterisation of the trabecular bone structure in high resolution magnetic resonance imaging (HR-MRI). The three-dimensional tomographic images are interpreted as a point distribution in a state space where each point (voxel) is defined by its x, y, z coordinates and the grey value. The SIM estimates local scaling properties to describe the nonlinear morphological features in this four-dimensional point distribution. Thus, it can be used for differentiating between cluster-, rod-, sheet-like and unstructured (background) image components, which makes it suitable for quantifying the microstructure of human cancellous bone. The SIM was applied to high resolution magnetic resonance images of the distal radius in patients with and without osteoporotic spine fractures in order to quantify the deterioration of bone structure. Using the receiver operator characteristic (ROC) analysis the diagnostic performance of this texture measure in differentiating patients with and without fractures was compared with bone mineral density (BMD). The SIM demonstrated the best area under the curve (AUC) value for discriminating the two groups. The reliability of our new texture measure and the validity of our results were assessed by applying bootstrapping resampling methods. The results of this study show that trabecular structure measures derived from HR-MRI of the radius in a clinical setting using a recently proposed algorithm based on a local 3D scaling index method can significantly improve the diagnostic performance in differentiating postmenopausal women with and without osteoporotic spine fractures.
Positive Traits Linked to Less Pain through Lower Pain Catastrophizing
Hood, Anna; Pulvers, Kim; Carrillo, Janet; Merchant, Gina; Thomas, Marie
2011-01-01
The present study examined the association between positive traits, pain catastrophizing, and pain perceptions. We hypothesized that pain catastrophizing would mediate the relationship between positive traits and pain. First, participants (n = 114) completed the Trait Hope Scale, the Life Orientation Test- Revised, and the Pain Catastrophizing Scale. Participants then completed the experimental pain stimulus, a cold pressor task, by submerging their hand in a circulating water bath (0º Celsius) for as long as tolerable. Immediately following the task, participants completed the Short-Form McGill Pain Questionnaire (MPQ-SF). Pearson correlation found associations between hope and pain catastrophizing (r = −.41, p < .01) and MPQ-SF scores (r = −.20, p < .05). Optimism was significantly associated with pain catastrophizing (r = −.44, p < .01) and MPQ-SF scores (r = −.19, p < .05). Bootstrapping, a non-parametric resampling procedure, tested for mediation and supported our hypothesis that pain catastrophizing mediated the relationship between positive traits and MPQ-SF pain report. To our knowledge, this investigation is the first to establish that the protective link between positive traits and experimental pain operates through lower pain catastrophizing. PMID:22199416
Webb, Jennifer B; Fiery, Mallory F; Jafari, Nadia
2016-09-01
The present investigation provided a theoretically-driven analysis testing whether body shame helped account for the predicted positive associations between explicit weight bias in the form of possessing anti-fat attitudes (i.e., dislike, fear of fat, and willpower beliefs) and engaging in fat talk among 309 weight-diverse college women. We also evaluated whether self-compassion served as a protective factor in these relationships. Robust non-parametric bootstrap resampling procedures adjusted for body mass index (BMI) revealed stronger indirect and conditional indirect effects for dislike and fear of fat attitudes and weaker, marginal effects for the models inclusive of willpower beliefs. In general, the indirect effect of anti-fat attitudes on fat talk via body shame declined with increasing levels of self-compassion. Our preliminary findings may point to useful process variables to target in mitigating the impact of endorsing anti-fat prejudice on fat talk in college women and may help clarify who is at higher risk. Copyright © 2016 Elsevier Ltd. All rights reserved.
Generating Virtual Patients by Multivariate and Discrete Re-Sampling Techniques.
Teutonico, D; Musuamba, F; Maas, H J; Facius, A; Yang, S; Danhof, M; Della Pasqua, O
2015-10-01
Clinical Trial Simulations (CTS) are a valuable tool for decision-making during drug development. However, to obtain realistic simulation scenarios, the patients included in the CTS must be representative of the target population. This is particularly important when covariate effects exist that may affect the outcome of a trial. The objective of our investigation was to evaluate and compare CTS results using re-sampling from a population pool and multivariate distributions to simulate patient covariates. COPD was selected as paradigm disease for the purposes of our analysis, FEV1 was used as response measure and the effects of a hypothetical intervention were evaluated in different populations in order to assess the predictive performance of the two methods. Our results show that the multivariate distribution method produces realistic covariate correlations, comparable to the real population. Moreover, it allows simulation of patient characteristics beyond the limits of inclusion and exclusion criteria in historical protocols. Both methods, discrete resampling and multivariate distribution generate realistic pools of virtual patients. However the use of a multivariate distribution enable more flexible simulation scenarios since it is not necessarily bound to the existing covariate combinations in the available clinical data sets.
NASA Astrophysics Data System (ADS)
Artemenko, M. V.; Chernetskaia, I. E.; Kalugina, N. M.; Shchekina, E. N.
2018-04-01
This article describes the solution of the actual problem of the productive formation of a cortege of informative measured features of the object of observation and / or control using author's algorithms for the use of bootstraps and counter-bootstraps technologies for processing the results of measurements of various states of the object on the basis of different volumes of the training sample. The work that is presented in this paper considers aggregation by specific indicators of informative capacity by linear, majority, logical and “greedy” methods, applied both individually and integrally. The results of the computational experiment are discussed, and in conclusion is drawn that the application of the proposed methods contributes to an increase in the efficiency of classification of the states of the object from the results of measurements.
A bootstrap based space-time surveillance model with an application to crime occurrences
NASA Astrophysics Data System (ADS)
Kim, Youngho; O'Kelly, Morton
2008-06-01
This study proposes a bootstrap-based space-time surveillance model. Designed to find emerging hotspots in near-real time, the bootstrap based model is characterized by its use of past occurrence information and bootstrap permutations. Many existing space-time surveillance methods, using population at risk data to generate expected values, have resulting hotspots bounded by administrative area units and are of limited use for near-real time applications because of the population data needed. However, this study generates expected values for local hotspots from past occurrences rather than population at risk. Also, bootstrap permutations of previous occurrences are used for significant tests. Consequently, the bootstrap-based model, without the requirement of population at risk data, (1) is free from administrative area restriction, (2) enables more frequent surveillance for continuously updated registry database, and (3) is readily applicable to criminology and epidemiology surveillance. The bootstrap-based model performs better for space-time surveillance than the space-time scan statistic. This is shown by means of simulations and an application to residential crime occurrences in Columbus, OH, year 2000.
Topics in Statistical Calibration
2014-03-27
on a parametric bootstrap where, instead of sampling directly from the residuals , samples are drawn from a normal distribution. This procedure will...addition to centering them (Davison and Hinkley, 1997). When there are outliers in the residuals , the bootstrap distribution of x̂0 can become skewed or...based and inversion methods using the linear mixed-effects model. Then, a simple parametric bootstrap algorithm is proposed that can be used to either
The new version of EPA’s positive matrix factorization (EPA PMF) software, 5.0, includes three error estimation (EE) methods for analyzing factor analytic solutions: classical bootstrap (BS), displacement of factor elements (DISP), and bootstrap enhanced by displacement (BS-DISP)...
Significance of the impact of motion compensation on the variability of PET image features
NASA Astrophysics Data System (ADS)
Carles, M.; Bach, T.; Torres-Espallardo, I.; Baltas, D.; Nestle, U.; Martí-Bonmatí, L.
2018-03-01
In lung cancer, quantification by positron emission tomography/computed tomography (PET/CT) imaging presents challenges due to respiratory movement. Our primary aim was to study the impact of motion compensation implied by retrospectively gated (4D)-PET/CT on the variability of PET quantitative parameters. Its significance was evaluated by comparison with the variability due to (i) the voxel size in image reconstruction and (ii) the voxel size in image post-resampling. The method employed for feature extraction was chosen based on the analysis of (i) the effect of discretization of the standardized uptake value (SUV) on complementarity between texture features (TF) and conventional indices, (ii) the impact of the segmentation method on the variability of image features, and (iii) the variability of image features across the time-frame of 4D-PET. Thirty-one PET-features were involved. Three SUV discretization methods were applied: a constant width (SUV resolution) of the resampling bin (method RW), a constant number of bins (method RN) and RN on the image obtained after histogram equalization (method EqRN). The segmentation approaches evaluated were 40% of SUVmax and the contrast oriented algorithm (COA). Parameters derived from 4D-PET images were compared with values derived from the PET image obtained for (i) the static protocol used in our clinical routine (3D) and (ii) the 3D image post-resampled to the voxel size of the 4D image and PET image derived after modifying the reconstruction of the 3D image to comprise the voxel size of the 4D image. Results showed that TF complementarity with conventional indices was sensitive to the SUV discretization method. In the comparison of COA and 40% contours, despite the values not being interchangeable, all image features showed strong linear correlations (r > 0.91, p\\ll 0.001 ). Across the time-frames of 4D-PET, all image features followed a normal distribution in most patients. For our patient cohort, the compensation of tumor motion did not have a significant impact on the quantitative PET parameters. The variability of PET parameters due to voxel size in image reconstruction was more significant than variability due to voxel size in image post-resampling. In conclusion, most of the parameters (apart from the contrast of neighborhood matrix) were robust to the motion compensation implied by 4D-PET/CT. The impact on parameter variability due to the voxel size in image reconstruction and in image post-resampling could not be assumed to be equivalent.
Cellular neural network-based hybrid approach toward automatic image registration
NASA Astrophysics Data System (ADS)
Arun, Pattathal VijayaKumar; Katiyar, Sunil Kumar
2013-01-01
Image registration is a key component of various image processing operations that involve the analysis of different image data sets. Automatic image registration domains have witnessed the application of many intelligent methodologies over the past decade; however, inability to properly model object shape as well as contextual information has limited the attainable accuracy. A framework for accurate feature shape modeling and adaptive resampling using advanced techniques such as vector machines, cellular neural network (CNN), scale invariant feature transform (SIFT), coreset, and cellular automata is proposed. CNN has been found to be effective in improving feature matching as well as resampling stages of registration and complexity of the approach has been considerably reduced using coreset optimization. The salient features of this work are cellular neural network approach-based SIFT feature point optimization, adaptive resampling, and intelligent object modelling. Developed methodology has been compared with contemporary methods using different statistical measures. Investigations over various satellite images revealed that considerable success was achieved with the approach. This system has dynamically used spectral and spatial information for representing contextual knowledge using CNN-prolog approach. This methodology is also illustrated to be effective in providing intelligent interpretation and adaptive resampling.
Bayesian network ensemble as a multivariate strategy to predict radiation pneumonitis risk
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Sangkyu, E-mail: sangkyu.lee@mail.mcgill.ca; Ybarra, Norma; Jeyaseelan, Krishinima
2015-05-15
Purpose: Prediction of radiation pneumonitis (RP) has been shown to be challenging due to the involvement of a variety of factors including dose–volume metrics and radiosensitivity biomarkers. Some of these factors are highly correlated and might affect prediction results when combined. Bayesian network (BN) provides a probabilistic framework to represent variable dependencies in a directed acyclic graph. The aim of this study is to integrate the BN framework and a systems’ biology approach to detect possible interactions among RP risk factors and exploit these relationships to enhance both the understanding and prediction of RP. Methods: The authors studied 54 nonsmall-cellmore » lung cancer patients who received curative 3D-conformal radiotherapy. Nineteen RP events were observed (common toxicity criteria for adverse events grade 2 or higher). Serum concentration of the following four candidate biomarkers were measured at baseline and midtreatment: alpha-2-macroglobulin, angiotensin converting enzyme (ACE), transforming growth factor, interleukin-6. Dose-volumetric and clinical parameters were also included as covariates. Feature selection was performed using a Markov blanket approach based on the Koller–Sahami filter. The Markov chain Monte Carlo technique estimated the posterior distribution of BN graphs built from the observed data of the selected variables and causality constraints. RP probability was estimated using a limited number of high posterior graphs (ensemble) and was averaged for the final RP estimate using Bayes’ rule. A resampling method based on bootstrapping was applied to model training and validation in order to control under- and overfit pitfalls. Results: RP prediction power of the BN ensemble approach reached its optimum at a size of 200. The optimized performance of the BN model recorded an area under the receiver operating characteristic curve (AUC) of 0.83, which was significantly higher than multivariate logistic regression (0.77), mean heart dose (0.69), and a pre-to-midtreatment change in ACE (0.66). When RP prediction was made only with pretreatment information, the AUC ranged from 0.76 to 0.81 depending on the ensemble size. Bootstrap validation of graph features in the ensemble quantified confidence of association between variables in the graphs where ten interactions were statistically significant. Conclusions: The presented BN methodology provides the flexibility to model hierarchical interactions between RP covariates, which is applied to probabilistic inference on RP. The authors’ preliminary results demonstrate that such framework combined with an ensemble method can possibly improve prediction of RP under real-life clinical circumstances such as missing data or treatment plan adaptation.« less
NASA Astrophysics Data System (ADS)
Meadors, Grant David; Krishnan, Badri; Papa, Maria Alessandra; Whelan, John T.; Zhang, Yuanhao
2018-02-01
Continuous-wave (CW) gravitational waves (GWs) call for computationally-intensive methods. Low signal-to-noise ratio signals need templated searches with long coherent integration times and thus fine parameter-space resolution. Longer integration increases sensitivity. Low-mass x-ray binaries (LMXBs) such as Scorpius X-1 (Sco X-1) may emit accretion-driven CWs at strains reachable by current ground-based observatories. Binary orbital parameters induce phase modulation. This paper describes how resampling corrects binary and detector motion, yielding source-frame time series used for cross-correlation. Compared to the previous, detector-frame, templated cross-correlation method, used for Sco X-1 on data from the first Advanced LIGO observing run (O1), resampling is about 20 × faster in the costliest, most-sensitive frequency bands. Speed-up factors depend on integration time and search setup. The speed could be reinvested into longer integration with a forecast sensitivity gain, 20 to 125 Hz median, of approximately 51%, or from 20 to 250 Hz, 11%, given the same per-band cost and setup. This paper's timing model enables future setup optimization. Resampling scales well with longer integration, and at 10 × unoptimized cost could reach respectively 2.83 × and 2.75 × median sensitivities, limited by spin-wandering. Then an O1 search could yield a marginalized-polarization upper limit reaching torque-balance at 100 Hz. Frequencies from 40 to 140 Hz might be probed in equal observing time with 2 × improved detectors.
Recommended GIS Analysis Methods for Global Gridded Population Data
NASA Astrophysics Data System (ADS)
Frye, C. E.; Sorichetta, A.; Rose, A.
2017-12-01
When using geographic information systems (GIS) to analyze gridded, i.e., raster, population data, analysts need a detailed understanding of several factors that affect raster data processing, and thus, the accuracy of the results. Global raster data is most often provided in an unprojected state, usually in the WGS 1984 geographic coordinate system. Most GIS functions and tools evaluate data based on overlay relationships (area) or proximity (distance). Area and distance for global raster data can be either calculated directly using the various earth ellipsoids or after transforming the data to equal-area/equidistant projected coordinate systems to analyze all locations equally. However, unlike when projecting vector data, not all projected coordinate systems can support such analyses equally, and the process of transforming raster data from one coordinate space to another often results unmanaged loss of data through a process called resampling. Resampling determines which values to use in the result dataset given an imperfect locational match in the input dataset(s). Cell size or resolution, registration, resampling method, statistical type, and whether the raster represents continuous or discreet information potentially influence the quality of the result. Gridded population data represent estimates of population in each raster cell, and this presentation will provide guidelines for accurately transforming population rasters for analysis in GIS. Resampling impacts the display of high resolution global gridded population data, and we will discuss how to properly handle pyramid creation using the Aggregate tool with the sum option to create overviews for mosaic datasets.
van Walraven, Carl
2017-04-01
Diagnostic codes used in administrative databases cause bias due to misclassification of patient disease status. It is unclear which methods minimize this bias. Serum creatinine measures were used to determine severe renal failure status in 50,074 hospitalized patients. The true prevalence of severe renal failure and its association with covariates were measured. These were compared to results for which renal failure status was determined using surrogate measures including the following: (1) diagnostic codes; (2) categorization of probability estimates of renal failure determined from a previously validated model; or (3) bootstrap methods imputation of disease status using model-derived probability estimates. Bias in estimates of severe renal failure prevalence and its association with covariates were minimal when bootstrap methods were used to impute renal failure status from model-based probability estimates. In contrast, biases were extensive when renal failure status was determined using codes or methods in which model-based condition probability was categorized. Bias due to misclassification from inaccurate diagnostic codes can be minimized using bootstrap methods to impute condition status using multivariable model-derived probability estimates. Copyright © 2017 Elsevier Inc. All rights reserved.
Nixon, Richard M; Wonderling, David; Grieve, Richard D
2010-03-01
Cost-effectiveness analyses (CEA) alongside randomised controlled trials commonly estimate incremental net benefits (INB), with 95% confidence intervals, and compute cost-effectiveness acceptability curves and confidence ellipses. Two alternative non-parametric methods for estimating INB are to apply the central limit theorem (CLT) or to use the non-parametric bootstrap method, although it is unclear which method is preferable. This paper describes the statistical rationale underlying each of these methods and illustrates their application with a trial-based CEA. It compares the sampling uncertainty from using either technique in a Monte Carlo simulation. The experiments are repeated varying the sample size and the skewness of costs in the population. The results showed that, even when data were highly skewed, both methods accurately estimated the true standard errors (SEs) when sample sizes were moderate to large (n>50), and also gave good estimates for small data sets with low skewness. However, when sample sizes were relatively small and the data highly skewed, using the CLT rather than the bootstrap led to slightly more accurate SEs. We conclude that while in general using either method is appropriate, the CLT is easier to implement, and provides SEs that are at least as accurate as the bootstrap. (c) 2009 John Wiley & Sons, Ltd.
Exploring the Replicability of a Study's Results: Bootstrap Statistics for the Multivariate Case.
ERIC Educational Resources Information Center
Thompson, Bruce
1995-01-01
Use of the bootstrap method in a canonical correlation analysis to evaluate the replicability of a study's results is illustrated. More confidence may be vested in research results that replicate. (SLD)
NASA Technical Reports Server (NTRS)
Benner, R.; Young, W.
1977-01-01
The results of an experimental study conducted to determine the geometric and radiometric effects of double resampling (bi-resampling) performed on image data in the process of performing map projection transformations are reported.
Multi-site Stochastic Simulation of Daily Streamflow with Markov Chain and KNN Algorithm
NASA Astrophysics Data System (ADS)
Mathai, J.; Mujumdar, P.
2017-12-01
A key focus of this study is to develop a method which is physically consistent with the hydrologic processes that can capture short-term characteristics of daily hydrograph as well as the correlation of streamflow in temporal and spatial domains. In complex water resource systems, flow fluctuations at small time intervals require that discretisation be done at small time scales such as daily scales. Also, simultaneous generation of synthetic flows at different sites in the same basin are required. We propose a method to equip water managers with a streamflow generator within a stochastic streamflow simulation framework. The motivation for the proposed method is to generate sequences that extend beyond the variability represented in the historical record of streamflow time series. The method has two steps: In step 1, daily flow is generated independently at each station by a two-state Markov chain, with rising limb increments randomly sampled from a Gamma distribution and the falling limb modelled as exponential recession and in step 2, the streamflow generated in step 1 is input to a nonparametric K-nearest neighbor (KNN) time series bootstrap resampler. The KNN model, being data driven, does not require assumptions on the dependence structure of the time series. A major limitation of KNN based streamflow generators is that they do not produce new values, but merely reshuffle the historical data to generate realistic streamflow sequences. However, daily flow generated using the Markov chain approach is capable of generating a rich variety of streamflow sequences. Furthermore, the rising and falling limbs of daily hydrograph represent different physical processes, and hence they need to be modelled individually. Thus, our method combines the strengths of the two approaches. We show the utility of the method and improvement over the traditional KNN by simulating daily streamflow sequences at 7 locations in the Godavari River basin in India.
Method for Pre-Conditioning a Measured Surface Height Map for Model Validation
NASA Technical Reports Server (NTRS)
Sidick, Erkin
2012-01-01
This software allows one to up-sample or down-sample a measured surface map for model validation, not only without introducing any re-sampling errors, but also eliminating the existing measurement noise and measurement errors. Because the re-sampling of a surface map is accomplished based on the analytical expressions of Zernike-polynomials and a power spectral density model, such re-sampling does not introduce any aliasing and interpolation errors as is done by the conventional interpolation and FFT-based (fast-Fourier-transform-based) spatial-filtering method. Also, this new method automatically eliminates the measurement noise and other measurement errors such as artificial discontinuity. The developmental cycle of an optical system, such as a space telescope, includes, but is not limited to, the following two steps: (1) deriving requirements or specs on the optical quality of individual optics before they are fabricated through optical modeling and simulations, and (2) validating the optical model using the measured surface height maps after all optics are fabricated. There are a number of computational issues related to model validation, one of which is the "pre-conditioning" or pre-processing of the measured surface maps before using them in a model validation software tool. This software addresses the following issues: (1) up- or down-sampling a measured surface map to match it with the gridded data format of a model validation tool, and (2) eliminating the surface measurement noise or measurement errors such that the resulted surface height map is continuous or smoothly-varying. So far, the preferred method used for re-sampling a surface map is two-dimensional interpolation. The main problem of this method is that the same pixel can take different values when the method of interpolation is changed among the different methods such as the "nearest," "linear," "cubic," and "spline" fitting in Matlab. The conventional, FFT-based spatial filtering method used to eliminate the surface measurement noise or measurement errors can also suffer from aliasing effects. During re-sampling of a surface map, this software preserves the low spatial-frequency characteristic of a given surface map through the use of Zernike-polynomial fit coefficients, and maintains mid- and high-spatial-frequency characteristics of the given surface map by the use of a PSD model derived from the two-dimensional PSD data of the mid- and high-spatial-frequency components of the original surface map. Because this new method creates the new surface map in the desired sampling format from analytical expressions only, it does not encounter any aliasing effects and does not cause any discontinuity in the resultant surface map.
Yen, Glorian P.; Davey, Adam
2015-01-01
Objective: Biorepositories have been key resources in examining genetically-linked diseases, particularly cancer. Asian Americans contribute to biorepositories at lower rates than other racial groups, but the reasons for this are unclear. We hypothesized that attitudes toward biospecimen research mediate the relationship between demographic and healthcare access factors, and willingness to donate blood for research purposes among individuals of Korean heritage. Methods: Descriptive statistics and bivariate analyses were utilized to characterize the sample with respect to demographic, psychosocial, and behavioral variables. Structural equation modeling with 5000 re-sample bootstrapping was used to assess each component of the proposed simple mediation models. Results: Attitudes towards biospecimen research fully mediate associations between age, income, number of years lived in the United States, and having a regular physician and willingness to donate blood for the purpose of research. Conclusion: Participants were willing to donate blood for the purpose of research despite having neutral feelings towards biospecimen research as a whole. Participants reported higher willingness to donate blood for research purposes when they were older, had lived in the United States longer, had higher income, and had a regular doctor that they visited. Many of the significant relationships between demographic and health care access factors, attitudes towards biospecimen research, and willingness to donate blood for the purpose of research may be explained by the extent of acculturation of the participants in the United States. PMID:25853387
Shape analysis of H II regions - I. Statistical clustering
NASA Astrophysics Data System (ADS)
Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred
2018-07-01
We present here our shape analysis method for a sample of 76 Galactic H II regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation are linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorize H II regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordinance technique multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionized by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilizing synthetic observations from numerical simulations of H II regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
Chaffee, Benjamin W.; Feldens, Carlos Alberto; Vítolo, Márcia Regina
2014-01-01
Purpose Estimate the association between breastfeeding ≥24 months and severe early childhood caries (ECC). Methods Within a birth cohort (n=715) from low-income families in Porto Alegre, Brazil, the age 38-month prevalence of severe-ECC (≥4 affected tooth surfaces or ≥1 affected maxillary anterior teeth) was compared over breastfeeding duration categories using marginal structural models to account for time-dependent confounding by other feeding habits and child growth. Additional analyses assessed whether daily breastfeeding frequency modified the association of breastfeeding duration and severe-ECC. Multiple imputation and censoring weights were used to address incomplete covariate information and missing outcomes, respectively. Confidence intervals (CI) were estimated using bootstrap re-sampling. Results Breastfeeding ≥24 months was associated with the highest adjusted population-average severe-ECC prevalence (0.45, 95% CI: 0.36, 0.54) compared with breastfeeding <6 months (0.22, 95% CI: 0.15, 0.28), 6–11 months (0.38, 95% CI: 0.25, 0.53), or 12–23 months (0.39, 95% CI: 0.20, 0.56). High frequency breastfeeding enhanced the association between long-duration breastfeeding and caries (excess prevalence due to interaction: 0.13, 80% CI: −0.03, 0.30). Conclusions In this population, breastfeeding ≥24 months, particularly if frequent, was associated with severe-ECC. Dental health should be one consideration, among many, in evaluating health outcomes associated with breastfeeding ≥24 months. PMID:24636616
Shape Analysis of HII Regions - I. Statistical Clustering
NASA Astrophysics Data System (ADS)
Campbell-White, Justyn; Froebrich, Dirk; Kume, Alfred
2018-04-01
We present here our shape analysis method for a sample of 76 Galactic HII regions from MAGPIS 1.4 GHz data. The main goal is to determine whether physical properties and initial conditions of massive star cluster formation is linked to the shape of the regions. We outline a systematic procedure for extracting region shapes and perform hierarchical clustering on the shape data. We identified six groups that categorise HII regions by common morphologies. We confirmed the validity of these groupings by bootstrap re-sampling and the ordinance technique multidimensional scaling. We then investigated associations between physical parameters and the assigned groups. Location is mostly independent of group, with a small preference for regions of similar longitudes to share common morphologies. The shapes are homogeneously distributed across Galactocentric distance and latitude. One group contains regions that are all younger than 0.5 Myr and ionised by low- to intermediate-mass sources. Those in another group are all driven by intermediate- to high-mass sources. One group was distinctly separated from the other five and contained regions at the surface brightness detection limit for the survey. We find that our hierarchical procedure is most sensitive to the spatial sampling resolution used, which is determined for each region from its distance. We discuss how these errors can be further quantified and reduced in future work by utilising synthetic observations from numerical simulations of HII regions. We also outline how this shape analysis has further applications to other diffuse astronomical objects.
Estimation and correction of visibility bias in aerial surveys of wintering ducks
Pearse, A.T.; Gerard, P.D.; Dinsmore, S.J.; Kaminski, R.M.; Reinecke, K.J.
2008-01-01
Incomplete detection of all individuals leading to negative bias in abundance estimates is a pervasive source of error in aerial surveys of wildlife, and correcting that bias is a critical step in improving surveys. We conducted experiments using duck decoys as surrogates for live ducks to estimate bias associated with surveys of wintering ducks in Mississippi, USA. We found detection of decoy groups was related to wetland cover type (open vs. forested), group size (1?100 decoys), and interaction of these variables. Observers who detected decoy groups reported counts that averaged 78% of the decoys actually present, and this counting bias was not influenced by either covariate cited above. We integrated this sightability model into estimation procedures for our sample surveys with weight adjustments derived from probabilities of group detection (estimated by logistic regression) and count bias. To estimate variances of abundance estimates, we used bootstrap resampling of transects included in aerial surveys and data from the bias-correction experiment. When we implemented bias correction procedures on data from a field survey conducted in January 2004, we found bias-corrected estimates of abundance increased 36?42%, and associated standard errors increased 38?55%, depending on species or group estimated. We deemed our method successful for integrating correction of visibility bias in an existing sample survey design for wintering ducks in Mississippi, and we believe this procedure could be implemented in a variety of sampling problems for other locations and species.
NASA Astrophysics Data System (ADS)
Kang, Minho; Kim, Suam; Low, Loh-Lee
2016-03-01
Genetic stock identification studies have been widely applied to Pacific salmon species to estimate stock composition of complex mixed-stock fisheries. In a September-October 2004 survey, 739 chum salmon ( Oncorhynchus keta) specimens were collected from 23 stations in the western Bering Sea. We determined the genetic stock composition of immature chum salmon based on the previous mitochondria DNA baseline. Each regional estimate was computed based on the conditional maximum likelihood method using 1,000 bootstrap resampling and then pooled to the major regional groups: Korea - Japan - Primorie (KJP) / Russia (RU) / Northwest Alaska (NWA) / Alaska Peninsula - Southcentral Alaska - Southeast Alaska - British Columbia - Washington (ONA). The stock composition of immature chum salmon in the western Bering Sea was a mix of 0.424 KJP, 0.421 RU, 0.116 NWA, and 0.039 ONA stocks. During the study period, the contribution of Asian chum salmon stocks gradually changed from RU to KJP stock. In addition, North American populations from NWA and ONA were small but present near the vicinity of the Russian coast and the Commander Islands, suggesting that the study areas in the western Bering Sea were an important migration route for Pacific chum salmon originating both from Asia and North America during the months of September and October. These results make it possible to better understand the chum salmon stock composition of the mixed-stock fisheries in the western Bering Sea and the stock-specific distribution pattern of chum salmon on the high-seas.
Maeda, Jin; Suzuki, Tatsuya; Takayama, Kozo
2012-01-01
Design spaces for multiple dose strengths of tablets were constructed using a Bayesian estimation method with one set of design of experiments (DoE) of only the highest dose-strength tablet. The lubricant blending process for theophylline tablets with dose strengths of 100, 50, and 25 mg is used as a model manufacturing process in order to construct design spaces. The DoE was conducted using various Froude numbers (X(1)) and blending times (X(2)) for theophylline 100-mg tablet. The response surfaces, design space, and their reliability of the compression rate of the powder mixture (Y(1)), tablet hardness (Y(2)), and dissolution rate (Y(3)) of the 100-mg tablet were calculated using multivariate spline interpolation, a bootstrap resampling technique, and self-organizing map clustering. Three experiments under an optimal condition and two experiments under other conditions were performed using 50- and 25-mg tablets, respectively. The response surfaces of the highest-strength tablet were corrected to those of the lower-strength tablets by Bayesian estimation using the manufacturing data of the lower-strength tablets. Experiments under three additional sets of conditions of lower-strength tablets showed that the corrected design space made it possible to predict the quality of lower-strength tablets more precisely than the design space of the highest-strength tablet. This approach is useful for constructing design spaces of tablets with multiple strengths.
Tumor gene expression and prognosis in breast cancer patients with 10 or more positive lymph nodes.
Cobleigh, Melody A; Tabesh, Bita; Bitterman, Pincas; Baker, Joffre; Cronin, Maureen; Liu, Mei-Lan; Borchik, Russell; Mosquera, Juan-Miguel; Walker, Michael G; Shak, Steven
2005-12-15
This study, along with two others, was done to develop the 21-gene Recurrence Score assay (Oncotype DX) that was validated in a subsequent independent study and is used to aid decision making about chemotherapy in estrogen receptor (ER)-positive, node-negative breast cancer patients. Patients with >or=10 nodes diagnosed from 1979 to 1999 were identified. RNA was extracted from paraffin blocks, and expression of 203 candidate genes was quantified using reverse transcription-PCR (RT-PCR). Seventy-eight patients were studied. As of August 2002, 77% of patients had distant recurrence or breast cancer death. Univariate Cox analysis of clinical and immunohistochemistry variables indicated that HER2/immunohistochemistry, number of involved nodes, progesterone receptor (PR)/immunohistochemistry (% cells), and ER/immunohistochemistry (% cells) were significantly associated with distant recurrence-free survival (DRFS). Univariate Cox analysis identified 22 genes associated with DRFS. Higher expression correlated with shorter DRFS for the HER2 adaptor GRB7 and the macrophage marker CD68. Higher expression correlated with longer DRFS for tumor protein p53-binding protein 2 (TP53BP2) and the ER axis genes PR and Bcl2. Multivariate methods, including stepwise variable selection and bootstrap resampling of the Cox proportional hazards regression model, identified several genes, including TP53BP2 and Bcl2, as significant predictors of DRFS. Tumor gene expression profiles of archival tissues, some more than 20 years old, provide significant information about risk of distant recurrence even among patients with 10 or more nodes.
Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi
2015-07-01
Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Uncertainty Estimation using Bootstrapped Kriging Predictions for Precipitation Isoscapes
NASA Astrophysics Data System (ADS)
Ma, C.; Bowen, G. J.; Vander Zanden, H.; Wunder, M.
2017-12-01
Isoscapes are spatial models representing the distribution of stable isotope values across landscapes. Isoscapes of hydrogen and oxygen in precipitation are now widely used in a diversity of fields, including geology, biology, hydrology, and atmospheric science. To generate isoscapes, geostatistical methods are typically applied to extend predictions from limited data measurements. Kriging is a popular method in isoscape modeling, but quantifying the uncertainty associated with the resulting isoscapes is challenging. Applications that use precipitation isoscapes to determine sample origin require estimation of uncertainty. Here we present a simple bootstrap method (SBM) to estimate the mean and uncertainty of the krigged isoscape and compare these results with a generalized bootstrap method (GBM) applied in previous studies. We used hydrogen isotopic data from IsoMAP to explore these two approaches for estimating uncertainty. We conducted 10 simulations for each bootstrap method and found that SBM results in more kriging predictions (9/10) compared to GBM (4/10). Prediction from SBM was closer to the original prediction generated without bootstrapping and had less variance than GBM. SBM was tested on different datasets from IsoMAP with different numbers of observation sites. We determined that predictions from the datasets with fewer than 40 observation sites using SBM were more variable than the original prediction. The approaches we used for estimating uncertainty will be compiled in an R package that is under development. We expect that these robust estimates of precipitation isoscape uncertainty can be applied in diagnosing the origin of samples ranging from various type of waters to migratory animals, food products, and humans.
More About the Phase-Synchronized Enhancement Method
NASA Technical Reports Server (NTRS)
Jong, Jen-Yi
2004-01-01
A report presents further details regarding the subject matter of "Phase-Synchronized Enhancement Method for Engine Diagnostics" (MFS-26435), NASA Tech Briefs, Vol. 22, No. 1 (January 1998), page 54. To recapitulate: The phase-synchronized enhancement method (PSEM) involves the digital resampling of a quasi-periodic signal in synchronism with the instantaneous phase of one of its spectral components. This resampling transforms the quasi-periodic signal into a periodic one more amenable to analysis. It is particularly useful for diagnosis of a rotating machine through analysis of vibration spectra that include components at the fundamental and harmonics of a slightly fluctuating rotation frequency. The report discusses the machinery-signal-analysis problem, outlines the PSEM algorithms, presents the mathematical basis of the PSEM, and presents examples of application of the PSEM in some computational simulations.
BootGraph: probabilistic fiber tractography using bootstrap algorithms and graph theory.
Vorburger, Robert S; Reischauer, Carolin; Boesiger, Peter
2013-02-01
Bootstrap methods have recently been introduced to diffusion-weighted magnetic resonance imaging to estimate the measurement uncertainty of ensuing diffusion parameters directly from the acquired data without the necessity to assume a noise model. These methods have been previously combined with deterministic streamline tractography algorithms to allow for the assessment of connection probabilities in the human brain. Thereby, the local noise induced disturbance in the diffusion data is accumulated additively due to the incremental progression of streamline tractography algorithms. Graph based approaches have been proposed to overcome this drawback of streamline techniques. For this reason, the bootstrap method is in the present work incorporated into a graph setup to derive a new probabilistic fiber tractography method, called BootGraph. The acquired data set is thereby converted into a weighted, undirected graph by defining a vertex in each voxel and edges between adjacent vertices. By means of the cone of uncertainty, which is derived using the wild bootstrap, a weight is thereafter assigned to each edge. Two path finding algorithms are subsequently applied to derive connection probabilities. While the first algorithm is based on the shortest path approach, the second algorithm takes all existing paths between two vertices into consideration. Tracking results are compared to an established algorithm based on the bootstrap method in combination with streamline fiber tractography and to another graph based algorithm. The BootGraph shows a very good performance in crossing situations with respect to false negatives and permits incorporating additional constraints, such as a curvature threshold. By inheriting the advantages of the bootstrap method and graph theory, the BootGraph method provides a computationally efficient and flexible probabilistic tractography setup to compute connection probability maps and virtual fiber pathways without the drawbacks of streamline tractography algorithms or the assumption of a noise distribution. Moreover, the BootGraph can be applied to common DTI data sets without further modifications and shows a high repeatability. Thus, it is very well suited for longitudinal studies and meta-studies based on DTI. Copyright © 2012 Elsevier Inc. All rights reserved.
A Comparison of Single Sample and Bootstrap Methods to Assess Mediation in Cluster Randomized Trials
ERIC Educational Resources Information Center
Pituch, Keenan A.; Stapleton, Laura M.; Kang, Joo Youn
2006-01-01
A Monte Carlo study examined the statistical performance of single sample and bootstrap methods that can be used to test and form confidence interval estimates of indirect effects in two cluster randomized experimental designs. The designs were similar in that they featured random assignment of clusters to one of two treatment conditions and…
Multilingual Phoneme Models for Rapid Speech Processing System Development
2006-09-01
processes are used to develop an Arabic speech recognition system starting from monolingual English models, In- ternational Phonetic Association (IPA...clusters. It was found that multilingual bootstrapping methods out- perform monolingual English bootstrapping methods on the Arabic evaluation data initially...International Phonetic Alphabet . . . . . . . . . 7 2.3.2 Multilingual vs. Monolingual Speech Recognition 7 2.3.3 Data-Driven Approaches
NASA Astrophysics Data System (ADS)
Komachi, Mamoru; Kudo, Taku; Shimbo, Masashi; Matsumoto, Yuji
Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate the semantic drift of Espresso-style bootstrapping has the same root as the topic drift of Kleinberg's HITS, using a simplified graph-based reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernels and the regularized Laplacian, can reduce the effect of semantic drift in the task of word sense disambiguation (WSD) on Senseval-3 English Lexical Sample Task. Proposed algorithms achieve superior performance to Espresso and previous graph-based WSD methods, even though the proposed algorithms have less parameters and are easy to calibrate.
NASA Astrophysics Data System (ADS)
Rossi, M.; Luciani, S.; Valigi, D.; Kirschbaum, D.; Brunetti, M. T.; Peruccacci, S.; Guzzetti, F.
2017-05-01
Models for forecasting rainfall-induced landslides are mostly based on the identification of empirical rainfall thresholds obtained exploiting rain gauge data. Despite their increased availability, satellite rainfall estimates are scarcely used for this purpose. Satellite data should be useful in ungauged and remote areas, or should provide a significant spatial and temporal reference in gauged areas. In this paper, the analysis of the reliability of rainfall thresholds based on rainfall remote sensed and rain gauge data for the prediction of landslide occurrence is carried out. To date, the estimation of the uncertainty associated with the empirical rainfall thresholds is mostly based on a bootstrap resampling of the rainfall duration and the cumulated event rainfall pairs (D,E) characterizing rainfall events responsible for past failures. This estimation does not consider the measurement uncertainty associated with D and E. In the paper, we propose (i) a new automated procedure to reconstruct ED conditions responsible for the landslide triggering and their uncertainties, and (ii) three new methods to identify rainfall threshold for the possible landslide occurrence, exploiting rain gauge and satellite data. In particular, the proposed methods are based on Least Square (LS), Quantile Regression (QR) and Nonlinear Least Square (NLS) statistical approaches. We applied the new procedure and methods to define empirical rainfall thresholds and their associated uncertainties in the Umbria region (central Italy) using both rain-gauge measurements and satellite estimates. We finally validated the thresholds and tested the effectiveness of the different threshold definition methods with independent landslide information. The NLS method among the others performed better in calculating thresholds in the full range of rainfall durations. We found that the thresholds obtained from satellite data are lower than those obtained from rain gauge measurements. This is in agreement with the literature, where satellite rainfall data underestimate the "ground" rainfall registered by rain gauges.
NASA Technical Reports Server (NTRS)
Rossi, M.; Luciani, S.; Valigi, D.; Kirschbaum, D.; Brunetti, M. T.; Peruccacci, S.; Guzzetti, F.
2017-01-01
Models for forecasting rainfall-induced landslides are mostly based on the identification of empirical rainfall thresholds obtained exploiting rain gauge data. Despite their increased availability, satellite rainfall estimates are scarcely used for this purpose. Satellite data should be useful in ungauged and remote areas, or should provide a significant spatial and temporal reference in gauged areas. In this paper, the analysis of the reliability of rainfall thresholds based on rainfall remote sensed and rain gauge data for the prediction of landslide occurrence is carried out. To date, the estimation of the uncertainty associated with the empirical rainfall thresholds is mostly based on a bootstrap resampling of the rainfall duration and the cumulated event rainfall pairs (D,E) characterizing rainfall events responsible for past failures. This estimation does not consider the measurement uncertainty associated with D and E. In the paper, we propose (i) a new automated procedure to reconstruct ED conditions responsible for the landslide triggering and their uncertainties, and (ii) three new methods to identify rainfall threshold for the possible landslide occurrence, exploiting rain gauge and satellite data. In particular, the proposed methods are based on Least Square (LS), Quantile Regression (QR) and Nonlinear Least Square (NLS) statistical approaches. We applied the new procedure and methods to define empirical rainfall thresholds and their associated uncertainties in the Umbria region (central Italy) using both rain-gauge measurements and satellite estimates. We finally validated the thresholds and tested the effectiveness of the different threshold definition methods with independent landslide information. The NLS method among the others performed better in calculating thresholds in the full range of rainfall durations. We found that the thresholds obtained from satellite data are lower than those obtained from rain gauge measurements. This is in agreement with the literature, where satellite rainfall data underestimate the 'ground' rainfall registered by rain gauges.
NASA Astrophysics Data System (ADS)
Adjorlolo, Clement; Cho, Moses A.; Mutanga, Onisimo; Ismail, Riyad
2012-01-01
Hyperspectral remote-sensing approaches are suitable for detection of the differences in 3-carbon (C3) and four carbon (C4) grass species phenology and composition. However, the application of hyperspectral sensors to vegetation has been hampered by high-dimensionality, spectral redundancy, and multicollinearity problems. In this experiment, resampling of hyperspectral data to wider wavelength intervals, around a few band-centers, sensitive to the biophysical and biochemical properties of C3 or C4 grass species is proposed. The approach accounts for an inherent property of vegetation spectral response: the asymmetrical nature of the inter-band correlations between a waveband and its shorter- and longer-wavelength neighbors. It involves constructing a curve of weighting threshold of correlation (Pearson's r) between a chosen band-center and its neighbors, as a function of wavelength. In addition, data were resampled to some multispectral sensors-ASTER, GeoEye-1, IKONOS, QuickBird, RapidEye, SPOT 5, and WorldView-2 satellites-for comparative purposes, with the proposed method. The resulting datasets were analyzed, using the random forest algorithm. The proposed resampling method achieved improved classification accuracy (κ=0.82), compared to the resampled multispectral datasets (κ=0.78, 0.65, 0.62, 0.59, 0.65, 0.62, 0.76, respectively). Overall, results from this study demonstrated that spectral resolutions for C3 and C4 grasses can be optimized and controlled for high dimensionality and multicollinearity problems, yet yielding high classification accuracies. The findings also provide a sound basis for programming wavebands for future sensors.
Adaptive topographic mass correction for satellite gravity and gravity gradient data
NASA Astrophysics Data System (ADS)
Holzrichter, Nils; Szwillus, Wolfgang; Götze, Hans-Jürgen
2014-05-01
Subsurface modelling with gravity data includes a reliable topographic mass correction. Since decades, this mandatory step is a standard procedure. However, originally methods were developed for local terrestrial surveys. Therefore, these methods often include defaults like a limited correction area of 167 km around an observation point, resampling topography depending on the distance to the station or disregard the curvature of the earth. New satellite gravity data (e.g. GOCE) can be used for large scale lithospheric modelling with gravity data. The investigation areas can include thousands of kilometres. In addition, measurements are located in the flight height of the satellite (e.g. ~250 km for GOCE). The standard definition of the correction area and the specific grid spacing around an observation point was not developed for stations located in these heights and areas of these dimensions. This asks for a revaluation of the defaults used for topographic correction. We developed an algorithm which resamples the topography based on an adaptive approach. Instead of resampling topography depending on the distance to the station, the grids will be resampled depending on its influence at the station. Therefore, the only value the user has to define is the desired accuracy of the topographic correction. It is not necessary to define the grid spacing and a limited correction area. Furthermore, the algorithm calculates the topographic mass response with a spherical shaped polyhedral body. We show examples for local and global gravity datasets and compare the results of the topographic mass correction to existing approaches. We provide suggestions how satellite gravity and gradient data should be corrected.
A novel fruit shape classification method based on multi-scale analysis
NASA Astrophysics Data System (ADS)
Gui, Jiangsheng; Ying, Yibin; Rao, Xiuqin
2005-11-01
Shape is one of the major concerns and which is still a difficult problem in automated inspection and sorting of fruits. In this research, we proposed the multi-scale energy distribution (MSED) for object shape description, the relationship between objects shape and its boundary energy distribution at multi-scale was explored for shape extraction. MSED offers not only the mainly energy which represent primary shape information at the lower scales, but also subordinate energy which represent local shape information at higher differential scales. Thus, it provides a natural tool for multi resolution representation and can be used as a feature for shape classification. We addressed the three main processing steps in the MSED-based shape classification. They are namely, 1) image preprocessing and citrus shape extraction, 2) shape resample and shape feature normalization, 3) energy decomposition by wavelet and classification by BP neural network. Hereinto, shape resample is resample 256 boundary pixel from a curve which is approximated original boundary by using cubic spline in order to get uniform raw data. A probability function was defined and an effective method to select a start point was given through maximal expectation, which overcame the inconvenience of traditional methods in order to have a property of rotation invariants. The experiment result is relatively well normal citrus and serious abnormality, with a classification rate superior to 91.2%. The global correct classification rate is 89.77%, and our method is more effective than traditional method. The global result can meet the request of fruit grading.
2013-01-01
Background Colorectal cancer is the third leading cause of cancer deaths in the United States. The initial assessment of colorectal cancer involves clinical staging that takes into account the extent of primary tumor invasion, determining the number of lymph nodes with metastatic cancer and the identification of metastatic sites in other organs. Advanced clinical stage indicates metastatic cancer, either in regional lymph nodes or in distant organs. While the genomic and genetic basis of colorectal cancer has been elucidated to some degree, less is known about the identity of specific cancer genes that are associated with advanced clinical stage and metastasis. Methods We compiled multiple genomic data types (mutations, copy number alterations, gene expression and methylation status) as well as clinical meta-data from The Cancer Genome Atlas (TCGA). We used an elastic-net regularized regression method on the combined genomic data to identify genetic aberrations and their associated cancer genes that are indicators of clinical stage. We ranked candidate genes by their regression coefficient and level of support from multiple assay modalities. Results A fit of the elastic-net regularized regression to 197 samples and integrated analysis of four genomic platforms identified the set of top gene predictors of advanced clinical stage, including: WRN, SYK, DDX5 and ADRA2C. These genetic features were identified robustly in bootstrap resampling analysis. Conclusions We conducted an analysis integrating multiple genomic features including mutations, copy number alterations, gene expression and methylation. This integrated approach in which one considers all of these genomic features performs better than any individual genomic assay. We identified multiple genes that robustly delineate advanced clinical stage, suggesting their possible role in colorectal cancer metastatic progression. PMID:24308539
Multi-temporal clustering of continental floods and associated atmospheric circulations
NASA Astrophysics Data System (ADS)
Liu, Jianyu; Zhang, Yongqiang
2017-12-01
Investigating clustering of floods has important social, economic and ecological implications. This study examines the clustering of Australian floods at different temporal scales and its possible physical mechanisms. Flood series with different severities are obtained by peaks-over-threshold (POT) sampling in four flood thresholds. At intra-annual scale, Cox regression and monthly frequency methods are used to examine whether and when the flood clustering exists, respectively. At inter-annual scale, dispersion indices with four-time variation windows are applied to investigate the inter-annual flood clustering and its variation. Furthermore, the Kernel occurrence rate estimate and bootstrap resampling methods are used to identify flood-rich/flood-poor periods. Finally, seasonal variation of horizontal wind at 850 hPa and vertical wind velocity at 500 hPa are used to investigate the possible mechanisms causing the temporal flood clustering. Our results show that: (1) flood occurrences exhibit clustering at intra-annual scale, which are regulated by climate indices representing the impacts of the Pacific and Indian Oceans; (2) the flood-rich months occur from January to March over northern Australia, and from July to September over southwestern and southeastern Australia; (3) stronger inter-annual clustering takes place across southern Australia than northern Australia; and (4) Australian floods are characterised by regional flood-rich and flood-poor periods, with 1987-1992 identified as the flood-rich period across southern Australia, but the flood-poor period across northern Australia, and 2001-2006 being the flood-poor period across most regions of Australia. The intra-annual and inter-annual clustering and temporal variation of flood occurrences are in accordance with the variation of atmospheric circulation. These results provide relevant information for flood management under the influence of climate variability, and, therefore, are helpful for developing flood hazard mitigation schemes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cella, Laura, E-mail: laura.cella@cnr.it; Department of Advanced Biomedical Sciences, Federico II University School of Medicine, Naples; Liuzzi, Raffaele
Purpose: To establish a multivariate normal tissue complication probability (NTCP) model for radiation-induced asymptomatic heart valvular defects (RVD). Methods and Materials: Fifty-six patients treated with sequential chemoradiation therapy for Hodgkin lymphoma (HL) were retrospectively reviewed for RVD events. Clinical information along with whole heart, cardiac chambers, and lung dose distribution parameters was collected, and the correlations to RVD were analyzed by means of Spearman's rank correlation coefficient (Rs). For the selection of the model order and parameters for NTCP modeling, a multivariate logistic regression method using resampling techniques (bootstrapping) was applied. Model performance was evaluated using the area under themore » receiver operating characteristic curve (AUC). Results: When we analyzed the whole heart, a 3-variable NTCP model including the maximum dose, whole heart volume, and lung volume was shown to be the optimal predictive model for RVD (Rs = 0.573, P<.001, AUC = 0.83). When we analyzed the cardiac chambers individually, for the left atrium and for the left ventricle, an NTCP model based on 3 variables including the percentage volume exceeding 30 Gy (V30), cardiac chamber volume, and lung volume was selected as the most predictive model (Rs = 0.539, P<.001, AUC = 0.83; and Rs = 0.557, P<.001, AUC = 0.82, respectively). The NTCP values increase as heart maximum dose or cardiac chambers V30 increase. They also increase with larger volumes of the heart or cardiac chambers and decrease when lung volume is larger. Conclusions: We propose logistic NTCP models for RVD considering not only heart irradiation dose but also the combined effects of lung and heart volumes. Our study establishes the statistical evidence of the indirect effect of lung size on radio-induced heart toxicity.« less
The effects of velocities and lensing on moments of the Hubble diagram
NASA Astrophysics Data System (ADS)
Macaulay, E.; Davis, T. M.; Scovacricchi, D.; Bacon, D.; Collett, T.; Nichol, R. C.
2017-05-01
We consider the dispersion on the supernova distance-redshift relation due to peculiar velocities and gravitational lensing, and the sensitivity of these effects to the amplitude of the matter power spectrum. We use the Method-of-the-Moments (MeMo) lensing likelihood developed by Quartin et al., which accounts for the characteristic non-Gaussian distribution caused by lensing magnification with measurements of the first four central moments of the distribution of magnitudes. We build on the MeMo likelihood by including the effects of peculiar velocities directly into the model for the moments. In order to measure the moments from sparse numbers of supernovae, we take a new approach using Kernel density estimation to estimate the underlying probability density function of the magnitude residuals. We also describe a bootstrap re-sampling approach to estimate the data covariance matrix. We then apply the method to the joint light-curve analysis (JLA) supernova catalogue. When we impose only that the intrinsic dispersion in magnitudes is independent of redshift, we find σ _8=0.44^{+0.63}_{-0.44} at the one standard deviation level, although we note that in tests on simulations, this model tends to overestimate the magnitude of the intrinsic dispersion, and underestimate σ8. We note that the degeneracy between intrinsic dispersion and the effects of σ8 is more pronounced when lensing and velocity effects are considered simultaneously, due to a cancellation of redshift dependence when both effects are included. Keeping the model of the intrinsic dispersion fixed as a Gaussian distribution of width 0.14 mag, we find σ _8 = 1.07^{+0.50}_{-0.76}.
Assessing operating characteristics of CAD algorithms in the absence of a gold standard
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy Choudhury, Kingshuk; Paik, David S.; Yi, Chin A.
2010-04-15
Purpose: The authors examine potential bias when using a reference reader panel as ''gold standard'' for estimating operating characteristics of CAD algorithms for detecting lesions. As an alternative, the authors propose latent class analysis (LCA), which does not require an external gold standard to evaluate diagnostic accuracy. Methods: A binomial model for multiple reader detections using different diagnostic protocols was constructed, assuming conditional independence of readings given true lesion status. Operating characteristics of all protocols were estimated by maximum likelihood LCA. Reader panel and LCA based estimates were compared using data simulated from the binomial model for a range ofmore » operating characteristics. LCA was applied to 36 thin section thoracic computed tomography data sets from the Lung Image Database Consortium (LIDC): Free search markings of four radiologists were compared to markings from four different CAD assisted radiologists. For real data, bootstrap-based resampling methods, which accommodate dependence in reader detections, are proposed to test of hypotheses of differences between detection protocols. Results: In simulation studies, reader panel based sensitivity estimates had an average relative bias (ARB) of -23% to -27%, significantly higher (p-value <0.0001) than LCA (ARB -2% to -6%). Specificity was well estimated by both reader panel (ARB -0.6% to -0.5%) and LCA (ARB 1.4%-0.5%). Among 1145 lesion candidates LIDC considered, LCA estimated sensitivity of reference readers (55%) was significantly lower (p-value 0.006) than CAD assisted readers' (68%). Average false positives per patient for reference readers (0.95) was not significantly lower (p-value 0.28) than CAD assisted readers' (1.27). Conclusions: Whereas a gold standard based on a consensus of readers may substantially bias sensitivity estimates, LCA may be a significantly more accurate and consistent means for evaluating diagnostic accuracy.« less
ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS
Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.
2017-01-01
The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112
Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.
Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B
2017-09-20
The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
Nonparametric Regression and the Parametric Bootstrap for Local Dependence Assessment.
ERIC Educational Resources Information Center
Habing, Brian
2001-01-01
Discusses ideas underlying nonparametric regression and the parametric bootstrap with an overview of their application to item response theory and the assessment of local dependence. Illustrates the use of the method in assessing local dependence that varies with examinee trait levels. (SLD)
Closure of the operator product expansion in the non-unitary bootstrap
DOE Office of Scientific and Technical Information (OSTI.GOV)
Esterlis, Ilya; Fitzpatrick, A. Liam; Ramirez, David M.
We use the numerical conformal bootstrap in two dimensions to search for finite, closed sub-algebras of the operator product expansion (OPE), without assuming unitarity. We find the minimal models as special cases, as well as additional lines of solutions that can be understood in the Coulomb gas formalism. All the solutions we find that contain the vacuum in the operator algebra are cases where the external operators of the bootstrap equation are degenerate operators, and we argue that this follows analytically from the expressions in arXiv:1202.4698 for the crossing matrices of Virasoro conformal blocks. Our numerical analysis is a specialmore » case of the “Gliozzi” bootstrap method, and provides a simpler setting in which to study technical challenges with the method. In the supplementary material, we provide a Mathematica notebook that automates the calculation of the crossing matrices and OPE coefficients for degenerate operators using the formulae of Dotsenko and Fateev.« less
Closure of the operator product expansion in the non-unitary bootstrap
Esterlis, Ilya; Fitzpatrick, A. Liam; Ramirez, David M.
2016-11-07
We use the numerical conformal bootstrap in two dimensions to search for finite, closed sub-algebras of the operator product expansion (OPE), without assuming unitarity. We find the minimal models as special cases, as well as additional lines of solutions that can be understood in the Coulomb gas formalism. All the solutions we find that contain the vacuum in the operator algebra are cases where the external operators of the bootstrap equation are degenerate operators, and we argue that this follows analytically from the expressions in arXiv:1202.4698 for the crossing matrices of Virasoro conformal blocks. Our numerical analysis is a specialmore » case of the “Gliozzi” bootstrap method, and provides a simpler setting in which to study technical challenges with the method. In the supplementary material, we provide a Mathematica notebook that automates the calculation of the crossing matrices and OPE coefficients for degenerate operators using the formulae of Dotsenko and Fateev.« less
NASA Astrophysics Data System (ADS)
Lu, Siliang; Wang, Xiaoxian; He, Qingbo; Liu, Fang; Liu, Yongbin
2016-12-01
Transient signal analysis (TSA) has been proven an effective tool for motor bearing fault diagnosis, but has yet to be applied in processing bearing fault signals with variable rotating speed. In this study, a new TSA-based angular resampling (TSAAR) method is proposed for fault diagnosis under speed fluctuation condition via sound signal analysis. By applying the TSAAR method, the frequency smearing phenomenon is eliminated and the fault characteristic frequency is exposed in the envelope spectrum for bearing fault recognition. The TSAAR method can accurately estimate the phase information of the fault-induced impulses using neither complicated time-frequency analysis techniques nor external speed sensors, and hence it provides a simple, flexible, and data-driven approach that realizes variable-speed motor bearing fault diagnosis. The effectiveness and efficiency of the proposed TSAAR method are verified through a series of simulated and experimental case studies.
Testing variance components by two jackknife methods
USDA-ARS?s Scientific Manuscript database
The jacknife method, a resampling technique, has been widely used for statistical tests for years. The pseudo value based jacknife method (defined as pseudo jackknife method) is commonly used to reduce the bias for an estimate; however, sometimes it could result in large variaion for an estmimate a...
NASA Astrophysics Data System (ADS)
Baisden, W. T.; Prior, C.; Lambie, S.; Tate, K.; Bruhn, F.; Parfitt, R.; Schipper, L.; Wilde, R. H.; Ross, C.
2006-12-01
Soil organic matter contains more C than terrestrial biomass and atmospheric CO2 combined, and reacts to climate and land-use change on timescales requiring long-term experiments or monitoring. The direction and uncertainty of soil C stock changes has been difficult to predict and incorporate in decision support tools for climate change policies. Moreover, standardization of approaches has been difficult because historic methods of soil sampling have varied regionally, nationally and temporally. The most common and uniform type of historic sampling is soil profiles, which have commonly been collected, described and archived in the course of both soil survey studies and research. Resampling soil profiles has considerable utility in carbon monitoring and in parameterizing models to understand the ecosystem responses to global change. Recent work spanning seven soil orders in New Zealand's grazed pastures has shown that, averaged over approximately 20 years, 31 soil profiles lost 106 g C m-2 y-1 (p=0.01) and 9.1 g N m{^-2} y-1 (p=0.002). These losses are unexpected and appear to extend well below the upper 30 cm of soil. Following on these recent results, additional advantages of resampling soil profiles can be emphasized. One of the most powerful applications afforded by resampling archived soils is the use of the pulse label of radiocarbon injected into the atmosphere by thermonuclear weapons testing circa 1963 as a tracer of soil carbon dynamics. This approach allows estimation of the proportion of soil C that is `passive' or `inert' and therefore unlikely to respond to global change. Evaluation of resampled soil horizons in a New Zealand soil chronosequence confirms that the approach yields consistent values for the proportion of `passive' soil C, reaching 25% of surface horizon soil C over 12,000 years. Across whole profiles, radiocarbon data suggest that the proportion of `passive' C in New Zealand grassland soil can be less than 40% of total soil C. Below 30 cm, 1 kg C m-2 or more may be reactive on decadal timescales, supporting evidence of soil C losses from throughout the soil profiles. Information from resampled soil profiles can be combined with additional contemporary measurements to test hypotheses about mechanisms for soil C changes. For example, Δ14C in excess of 200‰ in water extractable dissolved organic C (DOC) from surface soil horizons supports the hypothesis that decadal movement of DOC represents an important translocation of soil C. These preliminary results demonstrate that resampling whole soil profiles can support substantial progress in C cycle science, ranging from updating operational C accounting systems to the frontiers of research. Resampling can be complementary or superior to fixed-depth interval sampling of surface soil layers. Resampling must however be undertaken with relative urgency to maximize the potential interpretive power of bomb-derived radiocarbon.
System for monitoring non-coincident, nonstationary process signals
Gross, Kenneth C.; Wegerich, Stephan W.
2005-01-04
An improved system for monitoring non-coincident, non-stationary, process signals. The mean, variance, and length of a reference signal is defined by an automated system, followed by the identification of the leading and falling edges of a monitored signal and the length of the monitored signal. The monitored signal is compared to the reference signal, and the monitored signal is resampled in accordance with the reference signal. The reference signal is then correlated with the resampled monitored signal such that the reference signal and the resampled monitored signal are coincident in time with each other. The resampled monitored signal is then compared to the reference signal to determine whether the resampled monitored signal is within a set of predesignated operating conditions.
A Bootstrap Procedure of Propensity Score Estimation
ERIC Educational Resources Information Center
Bai, Haiyan
2013-01-01
Propensity score estimation plays a fundamental role in propensity score matching for reducing group selection bias in observational data. To increase the accuracy of propensity score estimation, the author developed a bootstrap propensity score. The commonly used propensity score matching methods: nearest neighbor matching, caliper matching, and…
Communication Optimizations for a Wireless Distributed Prognostic Framework
NASA Technical Reports Server (NTRS)
Saha, Sankalita; Saha, Bhaskar; Goebel, Kai
2009-01-01
Distributed architecture for prognostics is an essential step in prognostic research in order to enable feasible real-time system health management. Communication overhead is an important design problem for such systems. In this paper we focus on communication issues faced in the distributed implementation of an important class of algorithms for prognostics - particle filters. In spite of being computation and memory intensive, particle filters lend well to distributed implementation except for one significant step - resampling. We propose new resampling scheme called parameterized resampling that attempts to reduce communication between collaborating nodes in a distributed wireless sensor network. Analysis and comparison with relevant resampling schemes is also presented. A battery health management system is used as a target application. A new resampling scheme for distributed implementation of particle filters has been discussed in this paper. Analysis and comparison of this new scheme with existing resampling schemes in the context for minimizing communication overhead have also been discussed. Our proposed new resampling scheme performs significantly better compared to other schemes by attempting to reduce both the communication message length as well as number total communication messages exchanged while not compromising prediction accuracy and precision. Future work will explore the effects of the new resampling scheme in the overall computational performance of the whole system as well as full implementation of the new schemes on the Sun SPOT devices. Exploring different network architectures for efficient communication is an importance future research direction as well.
A multistate dynamic site occupancy model for spatially aggregated sessile communities
Fukaya, Keiichi; Royle, J. Andrew; Okuda, Takehiro; Nakaoka, Masahiro; Noda, Takashi
2017-01-01
Estimation of transition probabilities of sessile communities seems easy in principle but may still be difficult in practice because resampling error (i.e. a failure to resample exactly the same location at fixed points) may cause significant estimation bias. Previous studies have developed novel analytical methods to correct for this estimation bias. However, they did not consider the local structure of community composition induced by the aggregated distribution of organisms that is typically observed in sessile assemblages and is very likely to affect observations.We developed a multistate dynamic site occupancy model to estimate transition probabilities that accounts for resampling errors associated with local community structure. The model applies a nonparametric multivariate kernel smoothing methodology to the latent occupancy component to estimate the local state composition near each observation point, which is assumed to determine the probability distribution of data conditional on the occurrence of resampling error.By using computer simulations, we confirmed that an observation process that depends on local community structure may bias inferences about transition probabilities. By applying the proposed model to a real data set of intertidal sessile communities, we also showed that estimates of transition probabilities and of the properties of community dynamics may differ considerably when spatial dependence is taken into account.Results suggest the importance of accounting for resampling error and local community structure for developing management plans that are based on Markovian models. Our approach provides a solution to this problem that is applicable to broad sessile communities. It can even accommodate an anisotropic spatial correlation of species composition, and may also serve as a basis for inferring complex nonlinear ecological dynamics.
NASA Astrophysics Data System (ADS)
Mortuza, M. R.; Demissie, Y. K.
2015-12-01
In lieu with the recent and anticipated more server and frequently droughts incidences in Yakima River Basin (YRB), a reliable and comprehensive drought assessment is deemed necessary to avoid major crop production loss and better manage the water right issues in the region during low precipitation and/or snow accumulation years. In this study, we have conducted frequency analysis of hydrological droughts and quantified associated uncertainty in the YRB under both historical and changing climate. Streamflow drought index (SDI) was employed to identify mutually correlated drought characteristics (e.g., severity, duration and peak). The historical and future characteristics of drought were estimated by applying tri-variate copulas probability distribution, which effectively describe the joint distribution and dependence of drought severity, duration, and peak. The associated prediction uncertainty, related to parameters of the joint probability and climate projections, were evaluated using the Bayesian approach with bootstrap resampling. For the climate change scenarios, two future representative pathways (RCP4.5 and RCP8.5) from University of Idaho's Multivariate Adaptive Constructed Analogs (MACA) database were considered. The results from the study are expected to provide useful information towards drought risk management in YRB under anticipated climate changes.
POLYMAT-C: a comprehensive SPSS program for computing the polychoric correlation matrix.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2015-09-01
We provide a free noncommercial SPSS program that implements procedures for (a) obtaining the polychoric correlation matrix between a set of ordered categorical measures, so that it can be used as input for the SPSS factor analysis (FA) program; (b) testing the null hypothesis of zero population correlation for each element of the matrix by using appropriate simulation procedures; (c) obtaining valid and accurate confidence intervals via bootstrap resampling for those correlations found to be significant; and (d) performing, if necessary, a smoothing procedure that makes the matrix amenable to any FA estimation procedure. For the main purpose (a), the program uses a robust unified procedure that allows four different types of estimates to be obtained at the user's choice. Overall, we hope the program will be a very useful tool for the applied researcher, not only because it provides an appropriate input matrix for FA, but also because it allows the researcher to carefully check the appropriateness of the matrix for this purpose. The SPSS syntax, a short manual, and data files related to this article are available as Supplemental materials that are available for download with this article.
Tourscape: A systematic approach towards a sustainable rural tourism management
NASA Astrophysics Data System (ADS)
Lo, M. C.; Wang, Y. C.; Songan, P.; Yeo, A. W.
2014-02-01
Tourism plays an important role in the Malaysian economy as it is considered to be one of the corner stones of the country's economy. The purpose of this research is to conduct an analysis based on the existing tourism industry in rural tourism destinations in Malaysia by examining the impact of economics, environmental, social and cultural factors of the tourism industry on the local communities in Malaysia. 516 respondents comprising of tourism stakeholders from 34 rural tourism sites in Malaysia took part voluntarily in this study. To assess the developed model, SmartPLS 2.0 (M3) was applied based on path modeling and then bootstrapping with 200 re-samples was applied to generate the standard error of the estimate and t-values. Subsequently, a system named Tourscape was designed to manage the information. This system can be considered as a benchmark for tourism industry stakeholders as it is able to display the current situational analysis and the tourism health of selected tourism destination sites by capturing data and information, not only from local communities but industry players and tourists as well. The findings from this study revealed that the cooperation from various stakeholders has created significant impact on the development of rural tourism.
Racial disparities in stage-specific colorectal cancer mortality: 1960-2005.
Soneji, Samir; Iyer, Shally Shalini; Armstrong, Katrina; Asch, David A
2010-10-01
We examined whether racial disparities in stage-specific colorectal cancer survival changed between 1960 and 2005. We used US Mortality Multiple-Cause-of-Death Data Files and intercensal estimates to calculate standardized mortality rates by gender and race from 1960 to 2005. We used Surveillance, Epidemiology, and End Results (SEER) data to estimate stage-specific colorectal cancer survival. To account for SEER sampling uncertainty, we used a bootstrap resampling procedure and fit a Cox proportional hazards model. Between 1960-2005, patterns of decline in mortality rate as a result of colorectal cancer differed greatly by gender and race: 54% reduction for White women, 14% reduction for Black women, 39% reduction for White men, and 28% increase for Black men. Blacks consistently experienced worse rates of stage-specific survival and life expectancy than did Whites for both genders, across all age groups, and for localized, regional, and distant stages of the disease. The rates of stage-specific colorectal cancer survival differed among Blacks when compared with Whites during the 4-decade study period. Differences in stage-specific life expectancy were the result of differences in access to care or quality of care. More attention should be given to racial disparities in colorectal cancer management.
Sample size determination for mediation analysis of longitudinal data.
Pan, Haitao; Liu, Suyu; Miao, Danmin; Yuan, Ying
2018-03-27
Sample size planning for longitudinal data is crucial when designing mediation studies because sufficient statistical power is not only required in grant applications and peer-reviewed publications, but is essential to reliable research results. However, sample size determination is not straightforward for mediation analysis of longitudinal design. To facilitate planning the sample size for longitudinal mediation studies with a multilevel mediation model, this article provides the sample size required to achieve 80% power by simulations under various sizes of the mediation effect, within-subject correlations and numbers of repeated measures. The sample size calculation is based on three commonly used mediation tests: Sobel's method, distribution of product method and the bootstrap method. Among the three methods of testing the mediation effects, Sobel's method required the largest sample size to achieve 80% power. Bootstrapping and the distribution of the product method performed similarly and were more powerful than Sobel's method, as reflected by the relatively smaller sample sizes. For all three methods, the sample size required to achieve 80% power depended on the value of the ICC (i.e., within-subject correlation). A larger value of ICC typically required a larger sample size to achieve 80% power. Simulation results also illustrated the advantage of the longitudinal study design. The sample size tables for most encountered scenarios in practice have also been published for convenient use. Extensive simulations study showed that the distribution of the product method and bootstrapping method have superior performance to the Sobel's method, but the product method was recommended to use in practice in terms of less computation time load compared to the bootstrapping method. A R package has been developed for the product method of sample size determination in mediation longitudinal study design.
NASA Astrophysics Data System (ADS)
Wang, Jinliang; Wu, Xuejiao
2010-11-01
Geometric correction of imagery is a basic application of remote sensing technology. Its precision will impact directly on the accuracy and reliability of applications. The accuracy of geometric correction depends on many factors, including the used model for correction and the accuracy of the reference map, the number of ground control points (GCP) and its spatial distribution, resampling methods. The ETM+ image of Kunming Dianchi Lake Basin and 1:50000 geographical maps had been used to compare different correction methods. The results showed that: (1) The correction errors were more than one pixel and some of them were several pixels when the polynomial model was used. The correction accuracy was not stable when the Delaunay model was used. The correction errors were less than one pixel when the collinearity equation was used. (2) 6, 9, 25 and 35 GCP were selected randomly for geometric correction using the polynomial correction model respectively, the best result was obtained when 25 GCPs were used. (3) The contrast ratio of image corrected by using nearest neighbor and the best resampling rate was compared to that of using the cubic convolution and bilinear model. But the continuity of pixel gravy value was not very good. The contrast of image corrected was the worst and the computation time was the longest by using the cubic convolution method. According to the above results, the result was the best by using bilinear to resample.
Briganti, Alberto; Karnes, R Jeffrey; Joniau, Steven; Boorjian, Stephen A; Cozzarini, Cesare; Gandaglia, Giorgio; Hinkelbein, Wolfgang; Haustermans, Karin; Tombal, Bertrand; Shariat, Shahrokh; Sun, Maxine; Karakiewicz, Pierre I; Montorsi, Francesco; Van Poppel, Hein; Wiegel, Thomas
2014-09-01
Early salvage radiotherapy (eSRT) represents the only curative option for prostate cancer patients experiencing biochemical recurrence (BCR) for local recurrence after radical prostatectomy (RP). To develop and internally validate a novel nomogram predicting BCR after eSRT in patients treated with RP. Using a multi-institutional cohort, 472 node-negative patients who experienced BCR after RP were identified. All patients received eSRT, defined as local radiation to the prostate and seminal vesicle bed, delivered at prostate-specific antigen (PSA) ≤ 0.5 ng/ml. BCR after eSRT was defined as two consecutive PSA values ≥ 0.2 ng/ml. Uni- and multivariable Cox regression models predicting BCR after eSRT were fitted. Regression-based coefficients were used to develop a nomogram predicting the risk of 5-yr BCR after eSRT. The discrimination of the nomogram was quantified with the Harrell concordance index and the calibration plot method. Two hundred bootstrap resamples were used for internal validation. Mean follow-up was 58 mo (median: 48 mo). Overall, 5-yr BCR-free survival rate after eSRT was 73.4%. In univariable analyses, pathologic stage, Gleason score, and positive surgical margins were associated with the risk of BCR after eSRT (all p ≤ 0.04). These results were confirmed in multivariable analysis, where all the previously mentioned covariates as well as pre-RT PSA were significantly associated with BCR after eSRT (all p ≤ 0.04). A coefficient-based nomogram demonstrated a bootstrap-corrected discrimination of 0.74. Our study is limited by its retrospective nature and use of BCR as an end point. eSRT leads to excellent cancer control in patients with BCR for presumed local failure after RP. We developed the first nomogram to predict outcome after eSRT. Our model facilitates risk stratification and patient counselling regarding the use of secondary therapy for individuals experiencing BCR after RP. Salvage radiotherapy leads to optimal cancer control in patients who experience recurrence after radical prostatectomy. We developed a novel tool to identify the best candidates for salvage treatment and to allow selection of patients to be considered for additional forms of therapy. Copyright © 2013 European Association of Urology. Published by Elsevier B.V. All rights reserved.
Methods to achieve accurate projection of regional and global raster databases
Usery, E.L.; Seong, J.C.; Steinwand, D.R.; Finn, M.P.
2002-01-01
This research aims at building a decision support system (DSS) for selecting an optimum projection considering various factors, such as pixel size, areal extent, number of categories, spatial pattern of categories, resampling methods, and error correction methods. Specifically, this research will investigate three goals theoretically and empirically and, using the already developed empirical base of knowledge with these results, develop an expert system for map projection of raster data for regional and global database modeling. The three theoretical goals are as follows: (1) The development of a dynamic projection that adjusts projection formulas for latitude on the basis of raster cell size to maintain equal-sized cells. (2) The investigation of the relationships between the raster representation and the distortion of features, number of categories, and spatial pattern. (3) The development of an error correction and resampling procedure that is based on error analysis of raster projection.
Speckle reduction in digital holography with resampling ring masks
NASA Astrophysics Data System (ADS)
Zhang, Wenhui; Cao, Liangcai; Jin, Guofan
2018-01-01
One-shot digital holographic imaging has the advantages of high stability and low temporal cost. However, the reconstruction is affected by the speckle noise. Resampling ring-mask method in spectrum domain is proposed for speckle reduction. The useful spectrum of one hologram is divided into several sub-spectra by ring masks. In the reconstruction, angular spectrum transform is applied to guarantee the calculation accuracy which has no approximation. N reconstructed amplitude images are calculated from the corresponding sub-spectra. Thanks to speckle's random distribution, superimposing these N uncorrelated amplitude images would lead to a final reconstructed image with lower speckle noise. Normalized relative standard deviation values of the reconstructed image are used to evaluate the reduction of speckle. Effect of the method on the spatial resolution of the reconstructed image is also quantitatively evaluated. Experimental and simulation results prove the feasibility and effectiveness of the proposed method.
Introducing Statistical Inference to Biology Students through Bootstrapping and Randomization
ERIC Educational Resources Information Center
Lock, Robin H.; Lock, Patti Frazer
2008-01-01
Bootstrap methods and randomization tests are increasingly being used as alternatives to standard statistical procedures in biology. They also serve as an effective introduction to the key ideas of statistical inference in introductory courses for biology students. We discuss the use of such simulation based procedures in an integrated curriculum…
Quantitative body DW-MRI biomarkers uncertainty estimation using unscented wild-bootstrap.
Freiman, M; Voss, S D; Mulkern, R V; Perez-Rossello, J M; Warfield, S K
2011-01-01
We present a new method for the uncertainty estimation of diffusion parameters for quantitative body DW-MRI assessment. Diffusion parameters uncertainty estimation from DW-MRI is necessary for clinical applications that use these parameters to assess pathology. However, uncertainty estimation using traditional techniques requires repeated acquisitions, which is undesirable in routine clinical use. Model-based bootstrap techniques, for example, assume an underlying linear model for residuals rescaling and cannot be utilized directly for body diffusion parameters uncertainty estimation due to the non-linearity of the body diffusion model. To offset this limitation, our method uses the Unscented transform to compute the residuals rescaling parameters from the non-linear body diffusion model, and then applies the wild-bootstrap method to infer the body diffusion parameters uncertainty. Validation through phantom and human subject experiments shows that our method identify the regions with higher uncertainty in body DWI-MRI model parameters correctly with realtive error of -36% in the uncertainty values.
Almeida, Mariana R; Fidelis, Carlos H V; Barata, Lauro E S; Poppi, Ronei J
2013-12-15
The Amazon tree Aniba rosaeodora Ducke (rosewood) provides an essential oil valuable for the perfume industry, but after decades of predatory extraction it is at risk of extinction. The extraction of the essential oil from wood implies the cutting of the tree, and then the study of oil extracted from the leaves is important as a sustainable alternative. The goal of this study was to test the applicability of Raman spectroscopy and Partial Least Square Discriminant Analysis (PLS-DA) as means to classify the essential oil extracted from different parties (wood, leaves and branches) of the Brazilian tree A. rosaeodora. For the development of classification models, the Raman spectra were split into two sets: training and test. The value of the limit that separates the classes was calculated based on the distribution of samples of training. This value was calculated in a manner that the classes are divided with a lower probability of incorrect classification for future estimates. The best model presented sensitivity and specificity of 100%, predictive accuracy and efficiency of 100%. These results give an overall vision of the behavior of the model, but do not give information about individual samples; in this case, the confidence interval for each sample of classification was also calculated using the resampling bootstrap technique. The methodology developed have the potential to be an alternative for standard procedures used for oil analysis and it can be employed as screening method, since it is fast, non-destructive and robust. © 2013 Elsevier B.V. All rights reserved.
Sakamoto, Yukiyo; Yamauchi, Yasuhiro; Yasunaga, Hideo; Takeshima, Hideyuki; Hasegawa, Wakae; Jo, Taisuke; Sasabuchi, Yusuke; Matsui, Hiroki; Fushimi, Kiyohide; Nagase, Takahide
2017-01-01
Patients with chronic obstructive pulmonary disease (COPD) often experience exacerbations of their disease, sometimes requiring hospital admission and being associated with increased mortality. Although previous studies have reported mortality from exacerbations of COPD, there is limited information about prediction of individual in-hospital mortality. We therefore aimed to use data from a nationwide inpatient database in Japan to generate a nomogram for predicting in-hospital mortality from patients' characteristics on admission. We retrospectively collected data on patients with COPD who had been admitted for exacerbations and been discharged between July 1, 2010 and March 31, 2013. We performed multivariable logistic regression analysis to examine factors associated with in-hospital mortality and thereafter used these factors to develop a nomogram for predicting in-hospital prognosis. The study comprised 3,064 eligible patients. In-hospital death occurred in 209 patients (6.8%). Higher mortality was associated with older age, being male, lower body mass index, disturbance of consciousness, severe dyspnea, history of mechanical ventilation, pneumonia, and having no asthma on admission. We developed a nomogram based on these variables to predict in-hospital mortality. The concordance index of the nomogram was 0.775. Internal validation was performed by a bootstrap method with 50 resamples, and calibration plots were found to be well fitted to predict in-hospital mortality. We developed a nomogram for predicting in-hospital mortality of exacerbations of COPD. This nomogram could help clinicians to predict risk of in-hospital mortality in individual patients with COPD exacerbation.
An improved Rosetta pedotransfer function and evaluation in earth system models
NASA Astrophysics Data System (ADS)
Zhang, Y.; Schaap, M. G.
2017-12-01
Soil hydraulic parameters are often difficult and expensive to measure, leading to the pedotransfer functions (PTFs) an alternative to predict those parameters. Rosetta (Schaap et al., 2001, denoted as Rosetta1) are widely used PTFs, which is based on artificial neural network (ANN) analysis coupled with the bootstrap re-sampling method, allowing the estimation of van Genuchten water retention parameters (van Genuchten, 1980, abbreviated here as VG), saturated hydraulic conductivity (Ks), as well as their uncertainties. We present an improved hierarchical pedotransfer functions (Rosetta3) that unify the VG water retention and Ks submodels into one, thus allowing the estimation of uni-variate and bi-variate probability distributions of estimated parameters. Results show that the estimation bias of moisture content was reduced significantly. Rosetta1 and Posetta3 were implemented in the python programming language, and the source code are available online. Based on different soil water retention equations, there are diverse PTFs used in different disciplines of earth system modelings. PTFs based on Campbell [1974] or Clapp and Hornberger [1978] are frequently used in land surface models and general circulation models, while van Genuchten [1980] based PTFs are more widely used in hydrology and soil sciences. We use an independent global scale soil database to evaluate the performance of diverse PTFs used in different disciplines of earth system modelings. PTFs are evaluated based on different soil characteristics and environmental characteristics, such as soil textural data, soil organic carbon, soil pH, as well as precipitation and soil temperature. This analysis provides more quantitative estimation error information for PTF predictions in different disciplines of earth system modelings.
2013-01-01
Background Whereas the prognosis of second kidney transplant recipients (STR) compared to the first ones has been frequently analyzed, no study has addressed the issue of comparing the risk factor effects on graft failure between both groups. Methods Here, we propose two alternative strategies to study the heterogeneity of risk factors between two groups of patients: (i) a multiplicative-regression model for relative survival (MRS) and (ii) a stratified Cox model (SCM) specifying the graft rank as strata and assuming subvectors of the explicatives variables. These developments were motivated by the analysis of factors associated with time to graft failure (return-to-dialysis or patient death) in second kidney transplant recipients (STR) compared to the first ones. Estimation of the parameters was based on partial likelihood maximization. Monte-Carlo simulations associated with bootstrap re-sampling was performed to calculate the standard deviations for the MRS. Results We demonstrate, for the first time in renal transplantation, that: (i) male donor gender is a specific risk factor for STR, (ii) the adverse effect of recipient age is enhanced for STR and (iii) the graft failure risk related to donor age is attenuated for STR. Conclusion While the traditional Cox model did not provide original results based on the renal transplantation literature, the proposed relative and stratified models revealed new findings that are useful for clinicians. These methodologies may be of interest in other medical fields when the principal objective is the comparison of risk factors between two populations. PMID:23915191
Development of a prediction model for residual disease in newly diagnosed advanced ovarian cancer.
Janco, Jo Marie Tran; Glaser, Gretchen; Kim, Bohyun; McGree, Michaela E; Weaver, Amy L; Cliby, William A; Dowdy, Sean C; Bakkum-Gamez, Jamie N
2015-07-01
To construct a tool, using computed tomography (CT) imaging and preoperative clinical variables, to estimate successful primary cytoreduction for advanced epithelial ovarian cancer (EOC). Women who underwent primary cytoreductive surgery for stage IIIC/IV EOC at Mayo Clinic between 1/2/2003 and 12/30/2011 and had preoperative CT images of the abdomen and pelvis within 90days prior to their surgery available for review were included. CT images were reviewed for large-volume ascites, diffuse peritoneal thickening (DPT), omental cake, lymphadenopathy (LP), and spleen or liver involvement. Preoperative factors included age, body mass index (BMI), Eastern Cooperative Oncology Group performance status (ECOG PS), American Society of Anesthesiologists (ASA) score, albumin, CA-125, and thrombocytosis. Two prediction models were developed to estimate the probability of (i) complete and (ii) suboptimal cytoreduction (residual disease (RD) >1cm) using multivariable logistic analysis with backward and stepwise variable selection methods. Internal validation was assessed using bootstrap resampling to derive an optimism-corrected estimate of the c-index. 279 patients met inclusion criteria: 143 had complete cytoreduction, 26 had suboptimal cytoreduction (RD>1cm), and 110 had measurable RD ≤1cm. On multivariable analysis, age, absence of ascites, omental cake, and DPT on CT imaging independently predicted complete cytoreduction (c-index=0.748). Conversely, predictors of suboptimal cytoreduction were ECOG PS, DPT, and LP on preoperative CT imaging (c-index=0.685). The generated models serve as preoperative evaluation tools that may improve counseling and selection for primary surgery, but need to be externally validated. Copyright © 2015 Elsevier Inc. All rights reserved.
Economic Evaluation of Manitoba Health Lines in the Management of Congestive Heart Failure
Cui, Yang; Doupe, Malcolm; Katz, Alan; Nyhof, Paul; Forget, Evelyn L.
2013-01-01
Objective: This one-year study investigated whether the Manitoba Provincial Health Contact program for congestive heart failure (CHF) is a cost-effective intervention relative to the standard treatment. Design: Individual patient-level, randomized clinical trial of cost-effective model using data from the Health Research Data Repository at the Manitoba Centre for Health Policy, University of Manitoba. Methods: A total of 179 patients aged 40 and over with a diagnosis of CHF levels II to IV were recruited from Winnipeg and Central Manitoba and randomized into three treatment groups: one receiving standard care, a second receiving Health Lines (HL) intervention and a third receiving Health Lines intervention plus in-house monitoring (HLM). A cost-effectiveness study was conducted in which outcomes were measured in terms of QALYs derived from the SF-36 and costs using 2005 Canadian dollars. Costs included intervention and healthcare utilization. Bootstrap-resampled incremental cost-effectiveness ratios were computed to take into account the uncertainty related to small sample size. Results: The total per-patient mean costs (including intervention cost) were not significantly different between study groups. Both interventions (HL and HLM) cost less and are more effective than standard care, with HL able to produce an additional QALY relative to HLM for $2,975. The sensitivity analysis revealed that there is an 85.8% probability that HL is cost-effective if decision-makers are willing to pay $50,000. Conclusion: Findings demonstrate that the HL intervention from the Manitoba Provincial Health Contact program for CHF is an optimal intervention strategy for CHF management compared to standard care and HLM. PMID:24359716
Methods of Soil Resampling to Monitor Changes in the Chemical Concentrations of Forest Soils.
Lawrence, Gregory B; Fernandez, Ivan J; Hazlett, Paul W; Bailey, Scott W; Ross, Donald S; Villars, Thomas R; Quintana, Angelica; Ouimet, Rock; McHale, Michael R; Johnson, Chris E; Briggs, Russell D; Colter, Robert A; Siemion, Jason; Bartlett, Olivia L; Vargas, Olga; Antidormi, Michael R; Koppers, Mary M
2016-11-25
Recent soils research has shown that important chemical soil characteristics can change in less than a decade, often the result of broad environmental changes. Repeated sampling to monitor these changes in forest soils is a relatively new practice that is not well documented in the literature and has only recently been broadly embraced by the scientific community. The objective of this protocol is therefore to synthesize the latest information on methods of soil resampling in a format that can be used to design and implement a soil monitoring program. Successful monitoring of forest soils requires that a study unit be defined within an area of forested land that can be characterized with replicate sampling locations. A resampling interval of 5 years is recommended, but if monitoring is done to evaluate a specific environmental driver, the rate of change expected in that driver should be taken into consideration. Here, we show that the sampling of the profile can be done by horizon where boundaries can be clearly identified and horizons are sufficiently thick to remove soil without contamination from horizons above or below. Otherwise, sampling can be done by depth interval. Archiving of sample for future reanalysis is a key step in avoiding analytical bias and providing the opportunity for additional analyses as new questions arise.
Methods of Soil Resampling to Monitor Changes in the Chemical Concentrations of Forest Soils
Lawrence, Gregory B.; Fernandez, Ivan J.; Hazlett, Paul W.; Bailey, Scott W.; Ross, Donald S.; Villars, Thomas R.; Quintana, Angelica; Ouimet, Rock; McHale, Michael R.; Johnson, Chris E.; Briggs, Russell D.; Colter, Robert A.; Siemion, Jason; Bartlett, Olivia L.; Vargas, Olga; Antidormi, Michael R.; Koppers, Mary M.
2016-01-01
Recent soils research has shown that important chemical soil characteristics can change in less than a decade, often the result of broad environmental changes. Repeated sampling to monitor these changes in forest soils is a relatively new practice that is not well documented in the literature and has only recently been broadly embraced by the scientific community. The objective of this protocol is therefore to synthesize the latest information on methods of soil resampling in a format that can be used to design and implement a soil monitoring program. Successful monitoring of forest soils requires that a study unit be defined within an area of forested land that can be characterized with replicate sampling locations. A resampling interval of 5 years is recommended, but if monitoring is done to evaluate a specific environmental driver, the rate of change expected in that driver should be taken into consideration. Here, we show that the sampling of the profile can be done by horizon where boundaries can be clearly identified and horizons are sufficiently thick to remove soil without contamination from horizons above or below. Otherwise, sampling can be done by depth interval. Archiving of sample for future reanalysis is a key step in avoiding analytical bias and providing the opportunity for additional analyses as new questions arise. PMID:27911419
Methods of soil resampling to monitor changes in the chemical concentrations of forest soils
Lawrence, Gregory B.; Fernandez, Ivan J.; Hazlett, Paul W.; Bailey, Scott W.; Ross, Donald S.; Villars, Thomas R.; Quintana, Angelica; Ouimet, Rock; McHale, Michael; Johnson, Chris E.; Briggs, Russell D.; Colter, Robert A.; Siemion, Jason; Bartlett, Olivia L.; Vargas, Olga; Antidormi, Michael; Koppers, Mary Margaret
2016-01-01
Recent soils research has shown that important chemical soil characteristics can change in less than a decade, often the result of broad environmental changes. Repeated sampling to monitor these changes in forest soils is a relatively new practice that is not well documented in the literature and has only recently been broadly embraced by the scientific community. The objective of this protocol is therefore to synthesize the latest information on methods of soil resampling in a format that can be used to design and implement a soil monitoring program. Successful monitoring of forest soils requires that a study unit be defined within an area of forested land that can be characterized with replicate sampling locations. A resampling interval of 5 years is recommended, but if monitoring is done to evaluate a specific environmental driver, the rate of change expected in that driver should be taken into consideration. Here, we show that the sampling of the profile can be done by horizon where boundaries can be clearly identified and horizons are sufficiently thick to remove soil without contamination from horizons above or below. Otherwise, sampling can be done by depth interval. Archiving of sample for future reanalysis is a key step in avoiding analytical bias and providing the opportunity for additional analyses as new questions arise.
An add-in implementation of the RESAMPLING syntax under Microsoft EXCEL.
Meineke, I
2000-10-01
The RESAMPLING syntax defines a set of powerful commands, which allow the programming of probabilistic statistical models with few, easily memorized statements. This paper presents an implementation of the RESAMPLING syntax using Microsoft EXCEL with Microsoft WINDOWS(R) as a platform. Two examples are given to demonstrate typical applications of RESAMPLING in biomedicine. Details of the implementation with special emphasis on the programming environment are discussed at length. The add-in is available electronically to interested readers upon request. The use of the add-in facilitates numerical statistical analyses of data from within EXCEL in a comfortable way.
CME Velocity and Acceleration Error Estimates Using the Bootstrap Method
NASA Technical Reports Server (NTRS)
Michalek, Grzegorz; Gopalswamy, Nat; Yashiro, Seiji
2017-01-01
The bootstrap method is used to determine errors of basic attributes of coronal mass ejections (CMEs) visually identified in images obtained by the Solar and Heliospheric Observatory (SOHO) mission's Large Angle and Spectrometric Coronagraph (LASCO) instruments. The basic parameters of CMEs are stored, among others, in a database known as the SOHO/LASCO CME catalog and are widely employed for many research studies. The basic attributes of CMEs (e.g. velocity and acceleration) are obtained from manually generated height-time plots. The subjective nature of manual measurements introduces random errors that are difficult to quantify. In many studies the impact of such measurement errors is overlooked. In this study we present a new possibility to estimate measurements errors in the basic attributes of CMEs. This approach is a computer-intensive method because it requires repeating the original data analysis procedure several times using replicate datasets. This is also commonly called the bootstrap method in the literature. We show that the bootstrap approach can be used to estimate the errors of the basic attributes of CMEs having moderately large numbers of height-time measurements. The velocity errors are in the vast majority small and depend mostly on the number of height-time points measured for a particular event. In the case of acceleration, the errors are significant, and for more than half of all CMEs, they are larger than the acceleration itself.
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
Comparison of mode estimation methods and application in molecular clock analysis
NASA Technical Reports Server (NTRS)
Hedges, S. Blair; Shah, Prachi
2003-01-01
BACKGROUND: Distributions of time estimates in molecular clock studies are sometimes skewed or contain outliers. In those cases, the mode is a better estimator of the overall time of divergence than the mean or median. However, different methods are available for estimating the mode. We compared these methods in simulations to determine their strengths and weaknesses and further assessed their performance when applied to real data sets from a molecular clock study. RESULTS: We found that the half-range mode and robust parametric mode methods have a lower bias than other mode methods under a diversity of conditions. However, the half-range mode suffers from a relatively high variance and the robust parametric mode is more susceptible to bias by outliers. We determined that bootstrapping reduces the variance of both mode estimators. Application of the different methods to real data sets yielded results that were concordant with the simulations. CONCLUSION: Because the half-range mode is a simple and fast method, and produced less bias overall in our simulations, we recommend the bootstrapped version of it as a general-purpose mode estimator and suggest a bootstrap method for obtaining the standard error and 95% confidence interval of the mode.
Frequency domain analysis of errors in cross-correlations of ambient seismic noise
NASA Astrophysics Data System (ADS)
Liu, Xin; Ben-Zion, Yehuda; Zigone, Dimitri
2016-12-01
We analyse random errors (variances) in cross-correlations of ambient seismic noise in the frequency domain, which differ from previous time domain methods. Extending previous theoretical results on ensemble averaged cross-spectrum, we estimate confidence interval of stacked cross-spectrum of finite amount of data at each frequency using non-overlapping windows with fixed length. The extended theory also connects amplitude and phase variances with the variance of each complex spectrum value. Analysis of synthetic stationary ambient noise is used to estimate the confidence interval of stacked cross-spectrum obtained with different length of noise data corresponding to different number of evenly spaced windows of the same duration. This method allows estimating Signal/Noise Ratio (SNR) of noise cross-correlation in the frequency domain, without specifying filter bandwidth or signal/noise windows that are needed for time domain SNR estimations. Based on synthetic ambient noise data, we also compare the probability distributions, causal part amplitude and SNR of stacked cross-spectrum function using one-bit normalization or pre-whitening with those obtained without these pre-processing steps. Natural continuous noise records contain both ambient noise and small earthquakes that are inseparable from the noise with the existing pre-processing steps. Using probability distributions of random cross-spectrum values based on the theoretical results provides an effective way to exclude such small earthquakes, and additional data segments (outliers) contaminated by signals of different statistics (e.g. rain, cultural noise), from continuous noise waveforms. This technique is applied to constrain values and uncertainties of amplitude and phase velocity of stacked noise cross-spectrum at different frequencies, using data from southern California at both regional scale (˜35 km) and dense linear array (˜20 m) across the plate-boundary faults. A block bootstrap resampling method is used to account for temporal correlation of noise cross-spectrum at low frequencies (0.05-0.2 Hz) near the ocean microseismic peaks.
A neural network based reputation bootstrapping approach for service selection
NASA Astrophysics Data System (ADS)
Wu, Quanwang; Zhu, Qingsheng; Li, Peng
2015-10-01
With the concept of service-oriented computing becoming widely accepted in enterprise application integration, more and more computing resources are encapsulated as services and published online. Reputation mechanism has been studied to establish trust on prior unknown services. One of the limitations of current reputation mechanisms is that they cannot assess the reputation of newly deployed services as no record of their previous behaviours exists. Most of the current bootstrapping approaches merely assign default reputation values to newcomers. However, by this kind of methods, either newcomers or existing services will be favoured. In this paper, we present a novel reputation bootstrapping approach, where correlations between features and performance of existing services are learned through an artificial neural network (ANN) and they are then generalised to establish a tentative reputation when evaluating new and unknown services. Reputations of services published previously by the same provider are also incorporated for reputation bootstrapping if available. The proposed reputation bootstrapping approach is seamlessly embedded into an existing reputation model and implemented in the extended service-oriented architecture. Empirical studies of the proposed approach are shown at last.
Towards a bootstrap approach to higher orders of epsilon expansion
NASA Astrophysics Data System (ADS)
Dey, Parijat; Kaviraj, Apratim
2018-02-01
We employ a hybrid approach in determining the anomalous dimension and OPE coefficient of higher spin operators in the Wilson-Fisher theory. First we do a large spin analysis for CFT data where we use results obtained from the usual and the Mellin bootstrap and also from Feynman diagram literature. This gives new predictions at O( ɛ 4) and O( ɛ 5) for anomalous dimensions and OPE coefficients, and also provides a cross-check for the results from Mellin bootstrap. These higher orders get contributions from all higher spin operators in the crossed channel. We also use the bootstrap in Mellin space method for ϕ 3 in d = 6 - ɛ CFT where we calculate general higher spin OPE data. We demonstrate a higher loop order calculation in this approach by summing over contributions from higher spin operators of the crossed channel in the same spirit as before.
Computationally Efficient Resampling of Nonuniform Oversampled SAR Data
2010-05-01
noncoherently . The resample data is calculated using both a simple average and a weighted average of the demodulated data. The average nonuniform...trials with randomly varying accelerations. The results are shown in Fig. 5 for the noncoherent power difference and Fig. 6 for and coherent power...simple average. Figure 5. Noncoherent difference between SAR imagery generated with uniform sampling and nonuniform sampling that was resampled
H. T. Schreuder; M. S. Williams
2000-01-01
In simulation sampling from forest populations using sample sizes of 20, 40, and 60 plots respectively, confidence intervals based on the bootstrap (accelerated, percentile, and t-distribution based) were calculated and compared with those based on the classical t confidence intervals for mapped populations and subdomains within those populations. A 68.1 ha mapped...
NASA Astrophysics Data System (ADS)
Serinaldi, Francesco; Kilsby, Chris G.
2013-06-01
The information contained in hyetographs and hydrographs is often synthesized by using key properties such as the peak or maximum value Xp, volume V, duration D, and average intensity I. These variables play a fundamental role in hydrologic engineering as they are used, for instance, to define design hyetographs and hydrographs as well as to model and simulate the rainfall and streamflow processes. Given their inherent variability and the empirical evidence of the presence of a significant degree of association, such quantities have been studied as correlated random variables suitable to be modeled by multivariate joint distribution functions. The advent of copulas in geosciences simplified the inference procedures allowing for splitting the analysis of the marginal distributions and the study of the so-called dependence structure or copula. However, the attention paid to the modeling task has overlooked a more thorough study of the true nature and origin of the relationships that link Xp,V,D, and I. In this study, we apply a set of ad hoc bootstrap algorithms to investigate these aspects by analyzing the hyetographs and hydrographs extracted from 282 daily rainfall series from central eastern Europe, three 5 min rainfall series from central Italy, 80 daily streamflow series from the continental United States, and two sets of 200 simulated universal multifractal time series. Our results show that all the pairwise dependence structures between Xp,V,D, and I exhibit some key properties that can be reproduced by simple bootstrap algorithms that rely on a standard univariate resampling without resort to multivariate techniques. Therefore, the strong similarities between the observed dependence structures and the agreement between the observed and bootstrap samples suggest the existence of a numerical generating mechanism based on the superposition of the effects of sampling data at finite time steps and the process of summing realizations of independent random variables over random durations. We also show that the pairwise dependence structures are weakly dependent on the internal patterns of the hyetographs and hydrographs, meaning that the temporal evolution of the rainfall and runoff events marginally influences the mutual relationships of Xp,V,D, and I. Finally, our findings point out that subtle and often overlooked deterministic relationships between the properties of the event hyetographs and hydrographs exist. Confusing these relationships with genuine stochastic relationships can lead to an incorrect application of multivariate distributions and copulas and to misleading results.
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance
ERIC Educational Resources Information Center
Gwet, Kilem L.
2016-01-01
This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…
NASA Astrophysics Data System (ADS)
Jannati, Mojtaba; Valadan Zoej, Mohammad Javad; Mokhtarzade, Mehdi
2018-03-01
This paper presents a novel approach to epipolar resampling of cross-track linear pushbroom imagery using orbital parameters model (OPM). The backbone of the proposed method relies on modification of attitude parameters of linear array stereo imagery in such a way to parallelize the approximate conjugate epipolar lines (ACELs) with the instantaneous base line (IBL) of the conjugate image points (CIPs). Afterward, a complementary rotation is applied in order to parallelize all the ACELs throughout the stereo imagery. The new estimated attitude parameters are evaluated based on the direction of the IBL and the ACELs. Due to the spatial and temporal variability of the IBL (respectively changes in column and row numbers of the CIPs) and nonparallel nature of the epipolar lines in the stereo linear images, some polynomials in the both column and row numbers of the CIPs are used to model new attitude parameters. As the instantaneous position of sensors remains fix, the digital elevation model (DEM) of the area of interest is not required in the resampling process. According to the experimental results obtained from two pairs of SPOT and RapidEye stereo imagery with a high elevation relief, the average absolute values of remained vertical parallaxes of CIPs in the normalized images were obtained 0.19 and 0.28 pixels respectively, which confirm the high accuracy and applicability of the proposed method.
Vorburger, Robert S; Habeck, Christian G; Narkhede, Atul; Guzman, Vanessa A; Manly, Jennifer J; Brickman, Adam M
2016-01-01
Diffusion tensor imaging suffers from an intrinsic low signal-to-noise ratio. Bootstrap algorithms have been introduced to provide a non-parametric method to estimate the uncertainty of the measured diffusion parameters. To quantify the variability of the principal diffusion direction, bootstrap-derived metrics such as the cone of uncertainty have been proposed. However, bootstrap-derived metrics are not independent of the underlying diffusion profile. A higher mean diffusivity causes a smaller signal-to-noise ratio and, thus, increases the measurement uncertainty. Moreover, the goodness of the tensor model, which relies strongly on the complexity of the underlying diffusion profile, influences bootstrap-derived metrics as well. The presented simulations clearly depict the cone of uncertainty as a function of the underlying diffusion profile. Since the relationship of the cone of uncertainty and common diffusion parameters, such as the mean diffusivity and the fractional anisotropy, is not linear, the cone of uncertainty has a different sensitivity. In vivo analysis of the fornix reveals the cone of uncertainty to be a predictor of memory function among older adults. No significant correlation occurs with the common diffusion parameters. The present work not only demonstrates the cone of uncertainty as a function of the actual diffusion profile, but also discloses the cone of uncertainty as a sensitive predictor of memory function. Future studies should incorporate bootstrap-derived metrics to provide more comprehensive analysis.
Molinos-Senante, María; Donoso, Guillermo; Sala-Garrido, Ramon; Villegas, Andrés
2018-03-01
Benchmarking the efficiency of water companies is essential to set water tariffs and to promote their sustainability. In doing so, most of the previous studies have applied conventional data envelopment analysis (DEA) models. However, it is a deterministic method that does not allow to identify environmental factors influencing efficiency scores. To overcome this limitation, this paper evaluates the efficiency of a sample of Chilean water and sewerage companies applying a double-bootstrap DEA model. Results evidenced that the ranking of water and sewerage companies changes notably whether efficiency scores are computed applying conventional or double-bootstrap DEA models. Moreover, it was found that the percentage of non-revenue water and customer density are factors influencing the efficiency of Chilean water and sewerage companies. This paper illustrates the importance of using a robust and reliable method to increase the relevance of benchmarking tools.
Pasta, D J; Taylor, J L; Henning, J M
1999-01-01
Decision-analytic models are frequently used to evaluate the relative costs and benefits of alternative therapeutic strategies for health care. Various types of sensitivity analysis are used to evaluate the uncertainty inherent in the models. Although probabilistic sensitivity analysis is more difficult theoretically and computationally, the results can be much more powerful and useful than deterministic sensitivity analysis. The authors show how a Monte Carlo simulation can be implemented using standard software to perform a probabilistic sensitivity analysis incorporating the bootstrap. The method is applied to a decision-analytic model evaluating the cost-effectiveness of Helicobacter pylori eradication. The necessary steps are straightforward and are described in detail. The use of the bootstrap avoids certain difficulties encountered with theoretical distributions. The probabilistic sensitivity analysis provided insights into the decision-analytic model beyond the traditional base-case and deterministic sensitivity analyses and should become the standard method for assessing sensitivity.
Image analysis of representative food structures: application of the bootstrap method.
Ramírez, Cristian; Germain, Juan C; Aguilera, José M
2009-08-01
Images (for example, photomicrographs) are routinely used as qualitative evidence of the microstructure of foods. In quantitative image analysis it is important to estimate the area (or volume) to be sampled, the field of view, and the resolution. The bootstrap method is proposed to estimate the size of the sampling area as a function of the coefficient of variation (CV(Bn)) and standard error (SE(Bn)) of the bootstrap taking sub-areas of different sizes. The bootstrap method was applied to simulated and real structures (apple tissue). For simulated structures, 10 computer-generated images were constructed containing 225 black circles (elements) and different coefficient of variation (CV(image)). For apple tissue, 8 images of apple tissue containing cellular cavities with different CV(image) were analyzed. Results confirmed that for simulated and real structures, increasing the size of the sampling area decreased the CV(Bn) and SE(Bn). Furthermore, there was a linear relationship between the CV(image) and CV(Bn) (.) For example, to obtain a CV(Bn) = 0.10 in an image with CV(image) = 0.60, a sampling area of 400 x 400 pixels (11% of whole image) was required, whereas if CV(image) = 1.46, a sampling area of 1000 x 100 pixels (69% of whole image) became necessary. This suggests that a large-size dispersion of element sizes in an image requires increasingly larger sampling areas or a larger number of images.
A bootstrapping method for development of Treebank
NASA Astrophysics Data System (ADS)
Zarei, F.; Basirat, A.; Faili, H.; Mirain, M.
2017-01-01
Using statistical approaches beside the traditional methods of natural language processing could significantly improve both the quality and performance of several natural language processing (NLP) tasks. The effective usage of these approaches is subject to the availability of the informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex and rich linguistically motivated elementary structure called supertag. To this end, a hybrid method for supertagging is proposed that combines both of the generative and discriminative methods of supertagging. The method was applied on a subset of Wall Street Journal (WSJ) in order to annotate its sentences with a set of linguistically motivated elementary structures of the English XTAG grammar that is using a lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way for annotating the English sentences with the mentioned structures. The experiments show that the method could automatically annotate about 20% of WSJ with the accuracy of F-measure about 80% of which is particularly 12% higher than the F-measure of the XTAG Treebank automatically generated from the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].
Fourier Descriptor Analysis and Unification of Voice Range Profile Contours: Method and Applications
ERIC Educational Resources Information Center
Pabon, Peter; Ternstrom, Sten; Lamarche, Anick
2011-01-01
Purpose: To describe a method for unified description, statistical modeling, and comparison of voice range profile (VRP) contours, even from diverse sources. Method: A morphologic modeling technique, which is based on Fourier descriptors (FDs), is applied to the VRP contour. The technique, which essentially involves resampling of the curve of the…
Bradu, Adrian; Kapinchev, Konstantin; Barnes, Frederick; Podoleanu, Adrian
2015-07-01
In a previous report, we demonstrated master-slave optical coherence tomography (MS-OCT), an OCT method that does not need resampling of data and can be used to deliver en face images from several depths simultaneously. In a separate report, we have also demonstrated MS-OCT's capability of producing cross-sectional images of a quality similar to those provided by the traditional Fourier domain (FD) OCT technique, but at a much slower rate. Here, we demonstrate that by taking advantage of the parallel processing capabilities offered by the MS-OCT method, cross-sectional OCT images of the human retina can be produced in real time. We analyze the conditions that ensure a true real-time B-scan imaging operation and demonstrate in vivo real-time images from human fovea and the optic nerve, with resolution and sensitivity comparable to those produced using the traditional FD-based method, however, without the need of data resampling.
A program for handling map projections of small-scale geospatial raster data
Finn, Michael P.; Steinwand, Daniel R.; Trent, Jason R.; Buehler, Robert A.; Mattli, David M.; Yamamoto, Kristina H.
2012-01-01
Scientists routinely accomplish small-scale geospatial modeling using raster datasets of global extent. Such use often requires the projection of global raster datasets onto a map or the reprojection from a given map projection associated with a dataset. The distortion characteristics of these projection transformations can have significant effects on modeling results. Distortions associated with the reprojection of global data are generally greater than distortions associated with reprojections of larger-scale, localized areas. The accuracy of areas in projected raster datasets of global extent is dependent on spatial resolution. To address these problems of projection and the associated resampling that accompanies it, methods for framing the transformation space, direct point-to-point transformations rather than gridded transformation spaces, a solution to the wrap-around problem, and an approach to alternative resampling methods are presented. The implementations of these methods are provided in an open-source software package called MapImage (or mapIMG, for short), which is designed to function on a variety of computer architectures.
An Acoustic OFDM System with Symbol-by-Symbol Doppler Compensation for Underwater Communication
MinhHai, Tran; Rie, Saotome; Suzuki, Taisaku; Wada, Tomohisa
2016-01-01
We propose an acoustic OFDM system for underwater communication, specifically for vertical link communications such as between a robot in the sea bottom and a mother ship in the surface. The main contributions are (1) estimation of time varying Doppler shift using continual pilots in conjunction with monitoring the drift of Power Delay Profile and (2) symbol-by-symbol Doppler compensation in frequency domain by an ICI matrix representing nonuniform Doppler. In addition, we compare our proposal against a resampling method. Simulation and experimental results confirm that our system outperforms the resampling method when the velocity changes roughly over OFDM symbols. Overall, experimental results taken in Shizuoka, Japan, show our system using 16QAM, and 64QAM achieved a data throughput of 7.5 Kbit/sec with a transmitter moving at maximum 2 m/s, in a complicated trajectory, over 30 m vertically. PMID:27057558
Measures of precision for dissimilarity-based multivariate analysis of ecological communities
Anderson, Marti J; Santana-Garcon, Julia
2015-01-01
Ecological studies require key decisions regarding the appropriate size and number of sampling units. No methods currently exist to measure precision for multivariate assemblage data when dissimilarity-based analyses are intended to follow. Here, we propose a pseudo multivariate dissimilarity-based standard error (MultSE) as a useful quantity for assessing sample-size adequacy in studies of ecological communities. Based on sums of squared dissimilarities, MultSE measures variability in the position of the centroid in the space of a chosen dissimilarity measure under repeated sampling for a given sample size. We describe a novel double resampling method to quantify uncertainty in MultSE values with increasing sample size. For more complex designs, values of MultSE can be calculated from the pseudo residual mean square of a permanova model, with the double resampling done within appropriate cells in the design. R code functions for implementing these techniques, along with ecological examples, are provided. PMID:25438826
Efficient high-quality volume rendering of SPH data.
Fraedrich, Roland; Auer, Stefan; Westermann, Rüdiger
2010-01-01
High quality volume rendering of SPH data requires a complex order-dependent resampling of particle quantities along the view rays. In this paper we present an efficient approach to perform this task using a novel view-space discretization of the simulation domain. Our method draws upon recent work on GPU-based particle voxelization for the efficient resampling of particles into uniform grids. We propose a new technique that leverages a perspective grid to adaptively discretize the view-volume, giving rise to a continuous level-of-detail sampling structure and reducing memory requirements compared to a uniform grid. In combination with a level-of-detail representation of the particle set, the perspective grid allows effectively reducing the amount of primitives to be processed at run-time. We demonstrate the quality and performance of our method for the rendering of fluid and gas dynamics SPH simulations consisting of many millions of particles.
Seol, Hyunsoo
2016-06-01
The purpose of this study was to apply the bootstrap procedure to evaluate how the bootstrapped confidence intervals (CIs) for polytomous Rasch fit statistics might differ according to sample sizes and test lengths in comparison with the rule-of-thumb critical value of misfit. A total of 25 simulated data sets were generated to fit the Rasch measurement and then a total of 1,000 replications were conducted to compute the bootstrapped CIs under each of 25 testing conditions. The results showed that rule-of-thumb critical values for assessing the magnitude of misfit were not applicable because the infit and outfit mean square error statistics showed different magnitudes of variability over testing conditions and the standardized fit statistics did not exactly follow the standard normal distribution. Further, they also do not share the same critical range for the item and person misfit. Based on the results of the study, the bootstrapped CIs can be used to identify misfitting items or persons as they offer a reasonable alternative solution, especially when the distributions of the infit and outfit statistics are not well known and depend on sample size. © The Author(s) 2016.
Soguero-Ruiz, Cristina; Hindberg, Kristian; Rojo-Alvarez, Jose Luis; Skrovseth, Stein Olav; Godtliebsen, Fred; Mortensen, Kim; Revhaug, Arthur; Lindsetmo, Rolv-Ole; Augestad, Knut Magne; Jenssen, Robert
2016-09-01
The free text in electronic health records (EHRs) conveys a huge amount of clinical information about health state and patient history. Despite a rapidly growing literature on the use of machine learning techniques for extracting this information, little effort has been invested toward feature selection and the features' corresponding medical interpretation. In this study, we focus on the task of early detection of anastomosis leakage (AL), a severe complication after elective surgery for colorectal cancer (CRC) surgery, using free text extracted from EHRs. We use a bag-of-words model to investigate the potential for feature selection strategies. The purpose is earlier detection of AL and prediction of AL with data generated in the EHR before the actual complication occur. Due to the high dimensionality of the data, we derive feature selection strategies using the robust support vector machine linear maximum margin classifier, by investigating: 1) a simple statistical criterion (leave-one-out-based test); 2) an intensive-computation statistical criterion (Bootstrap resampling); and 3) an advanced statistical criterion (kernel entropy). Results reveal a discriminatory power for early detection of complications after CRC (sensitivity 100%; specificity 72%). These results can be used to develop prediction models, based on EHR data, that can support surgeons and patients in the preoperative decision making phase.
A Satellite Mortality Study to Support Space Systems Lifetime Prediction
NASA Technical Reports Server (NTRS)
Fox, George; Salazar, Ronald; Habib-Agahi, Hamid; Dubos, Gregory
2013-01-01
Estimating the operational lifetime of satellites and spacecraft is a complex process. Operational lifetime can differ from mission design lifetime for a variety of reasons. Unexpected mortality can occur due to human errors in design and fabrication, to human errors in launch and operations, to random anomalies of hardware and software or even satellite function degradation or technology change, leading to unrealized economic or mission return. This study focuses on data collection of public information using, for the first time, a large, publically available dataset, and preliminary analysis of satellite lifetimes, both operational lifetime and design lifetime. The objective of this study is the illustration of the relationship of design life to actual lifetime for some representative classes of satellites and spacecraft. First, a Weibull and Exponential lifetime analysis comparison is performed on the ratio of mission operating lifetime to design life, accounting for terminated and ongoing missions. Next a Kaplan-Meier survivor function, standard practice for clinical trials analysis, is estimated from operating lifetime. Bootstrap resampling is used to provide uncertainty estimates of selected survival probabilities. This study highlights the need for more detailed databases and engineering reliability models of satellite lifetime that include satellite systems and subsystems, operations procedures and environmental characteristics to support the design of complex, multi-generation, long-lived space systems in Earth orbit.
A Bootstrap Algorithm for Mixture Models and Interval Data in Inter-Comparisons
2001-07-01
parametric bootstrap. The present algorithm will be applied to a thermometric inter-comparison, where data cannot be assumed to be normally distributed. 2 Data...experimental methods, used in each laboratory) often imply that the statistical assumptions are not satisfied, as for example in several thermometric ...triangular). Indeed, in thermometric experiments these three probabilistic models can represent several common stochastic variabilities for
NASA Astrophysics Data System (ADS)
Yang, P.; Ng, T. L.; Yang, W.
2015-12-01
Effective water resources management depends on the reliable estimation of the uncertainty of drought events. Confidence intervals (CIs) are commonly applied to quantify this uncertainty. A CI seeks to be at the minimal length necessary to cover the true value of the estimated variable with the desired probability. In drought analysis where two or more variables (e.g., duration and severity) are often used to describe a drought, copulas have been found suitable for representing the joint probability behavior of these variables. However, the comprehensive assessment of the parameter uncertainties of copulas of droughts has been largely ignored, and the few studies that have recognized this issue have not explicitly compared the various methods to produce the best CIs. Thus, the objective of this study to compare the CIs generated using two widely applied uncertainty estimation methods, bootstrapping and Markov Chain Monte Carlo (MCMC). To achieve this objective, (1) the marginal distributions lognormal, Gamma, and Generalized Extreme Value, and the copula functions Clayton, Frank, and Plackett are selected to construct joint probability functions of two drought related variables. (2) The resulting joint functions are then fitted to 200 sets of simulated realizations of drought events with known distribution and extreme parameters and (3) from there, using bootstrapping and MCMC, CIs of the parameters are generated and compared. The effect of an informative prior on the CIs generated by MCMC is also evaluated. CIs are produced for different sample sizes (50, 100, and 200) of the simulated drought events for fitting the joint probability functions. Preliminary results assuming lognormal marginal distributions and the Clayton copula function suggest that for cases with small or medium sample sizes (~50-100), MCMC to be superior method if an informative prior exists. Where an informative prior is unavailable, for small sample sizes (~50), both bootstrapping and MCMC yield the same level of performance, and for medium sample sizes (~100), bootstrapping is better. For cases with a large sample size (~200), there is little difference between the CIs generated using bootstrapping and MCMC regardless of whether or not an informative prior exists.
Poder, Thomas G; Kouakou, Christian R C; Bouchard, Pierre-Alexandre; Tremblay, Véronique; Blais, Sébastien; Maltais, François; Lellouche, François
2018-01-23
Conduct a cost-effectiveness analysis of FreeO 2 technology versus manual oxygen-titration technology for patients with chronic obstructive pulmonary disease (COPD) hospitalised for acute exacerbations. Tertiary acute care hospital in Quebec, Canada. 47 patients with COPD hospitalised for acute exacerbations. An automated oxygen-titration and oxygen-weaning technology. The costs for hospitalisation and follow-up for 180 days were calculated using a microcosting approach and included the cost of FreeO 2 technology. Incremental cost-effectiveness ratios (ICERs) were calculated using bootstrap resampling with 5000 replications. The main effect variable was the percentage of time spent at the target oxygen saturation (SpO 2 ). The other two effect variables were the time spent in hyperoxia (target SpO 2 +5%) and in severe hypoxaemia (SpO 2 <85%). The resamplings were based on data from a randomised controlled trial with 47 patients with COPD hospitalised for acute exacerbations. FreeO 2 generated savings of 20.7% of the per-patient costs at 180 days (ie, -$C2959.71). This decrease is nevertheless not significant at the 95% threshold (P=0.13), but the effect variables all improved (P<0.001). The improvement in the time spent at the target SpO 2 was 56.3%. The ICERs indicate that FreeO 2 technology is more cost-effective than manual oxygen titration with a savings of -$C96.91 per percentage point of time spent at the target SpO 2 (95% CI -301.26 to 116.96). FreeO 2 technology could significantly enhance the efficiency of the health system by reducing per-patient costs at 180 days. A study with a larger patient sample needs to be carried out to confirm these preliminary results. NCT01393015; Post-results. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Direct measurement of fast transients by using boot-strapped waveform averaging
NASA Astrophysics Data System (ADS)
Olsson, Mattias; Edman, Fredrik; Karki, Khadga Jung
2018-03-01
An approximation to coherent sampling, also known as boot-strapped waveform averaging, is presented. The method uses digital cavities to determine the condition for coherent sampling. It can be used to increase the effective sampling rate of a repetitive signal and the signal to noise ratio simultaneously. The method is demonstrated by using it to directly measure the fluorescence lifetime from Rhodamine 6G by digitizing the signal from a fast avalanche photodiode. The obtained lifetime of 4.0 ns is in agreement with the known values.
Motion vector field phase-to-amplitude resampling for 4D motion-compensated cone-beam CT
NASA Astrophysics Data System (ADS)
Sauppe, Sebastian; Kuhm, Julian; Brehm, Marcus; Paysan, Pascal; Seghers, Dieter; Kachelrieß, Marc
2018-02-01
We propose a phase-to-amplitude resampling (PTAR) method to reduce motion blurring in motion-compensated (MoCo) 4D cone-beam CT (CBCT) image reconstruction, without increasing the computational complexity of the motion vector field (MVF) estimation approach. PTAR is able to improve the image quality in reconstructed 4D volumes, including both regular and irregular respiration patterns. The PTAR approach starts with a robust phase-gating procedure for the initial MVF estimation and then switches to a phase-adapted amplitude gating method. The switch implies an MVF-resampling, which makes them amplitude-specific. PTAR ensures that the MVFs, which have been estimated on phase-gated reconstructions, are still valid for all amplitude-gated reconstructions. To validate the method, we use an artificially deformed clinical CT scan with a realistic breathing pattern and several patient data sets acquired with a TrueBeamTM integrated imaging system (Varian Medical Systems, Palo Alto, CA, USA). Motion blurring, which still occurs around the area of the diaphragm or at small vessels above the diaphragm in artifact-specific cyclic motion compensation (acMoCo) images based on phase-gating, is significantly reduced by PTAR. Also, small lung structures appear sharper in the images. This is demonstrated both for simulated and real patient data. A quantification of the sharpness of the diaphragm confirms these findings. PTAR improves the image quality of 4D MoCo reconstructions compared to conventional phase-gated MoCo images, in particular for irregular breathing patterns. Thus, PTAR increases the robustness of MoCo reconstructions for CBCT. Because PTAR does not require any additional steps for the MVF estimation, it is computationally efficient. Our method is not restricted to CBCT but could rather be applied to other image modalities.
Confidence intervals for correlations when data are not normal.
Bishara, Anthony J; Hittner, James B
2017-02-01
With nonnormal data, the typical confidence interval of the correlation (Fisher z') may be inaccurate. The literature has been unclear as to which of several alternative methods should be used instead, and how extreme a violation of normality is needed to justify an alternative. Through Monte Carlo simulation, 11 confidence interval methods were compared, including Fisher z', two Spearman rank-order methods, the Box-Cox transformation, rank-based inverse normal (RIN) transformation, and various bootstrap methods. Nonnormality often distorted the Fisher z' confidence interval-for example, leading to a 95 % confidence interval that had actual coverage as low as 68 %. Increasing the sample size sometimes worsened this problem. Inaccurate Fisher z' intervals could be predicted by a sample kurtosis of at least 2, an absolute sample skewness of at least 1, or significant violations of normality hypothesis tests. Only the Spearman rank-order and RIN transformation methods were universally robust to nonnormality. Among the bootstrap methods, an observed imposed bootstrap came closest to accurate coverage, though it often resulted in an overly long interval. The results suggest that sample nonnormality can justify avoidance of the Fisher z' interval in favor of a more robust alternative. R code for the relevant methods is provided in supplementary materials.
NASA Astrophysics Data System (ADS)
Ruggeri, Paolo; Irving, James; Holliger, Klaus
2015-08-01
We critically examine the performance of sequential geostatistical resampling (SGR) as a model proposal mechanism for Bayesian Markov-chain-Monte-Carlo (MCMC) solutions to near-surface geophysical inverse problems. Focusing on a series of simple yet realistic synthetic crosshole georadar tomographic examples characterized by different numbers of data, levels of data error and degrees of model parameter spatial correlation, we investigate the efficiency of three different resampling strategies with regard to their ability to generate statistically independent realizations from the Bayesian posterior distribution. Quite importantly, our results show that, no matter what resampling strategy is employed, many of the examined test cases require an unreasonably high number of forward model runs to produce independent posterior samples, meaning that the SGR approach as currently implemented will not be computationally feasible for a wide range of problems. Although use of a novel gradual-deformation-based proposal method can help to alleviate these issues, it does not offer a full solution. Further, we find that the nature of the SGR is found to strongly influence MCMC performance; however no clear rule exists as to what set of inversion parameters and/or overall proposal acceptance rate will allow for the most efficient implementation. We conclude that although the SGR methodology is highly attractive as it allows for the consideration of complex geostatistical priors as well as conditioning to hard and soft data, further developments are necessary in the context of novel or hybrid MCMC approaches for it to be considered generally suitable for near-surface geophysical inversions.
Monte Carlo based statistical power analysis for mediation models: methods and software.
Zhang, Zhiyong
2014-12-01
The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
NASA Astrophysics Data System (ADS)
Huang, Huan; Baddour, Natalie; Liang, Ming
2018-02-01
Under normal operating conditions, bearings often run under time-varying rotational speed conditions. Under such circumstances, the bearing vibrational signal is non-stationary, which renders ineffective the techniques used for bearing fault diagnosis under constant running conditions. One of the conventional methods of bearing fault diagnosis under time-varying speed conditions is resampling the non-stationary signal to a stationary signal via order tracking with the measured variable speed. With the resampled signal, the methods available for constant condition cases are thus applicable. However, the accuracy of the order tracking is often inadequate and the time-varying speed is sometimes not measurable. Thus, resampling-free methods are of interest for bearing fault diagnosis under time-varying rotational speed for use without tachometers. With the development of time-frequency analysis, the time-varying fault character manifests as curves in the time-frequency domain. By extracting the Instantaneous Fault Characteristic Frequency (IFCF) from the Time-Frequency Representation (TFR) and converting the IFCF, its harmonics, and the Instantaneous Shaft Rotational Frequency (ISRF) into straight lines, the bearing fault can be detected and diagnosed without resampling. However, so far, the extraction of the IFCF for bearing fault diagnosis is mostly based on the assumption that at each moment the IFCF has the highest amplitude in the TFR, which is not always true. Hence, a more reliable T-F curve extraction approach should be investigated. Moreover, if the T-F curves including the IFCF, its harmonic, and the ISRF can be all extracted from the TFR directly, no extra processing is needed for fault diagnosis. Therefore, this paper proposes an algorithm for multiple T-F curve extraction from the TFR based on a fast path optimization which is more reliable for T-F curve extraction. Then, a new procedure for bearing fault diagnosis under unknown time-varying speed conditions is developed based on the proposed algorithm and a new fault diagnosis strategy. The average curve-to-curve ratios are utilized to describe the relationship of the extracted curves and fault diagnosis can then be achieved by comparing the ratios to the fault characteristic coefficients. The effectiveness of the proposed method is validated by simulated and experimental signals.
McRoy, Susan; Jones, Sean; Kurmally, Adam
2016-09-01
This article examines methods for automated question classification applied to cancer-related questions that people have asked on the web. This work is part of a broader effort to provide automated question answering for health education. We created a new corpus of consumer-health questions related to cancer and a new taxonomy for those questions. We then compared the effectiveness of different statistical methods for developing classifiers, including weighted classification and resampling. Basic methods for building classifiers were limited by the high variability in the natural distribution of questions and typical refinement approaches of feature selection and merging categories achieved only small improvements to classifier accuracy. Best performance was achieved using weighted classification and resampling methods, the latter yielding an accuracy of F1 = 0.963. Thus, it would appear that statistical classifiers can be trained on natural data, but only if natural distributions of classes are smoothed. Such classifiers would be useful for automated question answering, for enriching web-based content, or assisting clinical professionals to answer questions. © The Author(s) 2015.
Empirical single sample quantification of bias and variance in Q-ball imaging.
Hainline, Allison E; Nath, Vishwesh; Parvathaneni, Prasanna; Blaber, Justin A; Schilling, Kurt G; Anderson, Adam W; Kang, Hakmook; Landman, Bennett A
2018-02-06
The bias and variance of high angular resolution diffusion imaging methods have not been thoroughly explored in the literature and may benefit from the simulation extrapolation (SIMEX) and bootstrap techniques to estimate bias and variance of high angular resolution diffusion imaging metrics. The SIMEX approach is well established in the statistics literature and uses simulation of increasingly noisy data to extrapolate back to a hypothetical case with no noise. The bias of calculated metrics can then be computed by subtracting the SIMEX estimate from the original pointwise measurement. The SIMEX technique has been studied in the context of diffusion imaging to accurately capture the bias in fractional anisotropy measurements in DTI. Herein, we extend the application of SIMEX and bootstrap approaches to characterize bias and variance in metrics obtained from a Q-ball imaging reconstruction of high angular resolution diffusion imaging data. The results demonstrate that SIMEX and bootstrap approaches provide consistent estimates of the bias and variance of generalized fractional anisotropy, respectively. The RMSE for the generalized fractional anisotropy estimates shows a 7% decrease in white matter and an 8% decrease in gray matter when compared with the observed generalized fractional anisotropy estimates. On average, the bootstrap technique results in SD estimates that are approximately 97% of the true variation in white matter, and 86% in gray matter. Both SIMEX and bootstrap methods are flexible, estimate population characteristics based on single scans, and may be extended for bias and variance estimation on a variety of high angular resolution diffusion imaging metrics. © 2018 International Society for Magnetic Resonance in Medicine.
Reassessing the NTCTCS Staging Systems for Differentiated Thyroid Cancer, Including Age at Diagnosis
McLeod, Donald S.A.; Jonklaas, Jacqueline; Brierley, James D.; Ain, Kenneth B.; Cooper, David S.; Fein, Henry G.; Haugen, Bryan R.; Ladenson, Paul W.; Magner, James; Ross, Douglas S.; Skarulis, Monica C.; Steward, David L.; Xing, Mingzhao; Litofsky, Danielle R.; Maxon, Harry R.
2015-01-01
Background: Thyroid cancer is unique for having age as a staging variable. Recently, the commonly used age cut-point of 45 years has been questioned. Objective: This study assessed alternate staging systems on the outcome of overall survival, and compared these with current National Thyroid Cancer Treatment Cooperative Study (NTCTCS) staging systems for papillary and follicular thyroid cancer. Methods: A total of 4721 patients with differentiated thyroid cancer were assessed. Five potential alternate staging systems were generated at age cut-points in five-year increments from 35 to 70 years, and tested for model discrimination (Harrell's C-statistic) and calibration (R2). The best five models for papillary and follicular cancer were further tested with bootstrap resampling and significance testing for discrimination. Results: The best five alternate papillary cancer systems had age cut-points of 45–50 years, with the highest scoring model using 50 years. No significant difference in C-statistic was found between the best alternate and current NTCTCS systems (p = 0.200). The best five alternate follicular cancer systems had age cut-points of 50–55 years, with the highest scoring model using 50 years. All five best alternate staging systems performed better compared with the current system (p = 0.003–0.035). There was no significant difference in discrimination between the best alternate system (cut-point age 50 years) and the best system of cut-point age 45 years (p = 0.197). Conclusions: No alternate papillary cancer systems assessed were significantly better than the current system. New alternate staging systems for follicular cancer appear to be better than the current NTCTCS system, although they require external validation. PMID:26203804
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, Donald L.; Hilohi, C. Michael; Spelic, David C.
2012-10-15
Purpose: To determine patient radiation doses from interventional cardiology procedures in the U.S and to suggest possible initial values for U.S. benchmarks for patient radiation dose from selected interventional cardiology procedures [fluoroscopically guided diagnostic cardiac catheterization and percutaneous coronary intervention (PCI)]. Methods: Patient radiation dose metrics were derived from analysis of data from the 2008 to 2009 Nationwide Evaluation of X-ray Trends (NEXT) survey of cardiac catheterization. This analysis used deidentified data and did not require review by an IRB. Data from 171 facilities in 30 states were analyzed. The distributions (percentiles) of radiation dose metrics were determined for diagnosticmore » cardiac catheterizations, PCI, and combined diagnostic and PCI procedures. Confidence intervals for these dose distributions were determined using bootstrap resampling. Results: Percentile distributions (advisory data sets) and possible preliminary U.S. reference levels (based on the 75th percentile of the dose distributions) are provided for cumulative air kerma at the reference point (K{sub a,r}), cumulative air kerma-area product (P{sub KA}), fluoroscopy time, and number of cine runs. Dose distributions are sufficiently detailed to permit dose audits as described in National Council on Radiation Protection and Measurements Report No. 168. Fluoroscopy times are consistent with those observed in European studies, but P{sub KA} is higher in the U.S. Conclusions: Sufficient data exist to suggest possible initial benchmarks for patient radiation dose for certain interventional cardiology procedures in the U.S. Our data suggest that patient radiation dose in these procedures is not optimized in U.S. practice.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cauthen, Katherine Regina; Lambert, Gregory Joseph; Finley, Patrick D.
There is mounting evidence that alcohol use is significantly linked to lower HCV treatment response rates in interferon-based therapies, though some of the evidence is conflicting. Furthermore, although health care providers recommend reducing or abstaining from alcohol use prior to treatment, many patients do not succeed in doing so. The goal of this meta-analysis was to systematically review and summarize the Englishlanguage literature up through January 30, 2015 regarding the relationship between alcohol use and HCV treatment outcomes, among patients who were not required to abstain from alcohol use in order to receive treatment. Seven pertinent articles studying 1,751 HCV-infectedmore » patients were identified. Log-ORs of HCV treatment response for heavy alcohol use and light alcohol use were calculated and compared. We employed a hierarchical Bayesian meta-analytic model to accommodate the small sample size. The summary estimate for the log-OR of HCV treatment response was -0.775 with a 95% credible interval of (-1.397, -0.236). The results of the Bayesian meta-analysis are slightly more conservative compared to those obtained from a boot-strapped, random effects model. We found evidence of heterogeneity (Q = 14.489, p = 0.025), accounting for 60.28% of the variation among log-ORs. Meta-regression to capture the sources of this heterogeneity did not identify any of the covariates investigated as significant. This meta-analysis confirms that heavy alcohol use is associated with decreased HCV treatment response compared to lighter levels of alcohol use. Further research is required to characterize the mechanism by which alcohol use affects HCV treatment response.« less
Gupta, Nidhi; Christiansen, Caroline Stordal; Hanisch, Christiana; Bay, Hans; Burr, Hermann; Holtermann, Andreas
2017-01-01
Objectives To investigate the differences between a questionnaire-based and accelerometer-based sitting time, and develop a model for improving the accuracy of questionnaire-based sitting time for predicting accelerometer-based sitting time. Methods 183 workers in a cross-sectional study reported sitting time per day using a single question during the measurement period, and wore 2 Actigraph GT3X+ accelerometers on the thigh and trunk for 1–4 working days to determine their actual sitting time per day using the validated Acti4 software. Least squares regression models were fitted with questionnaire-based siting time and other self-reported predictors to predict accelerometer-based sitting time. Results Questionnaire-based and accelerometer-based average sitting times were ≈272 and ≈476 min/day, respectively. A low Pearson correlation (r=0.32), high mean bias (204.1 min) and wide limits of agreement (549.8 to −139.7 min) between questionnaire-based and accelerometer-based sitting time were found. The prediction model based on questionnaire-based sitting explained 10% of the variance in accelerometer-based sitting time. Inclusion of 9 self-reported predictors in the model increased the explained variance to 41%, with 10% optimism using a resampling bootstrap validation. Based on a split validation analysis, the developed prediction model on ≈75% of the workers (n=132) reduced the mean and the SD of the difference between questionnaire-based and accelerometer-based sitting time by 64% and 42%, respectively, in the remaining 25% of the workers. Conclusions This study indicates that questionnaire-based sitting time has low validity and that a prediction model can be one solution to materially improve the precision of questionnaire-based sitting time. PMID:28093433
2014-01-01
Background For over a decade, the presence of trombiculid mites in some mountain areas of La Rioja (Northern Spain) and their association with seasonal human dermatitis have been recognized. This work aimed to establish the species identity of the agent causing trombiculiasis in the study area. Methods Trombiculid larvae (chigger mites) were collected from vegetation in the Sierra Cebollera Natural Park and in Sierra La Hez during an outbreak of human trombiculiasis in 2010. Three specimens collected from a bird were also examined. Identification was made using morphological and morphometric traits based on the most recent taxonomic sources. A comparison of those mites with specimens of the same species collected throughout Europe was performed by means of cluster analysis with multiscale bootstrap resampling and calculation of approximately unbiased p-values. Results All collected mites were identified as Neotrombicula inopinata (Oudemans, 1909). Therefore, this species is the most likely causative agent of trombiculiasis in Spain, not Neotrombicula autumnalis (Shaw, 1790), as it was generally assumed. No chigger was identified as N. autumnalis in the study area. Neotrombicula inopinata clearly differs from N. autumnalis in the presence of eight or more setae in the 1st and 2nd rows of dorsal idiosomal setae vs. six setae in N. autumnalis. Comparison of N. inopinata samples from different locations shows significant geographic variability in morphometric traits. Samples from Western and Eastern Europe and the Caucasus formed three separate clusters. Conclusion Since the taxonomical basis of many studies concerning N. autumnalis as a causative agent of trombiculiasis is insufficient, it is highly possible that N. inopinata may be hiding behind the common name of “harvest bug” in Europe, together with N. autumnalis. PMID:24589214
Varshney, Anubodh; Watson, Ryan A; Noll, Andrew; Im, KyungAh; Rossi, Jeffrey; Shah, Pinak; Giugliano, Robert P
2018-05-19
Optimal antithrombotic therapy after transcatheter aortic valve replacement (TAVR) remains unclear. We evaluated the association between antithrombotic regimens and outcomes in TAVR patients. We retrospectively analyzed consecutive patients who underwent TAVR at a single academic center from April 2009 to March 2014. Antithrombotic regimens were classified as single or dual antiplatelet therapy (AP), single antiplatelet plus anticoagulant (SAC), or triple therapy (TT). The primary endpoint was a composite of death, myocardial infarction (MI), stroke, and major bleeding. Adjusted hazard ratios (HRs) were obtained with best subset variable selection methods using bootstrap resampling. Of 246 patients who underwent TAVR, 241 were eligible for analysis with 133, 88, and 20 patients in the AP, SAC, and TT groups, respectively. During a median 2.1-year follow-up, 53.5% had at least one endpoint-the most common was death (68%), followed by major bleeding (23%), stroke (6%), and MI (3%). At 2 years, the composite outcome occurred in 70% of TT, 42% of SAC, and 31% of AP patients. Compared to AP, adjusted HRs for the composite outcome were 2.88 [95% Confidence intervals (CI) (1.61-5.16); p = 0.0004] and 1.66 (95% CI [1.13-2.42]; p = 0.009) in the TT and SAC groups, respectively. Mortality rates at 2 years were 61% in the TT, 32% in the SAC, and 26% in the AP groups (p = 0.005). The risk of the composite outcome of death, MI, stroke, or major bleeding at 2-year follow-up was significantly higher in TAVR patients treated with TT or SAC versus AP, even after multivariate adjustment.
Geriatric Fever Score: A New Decision Rule for Geriatric Care
Vong, Si-Chon; Yang, Tzu-Meng; Chen, Kuo-Tai; Lin, Hung-Jung; Chen, Jiann-Hwa; Su, Shih-Bin; Guo, How-Ran; Hsu, Chien-Chin
2014-01-01
Background Evaluating geriatric patients with fever is time-consuming and challenging. We investigated independent mortality predictors of geriatric patients with fever and developed a prediction rule for emergency care, critical care, and geriatric care physicians to classify patients into mortality risk and disposition groups. Materials and Methods Consecutive geriatric patients (≥65 years old) visiting the emergency department (ED) of a university-affiliated medical center between June 1 and July 21, 2010, were enrolled when they met the criteria of fever: a tympanic temperature ≥37.2°C or a baseline temperature elevated ≥1.3°C. Thirty-day mortality was the primary endpoint. Internal validation with bootstrap re-sampling was done. Results Three hundred thirty geriatric patients were enrolled. We found three independent mortality predictors: Leukocytosis (WBC >12,000 cells/mm3), Severe coma (GCS ≤ 8), and Thrombocytopenia (platelets <150 103/mm3) (LST). After assigning weights to each predictor, we developed a Geriatric Fever Score that stratifies patients into two mortality-risk and disposition groups: low (4.0%) (95% CI: 2.3–6.9%): a general ward or treatment in the ED then discharge and high (30.3%) (95% CI: 17.4–47.3%): consider the intensive care unit. The area under the curve for the rule was 0.73. Conclusions We found that the Geriatric Fever Score is a simple and rapid rule for predicting 30-day mortality and classifying mortality risk and disposition in geriatric patients with fever, although external validation should be performed to confirm its usefulness in other clinical settings. It might help preserve medical resources for patients in greater need. PMID:25340811
NASA Astrophysics Data System (ADS)
Adjorlolo, Clement; Mutanga, Onisimo; Cho, Moses A.; Ismail, Riyad
2013-04-01
In this paper, a user-defined inter-band correlation filter function was used to resample hyperspectral data and thereby mitigate the problem of multicollinearity in classification analysis. The proposed resampling technique convolves the spectral dependence information between a chosen band-centre and its shorter and longer wavelength neighbours. Weighting threshold of inter-band correlation (WTC, Pearson's r) was calculated, whereby r = 1 at the band-centre. Various WTC (r = 0.99, r = 0.95 and r = 0.90) were assessed, and bands with coefficients beyond a chosen threshold were assigned r = 0. The resultant data were used in the random forest analysis to classify in situ C3 and C4 grass canopy reflectance. The respective WTC datasets yielded improved classification accuracies (kappa = 0.82, 0.79 and 0.76) with less correlated wavebands when compared to resampled Hyperion bands (kappa = 0.76). Overall, the results obtained from this study suggested that resampling of hyperspectral data should account for the spectral dependence information to improve overall classification accuracy as well as reducing the problem of multicollinearity.
Lin, Jyh-Jiuan; Chang, Ching-Hui; Pal, Nabendu
2015-01-01
To test the mutual independence of two qualitative variables (or attributes), it is a common practice to follow the Chi-square tests (Pearson's as well as likelihood ratio test) based on data in the form of a contingency table. However, it should be noted that these popular Chi-square tests are asymptotic in nature and are useful when the cell frequencies are "not too small." In this article, we explore the accuracy of the Chi-square tests through an extensive simulation study and then propose their bootstrap versions that appear to work better than the asymptotic Chi-square tests. The bootstrap tests are useful even for small-cell frequencies as they maintain the nominal level quite accurately. Also, the proposed bootstrap tests are more convenient than the Fisher's exact test which is often criticized for being too conservative. Finally, all test methods are applied to a few real-life datasets for demonstration purposes.
Révész, Kinga M; Landwehr, Jurate M
2002-01-01
A new method was developed to analyze the stable carbon and oxygen isotope ratios of small samples (400 +/- 20 micro g) of calcium carbonate. This new method streamlines the classical phosphoric acid/calcium carbonate (H(3)PO(4)/CaCO(3)) reaction method by making use of a recently available Thermoquest-Finnigan GasBench II preparation device and a Delta Plus XL continuous flow isotope ratio mass spectrometer. Conditions for which the H(3)PO(4)/CaCO(3) reaction produced reproducible and accurate results with minimal error had to be determined. When the acid/carbonate reaction temperature was kept at 26 degrees C and the reaction time was between 24 and 54 h, the precision of the carbon and oxygen isotope ratios for pooled samples from three reference standard materials was =0.1 and =0.2 per mill or per thousand, respectively, although later analysis showed that materials from one specific standard required reaction time between 34 and 54 h for delta(18)O to achieve this level of precision. Aliquot screening methods were shown to further minimize the total error. The accuracy and precision of the new method were analyzed and confirmed by statistical analysis. The utility of the method was verified by analyzing calcite from Devils Hole, Nevada, for which isotope-ratio values had previously been obtained by the classical method. Devils Hole core DH-11 recently had been re-cut and re-sampled, and isotope-ratio values were obtained using the new method. The results were comparable with those obtained by the classical method with correlation = +0.96 for both isotope ratios. The consistency of the isotopic results is such that an alignment offset could be identified in the re-sampled core material, and two cutting errors that occurred during re-sampling then were confirmed independently. This result indicates that the new method is a viable alternative to the classical reaction method. In particular, the new method requires less sample material permitting finer resolution and allows automation of some processes resulting in considerable time savings.
Carroll, E L; Fewster, R M; Childerhouse, S J; Patenaude, N J; Boren, L; Baker, C S
2016-01-01
Juvenile survival and recruitment can be more sensitive to environmental, ecological and anthropogenic factors than adult survival, influencing population-level processes like recruitment and growth rate in long-lived, iteroparous species such as southern right whales. Conventionally, Southern right whales are individually identified using callosity patterns, which do not stabilise until 6-12 months, by which time the whale has left its natal wintering grounds. Here we use DNA profiling of skin biopsy samples to identify individual Southern right whales from year of birth and document their return to the species' primary wintering ground in New Zealand waters, the Subantarctic Auckland Islands. We find evidence of natal fidelity to the New Zealand wintering ground by the recapture of 15 of 57 whales, first sampled in year of birth and available for subsequent recapture, during winter surveys to the Auckland Islands in 1995-1998 and 2006-2009. Four individuals were recaptured at the ages of 9 to 11, including two females first sampled as calves in 1998 and subsequently resampled as cows with calves in 2007. Using these capture-recapture records of known-age individuals, we estimate changes in survival with age using Cormack-Jolly-Seber models. Survival is modelled using discrete age classes and as a continuous function of age. Using a bootstrap method to account for uncertainty in model selection and fitting, we provide the first direct estimate of juvenile survival for this population. Our analyses indicate a high annual apparent survival for juveniles at between 0.87 (standard error (SE) 0.17, to age 1) and 0.95 (SE 0.05: ages 2-8). Individual identification by DNA profiling is an effective method for long-term demographic and genetic monitoring, particularly in animals that change identifiable features as they develop or experience tag loss over time.
Cella, Laura; Liuzzi, Raffaele; Conson, Manuel; D'Avino, Vittoria; Salvatore, Marco; Pacelli, Roberto
2012-12-27
Hypothyroidism is a frequent late side effect of radiation therapy of the cervical region. Purpose of this work is to develop multivariate normal tissue complication probability (NTCP) models for radiation-induced hypothyroidism (RHT) and to compare them with already existing NTCP models for RHT. Fifty-three patients treated with sequential chemo-radiotherapy for Hodgkin's lymphoma (HL) were retrospectively reviewed for RHT events. Clinical information along with thyroid gland dose distribution parameters were collected and their correlation to RHT was analyzed by Spearman's rank correlation coefficient (Rs). Multivariate logistic regression method using resampling methods (bootstrapping) was applied to select model order and parameters for NTCP modeling. Model performance was evaluated through the area under the receiver operating characteristic curve (AUC). Models were tested against external published data on RHT and compared with other published NTCP models. If we express the thyroid volume exceeding X Gy as a percentage (Vx(%)), a two-variable NTCP model including V30(%) and gender resulted to be the optimal predictive model for RHT (Rs = 0.615, p < 0.001. AUC = 0.87). Conversely, if absolute thyroid volume exceeding X Gy (Vx(cc)) was analyzed, an NTCP model based on 3 variables including V30(cc), thyroid gland volume and gender was selected as the most predictive model (Rs = 0.630, p < 0.001. AUC = 0.85). The three-variable model performs better when tested on an external cohort characterized by large inter-individuals variation in thyroid volumes (AUC = 0.914, 95% CI 0.760-0.984). A comparable performance was found between our model and that proposed in the literature based on thyroid gland mean dose and volume (p = 0.264). The absolute volume of thyroid gland exceeding 30 Gy in combination with thyroid gland volume and gender provide an NTCP model for RHT with improved prediction capability not only within our patient population but also in an external cohort.
Volume effects of late term normal tissue toxicity in prostate cancer radiotherapy
NASA Astrophysics Data System (ADS)
Bonta, Dacian Viorel
Modeling of volume effects for treatment toxicity is paramount for optimization of radiation therapy. This thesis proposes a new model for calculating volume effects in gastro-intestinal and genito-urinary normal tissue complication probability (NTCP) following radiation therapy for prostate carcinoma. The radiobiological and the pathological basis for this model and its relationship to other models are detailed. A review of the radiobiological experiments and published clinical data identified salient features and specific properties a biologically adequate model has to conform to. The new model was fit to a set of actual clinical data. In order to verify the goodness of fit, two established NTCP models and a non-NTCP measure for complication risk were fitted to the same clinical data. The method of fit for the model parameters was maximum likelihood estimation. Within the framework of the maximum likelihood approach I estimated the parameter uncertainties for each complication prediction model. The quality-of-fit was determined using the Aikaike Information Criterion. Based on the model that provided the best fit, I identified the volume effects for both types of toxicities. Computer-based bootstrap resampling of the original dataset was used to estimate the bias and variance for the fitted parameter values. Computer simulation was also used to estimate the population size that generates a specific uncertainty level (3%) in the value of predicted complication probability. The same method was used to estimate the size of the patient population needed for accurate choice of the model underlying the NTCP. The results indicate that, depending on the number of parameters of a specific NTCP model, 100 (for two parameter models) and 500 patients (for three parameter models) are needed for accurate parameter fit. Correlation of complication occurrence in patients was also investigated. The results suggest that complication outcomes are correlated in a patient, although the correlation coefficient is rather small.
DuBay, Derek A.; Redden, David T.; Bryant, Mary K.; Dorn, David P; Fouad, Mona N.; Gray, Stephen H.; White, Jared A.; Locke, Jayme E.; Meeks, Christopher B.; Taylor, Garry C.; Kilgore, Meredith L.; Eckhoff, Devin E.
2014-01-01
Background The strategy of evaluating every donation opportunity warrants an investigation into the financial feasibility of this practice. The purpose of this investigation is to measure resource utilization required for procurement of transplantable organs in an organ procurement organization (OPO). Methods Donors were stratified into those that met OPTN-defined eligible death criteria (ED Donors, n=589) and those that did not (NED Donors, n=703). Variable direct costs and time utilization by OPO staff for organ procurement were measured and amortized per organ transplanted using permutation methods and statistical bootstrapping/resampling approaches. Results More organs per donor were procured (3.66 ± 1.2 vs. 2.34 ± 0.8, p<0.0001) and transplanted (3.51 ± 1.2 vs. 2.08 ± 0.8, p<0.0001) in ED donors compared to NED donors. The variable direct costs were significantly lower in NED donors ($29,879.4 ± 11590.1 vs. $19,019.6 ± 7599.60, p<0.0001). In contrast, the amortized variable direct costs per organ transplanted were significantly higher in the NED donors ($8,414.5 ± 138.29 vs. $9,272.04 ± 344.56, p<0.0001). ED donors where thoracic organ procurement occurred were 67% more expensive than in abdominal-only organ procurement. The total time allocated per donor was significantly shorter in NED donors (91.2 ± 44.9 hours vs. 86.8 ± 78.6, p=0.01). In contrast, the amortized time per organ transplanted was significantly longer in the NED donors (23.1 ± 0.8 hours vs. 36.9 ± 3.2, p<0.001). Discussion The variable direct costs and time allocated per organ transplanted is significantly higher in donors that do not meet the eligible death criteria. PMID:24503760
Carroll, E. L.; Fewster, R. M.; Childerhouse, S. J.; Patenaude, N. J.; Boren, L.; Baker, C. S.
2016-01-01
Juvenile survival and recruitment can be more sensitive to environmental, ecological and anthropogenic factors than adult survival, influencing population-level processes like recruitment and growth rate in long-lived, iteroparous species such as southern right whales. Conventionally, Southern right whales are individually identified using callosity patterns, which do not stabilise until 6–12 months, by which time the whale has left its natal wintering grounds. Here we use DNA profiling of skin biopsy samples to identify individual Southern right whales from year of birth and document their return to the species’ primary wintering ground in New Zealand waters, the Subantarctic Auckland Islands. We find evidence of natal fidelity to the New Zealand wintering ground by the recapture of 15 of 57 whales, first sampled in year of birth and available for subsequent recapture, during winter surveys to the Auckland Islands in 1995–1998 and 2006–2009. Four individuals were recaptured at the ages of 9 to 11, including two females first sampled as calves in 1998 and subsequently resampled as cows with calves in 2007. Using these capture-recapture records of known-age individuals, we estimate changes in survival with age using Cormack-Jolly-Seber models. Survival is modelled using discrete age classes and as a continuous function of age. Using a bootstrap method to account for uncertainty in model selection and fitting, we provide the first direct estimate of juvenile survival for this population. Our analyses indicate a high annual apparent survival for juveniles at between 0.87 (standard error (SE) 0.17, to age 1) and 0.95 (SE 0.05: ages 2–8). Individual identification by DNA profiling is an effective method for long-term demographic and genetic monitoring, particularly in animals that change identifiable features as they develop or experience tag loss over time. PMID:26751689
Simulating realistic predator signatures in quantitative fatty acid signature analysis
Bromaghin, Jeffrey F.
2015-01-01
Diet estimation is an important field within quantitative ecology, providing critical insights into many aspects of ecology and community dynamics. Quantitative fatty acid signature analysis (QFASA) is a prominent method of diet estimation, particularly for marine mammal and bird species. Investigators using QFASA commonly use computer simulation to evaluate statistical characteristics of diet estimators for the populations they study. Similar computer simulations have been used to explore and compare the performance of different variations of the original QFASA diet estimator. In both cases, computer simulations involve bootstrap sampling prey signature data to construct pseudo-predator signatures with known properties. However, bootstrap sample sizes have been selected arbitrarily and pseudo-predator signatures therefore may not have realistic properties. I develop an algorithm to objectively establish bootstrap sample sizes that generates pseudo-predator signatures with realistic properties, thereby enhancing the utility of computer simulation for assessing QFASA estimator performance. The algorithm also appears to be computationally efficient, resulting in bootstrap sample sizes that are smaller than those commonly used. I illustrate the algorithm with an example using data from Chukchi Sea polar bears (Ursus maritimus) and their marine mammal prey. The concepts underlying the approach may have value in other areas of quantitative ecology in which bootstrap samples are post-processed prior to their use.
NASA Astrophysics Data System (ADS)
Hasegawa, Chika; Nakayama, Yu
2018-03-01
In this paper, we solve the two-point function of the lowest dimensional scalar operator in the critical ϕ4 theory on 4 ‑ 𝜖 dimensional real projective space in three different methods. The first is to use the conventional perturbation theory, and the second is to impose the cross-cap bootstrap equation, and the third is to solve the Schwinger-Dyson equation under the assumption of conformal invariance. We find that the three methods lead to mutually consistent results but each has its own advantage.
ERIC Educational Resources Information Center
Nevitt, Johnathan; Hancock, Gregory R.
Though common structural equation modeling (SEM) methods are predicated upon the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to use distribution-free estimation methods. Fortunately, promising alternatives are being integrated into…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, S; Vedantham, S; Karellas, A
Purpose: Detectors with hexagonal pixels require resampling to square pixels for distortion-free display of acquired images. In this work, the presampling modulation transfer function (MTF) of a hexagonal pixel array photon-counting CdTe detector for region-of-interest fluoroscopy was measured and the optimal square pixel size for resampling was determined. Methods: A 0.65mm thick CdTe Schottky sensor capable of concurrently acquiring up to 3 energy-windowed images was operated in a single energy-window mode to include ≥10 KeV photons. The detector had hexagonal pixels with apothem of 30 microns resulting in pixel spacing of 60 and 51.96 microns along the two orthogonal directions.more » Images of a tungsten edge test device acquired under IEC RQA5 conditions were double Hough transformed to identify the edge and numerically differentiated. The presampling MTF was determined from the finely sampled line spread function that accounted for the hexagonal sampling. The optimal square pixel size was determined in two ways; the square pixel size for which the aperture function evaluated at the Nyquist frequencies along the two orthogonal directions matched that from the hexagonal pixel aperture functions, and the square pixel size for which the mean absolute difference between the square and hexagonal aperture functions was minimized over all frequencies up to the Nyquist limit. Results: Evaluation of the aperture functions over the entire frequency range resulted in square pixel size of 53 microns with less than 2% difference from the hexagonal pixel. Evaluation of the aperture functions at Nyquist frequencies alone resulted in 54 microns square pixels. For the photon-counting CdTe detector and after resampling to 53 microns square pixels using quadratic interpolation, the presampling MTF at Nyquist frequency of 9.434 cycles/mm along the two directions were 0.501 and 0.507. Conclusion: Hexagonal pixel array photon-counting CdTe detector after resampling to square pixels provides high-resolution imaging suitable for fluoroscopy.« less
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.
Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.
Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data
Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.
2016-01-01
Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732
Image restoration techniques as applied to Landsat MSS and TM data
Meyer, David
1987-01-01
Two factors are primarily responsible for the loss of image sharpness in processing digital Landsat images. The first factor is inherent in the data because the sensor's optics and electronics, along with other sensor elements, blur and smear the data. Digital image restoration can be used to reduce this degradation. The second factor, which further degrades by blurring or aliasing, is the resampling performed during geometric correction. An image restoration procedure, when used in place of typical resampled techniques, reduces sensor degradation without introducing the artifacts associated with resampling. The EROS Data Center (EDC) has implemented the restoration proceed for Landsat multispectral scanner (MSS) and thematic mapper (TM) data. This capability, developed at the University of Arizona by Dr. Robert Schowengerdt and Lynette Wood, combines restoration and resampling in a single step to produce geometrically corrected MSS and TM imagery. As with resampling, restoration demands a tradeoff be made between aliasing, which occurs when attempting to extract maximum sharpness from an image, and blurring, which reduces the aliasing problem but sacrifices image sharpness. The restoration procedure used at EDC minimizes these artifacts by being adaptive, tailoring the tradeoff to be optimal for individual images.
NASA Astrophysics Data System (ADS)
Arnaud, Patrick; Cantet, Philippe; Odry, Jean
2017-11-01
Flood frequency analyses (FFAs) are needed for flood risk management. Many methods exist ranging from classical purely statistical approaches to more complex approaches based on process simulation. The results of these methods are associated with uncertainties that are sometimes difficult to estimate due to the complexity of the approaches or the number of parameters, especially for process simulation. This is the case of the simulation-based FFA approach called SHYREG presented in this paper, in which a rainfall generator is coupled with a simple rainfall-runoff model in an attempt to estimate the uncertainties due to the estimation of the seven parameters needed to estimate flood frequencies. The six parameters of the rainfall generator are mean values, so their theoretical distribution is known and can be used to estimate the generator uncertainties. In contrast, the theoretical distribution of the single hydrological model parameter is unknown; consequently, a bootstrap method is applied to estimate the calibration uncertainties. The propagation of uncertainty from the rainfall generator to the hydrological model is also taken into account. This method is applied to 1112 basins throughout France. Uncertainties coming from the SHYREG method and from purely statistical approaches are compared, and the results are discussed according to the length of the recorded observations, basin size and basin location. Uncertainties of the SHYREG method decrease as the basin size increases or as the length of the recorded flow increases. Moreover, the results show that the confidence intervals of the SHYREG method are relatively small despite the complexity of the method and the number of parameters (seven). This is due to the stability of the parameters and takes into account the dependence of uncertainties due to the rainfall model and the hydrological calibration. Indeed, the uncertainties on the flow quantiles are on the same order of magnitude as those associated with the use of a statistical law with two parameters (here generalised extreme value Type I distribution) and clearly lower than those associated with the use of a three-parameter law (here generalised extreme value Type II distribution). For extreme flood quantiles, the uncertainties are mostly due to the rainfall generator because of the progressive saturation of the hydrological model.
Study on the Classification of GAOFEN-3 Polarimetric SAR Images Using Deep Neural Network
NASA Astrophysics Data System (ADS)
Zhang, J.; Zhang, J.; Zhao, Z.
2018-04-01
Polarimetric Synthetic Aperture Radar (POLSAR) imaging principle determines that the image quality will be affected by speckle noise. So the recognition accuracy of traditional image classification methods will be reduced by the effect of this interference. Since the date of submission, Deep Convolutional Neural Network impacts on the traditional image processing methods and brings the field of computer vision to a new stage with the advantages of a strong ability to learn deep features and excellent ability to fit large datasets. Based on the basic characteristics of polarimetric SAR images, the paper studied the types of the surface cover by using the method of Deep Learning. We used the fully polarimetric SAR features of different scales to fuse RGB images to the GoogLeNet model based on convolution neural network Iterative training, and then use the trained model to test the classification of data validation.First of all, referring to the optical image, we mark the surface coverage type of GF-3 POLSAR image with 8m resolution, and then collect the samples according to different categories. To meet the GoogLeNet model requirements of 256 × 256 pixel image input and taking into account the lack of full-resolution SAR resolution, the original image should be pre-processed in the process of resampling. In this paper, POLSAR image slice samples of different scales with sampling intervals of 2 m and 1 m to be trained separately and validated by the verification dataset. Among them, the training accuracy of GoogLeNet model trained with resampled 2-m polarimetric SAR image is 94.89 %, and that of the trained SAR image with resampled 1 m is 92.65 %.
Measures of precision for dissimilarity-based multivariate analysis of ecological communities.
Anderson, Marti J; Santana-Garcon, Julia
2015-01-01
Ecological studies require key decisions regarding the appropriate size and number of sampling units. No methods currently exist to measure precision for multivariate assemblage data when dissimilarity-based analyses are intended to follow. Here, we propose a pseudo multivariate dissimilarity-based standard error (MultSE) as a useful quantity for assessing sample-size adequacy in studies of ecological communities. Based on sums of squared dissimilarities, MultSE measures variability in the position of the centroid in the space of a chosen dissimilarity measure under repeated sampling for a given sample size. We describe a novel double resampling method to quantify uncertainty in MultSE values with increasing sample size. For more complex designs, values of MultSE can be calculated from the pseudo residual mean square of a permanova model, with the double resampling done within appropriate cells in the design. R code functions for implementing these techniques, along with ecological examples, are provided. © 2014 The Authors. Ecology Letters published by John Wiley & Sons Ltd and CNRS.
NASA Astrophysics Data System (ADS)
Vallières, M.; Freeman, C. R.; Skamene, S. R.; El Naqa, I.
2015-07-01
This study aims at developing a joint FDG-PET and MRI texture-based model for the early evaluation of lung metastasis risk in soft-tissue sarcomas (STSs). We investigate if the creation of new composite textures from the combination of FDG-PET and MR imaging information could better identify aggressive tumours. Towards this goal, a cohort of 51 patients with histologically proven STSs of the extremities was retrospectively evaluated. All patients had pre-treatment FDG-PET and MRI scans comprised of T1-weighted and T2-weighted fat-suppression sequences (T2FS). Nine non-texture features (SUV metrics and shape features) and forty-one texture features were extracted from the tumour region of separate (FDG-PET, T1 and T2FS) and fused (FDG-PET/T1 and FDG-PET/T2FS) scans. Volume fusion of the FDG-PET and MRI scans was implemented using the wavelet transform. The influence of six different extraction parameters on the predictive value of textures was investigated. The incorporation of features into multivariable models was performed using logistic regression. The multivariable modeling strategy involved imbalance-adjusted bootstrap resampling in the following four steps leading to final prediction model construction: (1) feature set reduction; (2) feature selection; (3) prediction performance estimation; and (4) computation of model coefficients. Univariate analysis showed that the isotropic voxel size at which texture features were extracted had the most impact on predictive value. In multivariable analysis, texture features extracted from fused scans significantly outperformed those from separate scans in terms of lung metastases prediction estimates. The best performance was obtained using a combination of four texture features extracted from FDG-PET/T1 and FDG-PET/T2FS scans. This model reached an area under the receiver-operating characteristic curve of 0.984 ± 0.002, a sensitivity of 0.955 ± 0.006, and a specificity of 0.926 ± 0.004 in bootstrapping evaluations. Ultimately, lung metastasis risk assessment at diagnosis of STSs could improve patient outcomes by allowing better treatment adaptation.
How Much Can Remotely-Sensed Natural Resource Inventories Benefit from Finer Spatial Resolutions?
NASA Astrophysics Data System (ADS)
Hou, Z.; Xu, Q.; McRoberts, R. E.; Ståhl, G.; Greenberg, J. A.
2017-12-01
For remote sensing facilitated natural resource inventories, the effects of spatial resolution in the form of pixel size and the effects of subpixel information on estimates of population parameters were evaluated by comparing results obtained using Landsat 8 and RapidEye auxiliary imagery. The study area was in Burkina Faso, and the variable of interest was the stem volume (m3/ha) convertible to the woodland aboveground biomass. A sample consisting of 160 field plots was selected and measured from the population following a two-stage sampling design. Models were fit using weighted least squares; the population mean, mu, and the variance of the estimator of the population mean, Var(mu.hat), were estimated in two inferential frameworks, model-based and model-assisted, and compared; for each framework, Var(mu.hat) was estimated both analytically and empirically. Empirical variances were estimated with bootstrapping that for resampling takes clustering effects into account. The primary results were twofold. First, for the effects of spatial resolution and subpixel information, four conclusions are relevant: (1) finer spatial resolution imagery indeed contributes to greater precision for estimators of population parameter, but this increase is slight at a maximum rate of 20% considering that RapidEye data are 36 times finer resolution than Landsat 8 data; (2) subpixel information on texture is marginally beneficial when it comes to making inference for population of large areas; (3) cost-effectiveness is more favorable for the free of charge Landsat 8 imagery than RapidEye imagery; and (4) for a given plot size, candidate remote sensing auxiliary datasets are more cost-effective when their spatial resolutions are similar to the plot size than with much finer alternatives. Second, for the comparison between estimators, three conclusions are relevant: (1) model-based variance estimates are consistent with each other and about half as large as stabilized model-assisted estimates, suggesting superior effectiveness of model-based inference to model-assisted inference; (2) bootstrapping is an effective alternative to analytical variance estimators; and (3) prediction accuracy expressed by RMSE is useful for screening candidate models to be used for population inferences.
ClonEvol: clonal ordering and visualization in cancer sequencing.
Dang, H X; White, B S; Foltz, S M; Miller, C A; Luo, J; Fields, R C; Maher, C A
2017-12-01
Reconstruction of clonal evolution is critical for understanding tumor progression and implementing personalized therapies. This is often done by clustering somatic variants based on their cellular prevalence estimated via bulk tumor sequencing of multiple samples. The clusters, consisting of the clonal marker variants, are then ordered based on their estimated cellular prevalence to reconstruct clonal evolution trees, a process referred to as 'clonal ordering'. However, cellular prevalence estimate is confounded by statistical variability and errors in sequencing/data analysis, and therefore inhibits accurate reconstruction of the clonal evolution. This problem is further complicated by intra- and inter-tumor heterogeneity. Furthermore, the field lacks a comprehensive visualization tool to facilitate the interpretation of complex clonal relationships. To address these challenges we developed ClonEvol, a unified software tool for clonal ordering, visualization, and interpretation. ClonEvol uses a bootstrap resampling technique to estimate the cellular fraction of the clones and probabilistically models the clonal ordering constraints to account for statistical variability. The bootstrapping allows identification of the sample founding- and sub-clones, thus enabling interpretation of clonal seeding. ClonEvol automates the generation of multiple widely used visualizations for reconstructing and interpreting clonal evolution. ClonEvol outperformed three of the state of the art tools (LICHeE, Canopy and PhyloWGS) for clonal evolution inference, showing more robust error tolerance and producing more accurate trees in a simulation. Building upon multiple recent publications that utilized ClonEvol to study metastasis and drug resistance in solid cancers, here we show that ClonEvol rediscovered relapsed subclones in two published acute myeloid leukemia patients. Furthermore, we demonstrated that through noninvasive monitoring ClonEvol recapitulated the emerging subclones throughout metastatic progression observed in the tumors of a published breast cancer patient. ClonEvol has broad applicability for longitudinal monitoring of clonal populations in tumor biopsies, or noninvasively, to guide precision medicine. ClonEvol is written in R and is available at https://github.com/ChrisMaherLab/ClonEvol. © The Author 2017. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Li, Hao; Dong, Siping
2015-01-01
China has long been stuck in applying traditional data envelopment analysis (DEA) models to measure technical efficiency of public hospitals without bias correction of efficiency scores. In this article, we have introduced the Bootstrap-DEA approach from the international literature to analyze the technical efficiency of public hospitals in Tianjin (China) and tried to improve the application of this method for benchmarking and inter-organizational learning. It is found that the bias corrected efficiency scores of Bootstrap-DEA differ significantly from those of the traditional Banker, Charnes, and Cooper (BCC) model, which means that Chinese researchers need to update their DEA models for more scientific calculation of hospital efficiency scores. Our research has helped shorten the gap between China and the international world in relative efficiency measurement and improvement of hospitals. It is suggested that Bootstrap-DEA be widely applied into afterward research to measure relative efficiency and productivity of Chinese hospitals so as to better serve for efficiency improvement and related decision making. © The Author(s) 2015.
Kent, Robert; Landon, Matthew K.
2016-01-01
From 2004 to 2011, the U.S. Geological Survey collected samples from 1686 wells across the State of California as part of the California State Water Resources Control Board’s Groundwater Ambient Monitoring and Assessment (GAMA) Priority Basin Project (PBP). From 2007 to 2013, 224 of these wells were resampled to assess temporal trends in water quality. The samples were analyzed for 216 water-quality constituents, including inorganic and organic compounds as well as isotopic tracers. The resampled wells were grouped into five hydrogeologic zones. A nonparametric hypothesis test was used to test the differences between initial sampling and resampling results to evaluate possible step trends in water-quality, statewide, and within each hydrogeologic zone. The hypothesis tests were performed on the 79 constituents that were detected in more than 5 % of the samples collected during either sampling period in at least one hydrogeologic zone. Step trends were detected for 17 constituents. Increasing trends were detected for alkalinity, aluminum, beryllium, boron, lithium, orthophosphate, perchlorate, sodium, and specific conductance. Decreasing trends were detected for atrazine, cobalt, dissolved oxygen, lead, nickel, pH, simazine, and tritium. Tritium was expected to decrease due to decreasing values in precipitation, and the detection of decreases indicates that the method is capable of resolving temporal trends.
Kent, Robert; Belitz, Kenneth; Fram, Miranda S.
2014-01-01
The Priority Basin Project (PBP) of the Groundwater Ambient Monitoring and Assessment (GAMA) Program was developed in response to the Groundwater Quality Monitoring Act of 2001 and is being conducted by the U.S. Geological Survey (USGS) in cooperation with the California State Water Resources Control Board (SWRCB). The GAMA-PBP began sampling, primarily public supply wells in May 2004. By the end of February 2006, seven (of what would eventually be 35) study units had been sampled over a wide area of the State. Selected wells in these first seven study units were resampled for water quality from August 2007 to November 2008 as part of an assessment of temporal trends in water quality by the GAMA-PBP. The initial sampling was designed to provide a spatially unbiased assessment of the quality of raw groundwater used for public water supplies within the seven study units. In the 7 study units, 462 wells were selected by using a spatially distributed, randomized grid-based method to provide statistical representation of the study area. Wells selected this way are referred to as grid wells or status wells. Approximately 3 years after the initial sampling, 55 of these previously sampled status wells (approximately 10 percent in each study unit) were randomly selected for resampling. The seven resampled study units, the total number of status wells sampled for each study unit, and the number of these wells resampled for trends are as follows, in chronological order of sampling: San Diego Drainages (53 status wells, 7 trend wells), North San Francisco Bay (84, 10), Northern San Joaquin Basin (51, 5), Southern Sacramento Valley (67, 7), San Fernando–San Gabriel (35, 6), Monterey Bay and Salinas Valley Basins (91, 11), and Southeast San Joaquin Valley (83, 9). The groundwater samples were analyzed for a large number of synthetic organic constituents (volatile organic compounds [VOCs], pesticides, and pesticide degradates), constituents of special interest (perchlorate, N-nitrosodimethylamine [NDMA], and 1,2,3-trichloropropane [1,2,3-TCP]), and naturally-occurring inorganic constituents (nutrients, major and minor ions, and trace elements). Naturally-occurring isotopes (tritium, carbon-14, and stable isotopes of hydrogen and oxygen in water) also were measured to help identify processes affecting groundwater quality and the sources and ages of the sampled groundwater. Nearly 300 constituents and water-quality indicators were investigated. Quality-control samples (blanks, replicates, and samples for matrix spikes) were collected at 24 percent of the 55 status wells resampled for trends, and the results for these samples were used to evaluate the quality of the data for the groundwater samples. Field blanks rarely contained detectable concentrations of any constituent, suggesting that contamination was not a noticeable source of bias in the data for the groundwater samples. Differences between replicate samples were mostly within acceptable ranges, indicating acceptably low variability in analytical results. Matrix-spike recoveries were within the acceptable range (70 to 130 percent) for 75 percent of the compounds for which matrix spikes were collected. This study did not attempt to evaluate the quality of water delivered to consumers. After withdrawal, groundwater typically is treated, disinfected, and blended with other waters to maintain acceptable water quality. The benchmarks used in this report apply to treated water that is served to the consumer, not to untreated groundwater. To provide some context for the results, however, concentrations of constituents measured in these groundwater samples were compared with benchmarks established by the U.S. Environmental Protection Agency (USEPA) and California Department of Public Health (CDPH). Comparisons between data collected for this study and benchmarks for drinking water are for illustrative purposes only and are not indicative of compliance or non-compliance with those benchmarks. Most constituents that were detected in groundwater samples from the trend wells were found at concentrations less than drinking-water benchmarks. Four VOCs—trichloroethene, tetrachloroethene, 1,2-dibromo-3-chloropropane, and methyl tert-butyl ether—were detected in one or more wells at concentrations greater than their health-based benchmarks, and six VOCs were detected in at least 10 percent of the samples during initial sampling or resampling of the trend wells. No pesticides were detected at concentrations near or greater than their health-based benchmarks. Three pesticide constituents—atrazine, deethylatrazine, and simazine—were detected in more than 10 percent of the trend-well samples during both sampling periods. Perchlorate, a constituent of special interest, was detected more frequently, and at greater concentrations during resampling than during initial sampling, but this may be due to a change in analytical method between the sampling periods, rather than to a change in groundwater quality. Another constituent of special interest, 1,2,3-TCP, was also detected more frequently during resampling than during initial sampling, but this pattern also may not reflect a change in groundwater quality. Samples from several of the wells where 1,2,3-TCP was detected by low-concentration-level analysis during resampling were not analyzed for 1,2,3-TCP using a low-level method during initial sampling. Most detections of nutrients and trace elements in samples from trend wells were less than health-based benchmarks during both sampling periods. Exceptions include nitrate, arsenic, boron, and vanadium, all detected at concentrations greater than their health-based benchmarks in at least one well during both sampling periods, and molybdenum, detected at concentrations greater than its health-based benchmark during resampling only. The isotopic ratios of oxygen and hydrogen in water and tritium and carbon-14 activities generally changed little between sampling periods, suggesting that the predominant sources and ages of groundwater in most trend wells were consistent between the sampling periods.
Confidence Limits for the Indirect Effect: Distribution of the Product and Resampling Methods
ERIC Educational Resources Information Center
MacKinnon, David P.; Lockwood, Chondra M.; Williams, Jason
2004-01-01
The most commonly used method to test an indirect effect is to divide the estimate of the indirect effect by its standard error and compare the resulting z statistic with a critical value from the standard normal distribution. Confidence limits for the indirect effect are also typically based on critical values from the standard normal…
A note on the kappa statistic for clustered dichotomous data.
Zhou, Ming; Yang, Zhao
2014-06-30
The kappa statistic is widely used to assess the agreement between two raters. Motivated by a simulation-based cluster bootstrap method to calculate the variance of the kappa statistic for clustered physician-patients dichotomous data, we investigate its special correlation structure and develop a new simple and efficient data generation algorithm. For the clustered physician-patients dichotomous data, based on the delta method and its special covariance structure, we propose a semi-parametric variance estimator for the kappa statistic. An extensive Monte Carlo simulation study is performed to evaluate the performance of the new proposal and five existing methods with respect to the empirical coverage probability, root-mean-square error, and average width of the 95% confidence interval for the kappa statistic. The variance estimator ignoring the dependence within a cluster is generally inappropriate, and the variance estimators from the new proposal, bootstrap-based methods, and the sampling-based delta method perform reasonably well for at least a moderately large number of clusters (e.g., the number of clusters K ⩾50). The new proposal and sampling-based delta method provide convenient tools for efficient computations and non-simulation-based alternatives to the existing bootstrap-based methods. Moreover, the new proposal has acceptable performance even when the number of clusters is as small as K = 25. To illustrate the practical application of all the methods, one psychiatric research data and two simulated clustered physician-patients dichotomous data are analyzed. Copyright © 2014 John Wiley & Sons, Ltd.
Construction of prediction intervals for Palmer Drought Severity Index using bootstrap
NASA Astrophysics Data System (ADS)
Beyaztas, Ufuk; Bickici Arikan, Bugrayhan; Beyaztas, Beste Hamiye; Kahya, Ercan
2018-04-01
In this study, we propose an approach based on the residual-based bootstrap method to obtain valid prediction intervals using monthly, short-term (three-months) and mid-term (six-months) drought observations. The effects of North Atlantic and Arctic Oscillation indexes on the constructed prediction intervals are also examined. Performance of the proposed approach is evaluated for the Palmer Drought Severity Index (PDSI) obtained from Konya closed basin located in Central Anatolia, Turkey. The finite sample properties of the proposed method are further illustrated by an extensive simulation study. Our results revealed that the proposed approach is capable of producing valid prediction intervals for future PDSI values.
2009-01-01
Background The International Commission on Radiological Protection (ICRP) recommended annual occupational dose limit is 20 mSv. Cancer mortality in Japanese A-bomb survivors exposed to less than 20 mSv external radiation in 1945 was analysed previously, using a latency model with non-linear dose response. Questions were raised regarding statistical inference with this model. Methods Cancers with over 100 deaths in the 0 - 20 mSv subcohort of the 1950-1990 Life Span Study are analysed with Poisson regression models incorporating latency, allowing linear and non-linear dose response. Bootstrap percentile and Bias-corrected accelerated (BCa) methods and simulation of the Likelihood Ratio Test lead to Confidence Intervals for Excess Relative Risk (ERR) and tests against the linear model. Results The linear model shows significant large, positive values of ERR for liver and urinary cancers at latencies from 37 - 43 years. Dose response below 20 mSv is strongly non-linear at the optimal latencies for the stomach (11.89 years), liver (36.9), lung (13.6), leukaemia (23.66), and pancreas (11.86) and across broad latency ranges. Confidence Intervals for ERR are comparable using Bootstrap and Likelihood Ratio Test methods and BCa 95% Confidence Intervals are strictly positive across latency ranges for all 5 cancers. Similar risk estimates for 10 mSv (lagged dose) are obtained from the 0 - 20 mSv and 5 - 500 mSv data for the stomach, liver, lung and leukaemia. Dose response for the latter 3 cancers is significantly non-linear in the 5 - 500 mSv range. Conclusion Liver and urinary cancer mortality risk is significantly raised using a latency model with linear dose response. A non-linear model is strongly superior for the stomach, liver, lung, pancreas and leukaemia. Bootstrap and Likelihood-based confidence intervals are broadly comparable and ERR is strictly positive by bootstrap methods for all 5 cancers. Except for the pancreas, similar estimates of latency and risk from 10 mSv are obtained from the 0 - 20 mSv and 5 - 500 mSv subcohorts. Large and significant cancer risks for Japanese survivors exposed to less than 20 mSv external radiation from the atomic bombs in 1945 cast doubt on the ICRP recommended annual occupational dose limit. PMID:20003238
Zhang, Yeqing; Wang, Meiling; Li, Yafeng
2018-01-01
For the objective of essentially decreasing computational complexity and time consumption of signal acquisition, this paper explores a resampling strategy and variable circular correlation time strategy specific to broadband multi-frequency GNSS receivers. In broadband GNSS receivers, the resampling strategy is established to work on conventional acquisition algorithms by resampling the main lobe of received broadband signals with a much lower frequency. Variable circular correlation time is designed to adapt to different signal strength conditions and thereby increase the operation flexibility of GNSS signal acquisition. The acquisition threshold is defined as the ratio of the highest and second highest correlation results in the search space of carrier frequency and code phase. Moreover, computational complexity of signal acquisition is formulated by amounts of multiplication and summation operations in the acquisition process. Comparative experiments and performance analysis are conducted on four sets of real GPS L2C signals with different sampling frequencies. The results indicate that the resampling strategy can effectively decrease computation and time cost by nearly 90–94% with just slight loss of acquisition sensitivity. With circular correlation time varying from 10 ms to 20 ms, the time cost of signal acquisition has increased by about 2.7–5.6% per millisecond, with most satellites acquired successfully. PMID:29495301
Zhang, Yeqing; Wang, Meiling; Li, Yafeng
2018-02-24
For the objective of essentially decreasing computational complexity and time consumption of signal acquisition, this paper explores a resampling strategy and variable circular correlation time strategy specific to broadband multi-frequency GNSS receivers. In broadband GNSS receivers, the resampling strategy is established to work on conventional acquisition algorithms by resampling the main lobe of received broadband signals with a much lower frequency. Variable circular correlation time is designed to adapt to different signal strength conditions and thereby increase the operation flexibility of GNSS signal acquisition. The acquisition threshold is defined as the ratio of the highest and second highest correlation results in the search space of carrier frequency and code phase. Moreover, computational complexity of signal acquisition is formulated by amounts of multiplication and summation operations in the acquisition process. Comparative experiments and performance analysis are conducted on four sets of real GPS L2C signals with different sampling frequencies. The results indicate that the resampling strategy can effectively decrease computation and time cost by nearly 90-94% with just slight loss of acquisition sensitivity. With circular correlation time varying from 10 ms to 20 ms, the time cost of signal acquisition has increased by about 2.7-5.6% per millisecond, with most satellites acquired successfully.
Yen, Glorian P; Davey, Adam; Ma, Grace X
2015-04-01
Biorepositories have been key resources in examining genetically-linked diseases, particularly cancer. Asian Americans contribute to biorepositories at lower rates than other racial groups, but the reasons for this are unclear. We hypothesized that attitudes toward biospecimen research mediate the relationship between demographic and healthcare access factors, and willingness to donate blood for research purposes among individuals of Korean heritage. Descriptive statistics and bivariate analyses were utilized to characterize the sample with respect to demographic, psychosocial, and behavioral variables. Structural equation modeling with 5000 re-sample bootstrapping was used to assess each component of the proposed simple mediation models. Attitudes towards biospecimen research fully mediate associations between age, income, number of years lived in the United States, and having a regular physician and willingness to donate blood for the purpose of research. Participants were willing to donate blood for the purpose of research despite having neutral feelings towards biospecimen research as a whole. Participants reported higher willingness to donate blood for research purposes when they were older, had lived in the United States longer, had higher income, and had a regular doctor that they visited. Many of the significant relationships between demographic and health care access factors, attitudes towards biospecimen research, and willingness to donate blood for the purpose of research may be explained by the extent of acculturation of the participants in the United States.
Khalaila, Rabia
2015-03-01
The impact of cognitive factors on academic achievement is well documented. However, little is known about the mediating and moderating effects of non-cognitive, motivational and situational factors on academic achievement among nursing students. The aim of this study is to explore the direct and/or indirect effects of academic self-concept on academic achievement, and examine whether intrinsic motivation moderates the negative effect of test anxiety on academic achievement. This descriptive-correlational study was carried out on a convenience sample of 170 undergraduate nursing students, in an academic college in northern Israel. Academic motivation, academic self-concept and test anxiety scales were used as measuring instruments. Bootstrapping with resampling strategies was used for testing multiple mediators' model and examining the moderator effect. A higher self-concept was found to be directly related to greater academic achievement. Test anxiety and intrinsic motivation were found to be significant mediators in the relationship between self-concept and academic achievement. In addition, intrinsic motivation significantly moderated the negative effect of test anxiety on academic achievement. The results suggested that institutions should pay more attention to the enhancement of motivational factors (e.g., self-concept and motivation) and alleviate the negative impact of situational factors (e.g., test anxiety) when offering psycho-educational interventions designed to improve nursing students' academic achievements. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
El Naqa, I.; Suneja, G.; Lindsay, P. E.; Hope, A. J.; Alaly, J. R.; Vicic, M.; Bradley, J. D.; Apte, A.; Deasy, J. O.
2006-11-01
Radiotherapy treatment outcome models are a complicated function of treatment, clinical and biological factors. Our objective is to provide clinicians and scientists with an accurate, flexible and user-friendly software tool to explore radiotherapy outcomes data and build statistical tumour control or normal tissue complications models. The software tool, called the dose response explorer system (DREES), is based on Matlab, and uses a named-field structure array data type. DREES/Matlab in combination with another open-source tool (CERR) provides an environment for analysing treatment outcomes. DREES provides many radiotherapy outcome modelling features, including (1) fitting of analytical normal tissue complication probability (NTCP) and tumour control probability (TCP) models, (2) combined modelling of multiple dose-volume variables (e.g., mean dose, max dose, etc) and clinical factors (age, gender, stage, etc) using multi-term regression modelling, (3) manual or automated selection of logistic or actuarial model variables using bootstrap statistical resampling, (4) estimation of uncertainty in model parameters, (5) performance assessment of univariate and multivariate analyses using Spearman's rank correlation and chi-square statistics, boxplots, nomograms, Kaplan-Meier survival plots, and receiver operating characteristics curves, and (6) graphical capabilities to visualize NTCP or TCP prediction versus selected variable models using various plots. DREES provides clinical researchers with a tool customized for radiotherapy outcome modelling. DREES is freely distributed. We expect to continue developing DREES based on user feedback.
Plate tectonics and continental basaltic geochemistry throughout Earth history
NASA Astrophysics Data System (ADS)
Keller, Brenhin; Schoene, Blair
2018-01-01
Basaltic magmas constitute the primary mass flux from Earth's mantle to its crust, carrying information about the conditions of mantle melting through which they were generated. As such, changes in the average basaltic geochemistry through time reflect changes in underlying parameters such as mantle potential temperature and the geodynamic setting of mantle melting. However, sampling bias, preservation bias, and geological heterogeneity complicate the calculation of representative average compositions. Here we use weighted bootstrap resampling to minimize sampling bias over the heterogeneous rock record and obtain maximally representative average basaltic compositions through time. Over the approximately 4 Ga of the continental rock record, the average composition of preserved continental basalts has evolved along a generally continuous trajectory, with decreasing compatible element concentrations and increasing incompatible element concentrations, punctuated by a comparatively rapid transition in some variables such as La/Yb ratios and Zr, Nb, and Ti abundances approximately 2.5 Ga ago. Geochemical modeling of mantle melting systematics and trace element partitioning suggests that these observations can be explained by discontinuous changes in the mineralogy of mantle partial melting driven by a gradual decrease in mantle potential temperature, without appealing to any change in tectonic process. This interpretation is supported by the geochemical record of slab fluid input to continental basalts, which indicates no long-term change in the global proportion of arc versus non-arc basaltic magmatism at any time in the preserved rock record.
Gaetbulicola byunsanensis gen. nov., sp. nov., isolated from tidal flat sediment.
Yoon, Jung-Hoon; Kang, So-Jung; Jung, Yong-Taek; Oh, Tae-Kwang
2010-01-01
A Gram-negative, non-motile and pleomorphic bacterial strain, SMK-114(T), which belongs to the class Alphaproteobacteria, was isolated from a tidal flat sample collected in Byunsan, Korea. Strain SMK-114(T) grew optimally at pH 7.0-8.0 and 25-30 degrees C and in the presence of 2 % (w/v) NaCl. A neighbour-joining phylogenetic tree based on 16S rRNA gene sequences showed that strain SMK-114(T) formed a cluster with Octadecabacter species, with which it exhibited 16S rRNA gene sequence similarity values of 95.2-95.4 %. This cluster was part of the clade comprising Thalassobius species with a bootstrap resampling value of 76.3 %. Strain SMK-114(T) exhibited 16S rRNA gene sequence similarity values of 95.1-96.3 % to members of the genus Thalassobius. It contained Q-10 as the predominant ubiquinone and C(18 : 1)omega7c as the major fatty acid. The DNA G+C content was 60.0 mol%. On the basis of phenotypic, chemotaxonomic and phylogenetic data, strain SMK-114(T) is considered to represent a novel species in a new genus for which the name Gaetbulicola byunsanensis gen. nov., sp. nov. is proposed. The type strain of Gaetbulicola byunsanensis is SMK-114(T) (=KCTC 22632(T) =CCUG 57612(T)).
NASA Astrophysics Data System (ADS)
Pandey, Gavendra; Sharan, Maithili
2018-01-01
Application of atmospheric dispersion models in air quality analysis requires a proper representation of the vertical and horizontal growth of the plume. For this purpose, various schemes for the parameterization of dispersion parameters σ‧s are described in both stable and unstable conditions. These schemes differ on the use of (i) extent of availability of on-site measurements (ii) formulations developed for other sites and (iii) empirical relations. The performance of these schemes is evaluated in an earlier developed IIT (Indian Institute of Technology) dispersion model with the data set in single and multiple releases conducted at Fusion Field Trials, Dugway Proving Ground, Utah 2007. Qualitative and quantitative evaluation of the relative performance of all the schemes is carried out in both stable and unstable conditions in the light of (i) peak/maximum concentrations, and (ii) overall concentration distribution. The blocked bootstrap resampling technique is adopted to investigate the statistical significance of the differences in performances of each of the schemes by computing 95% confidence limits on the parameters FB and NMSE. The various analysis based on some selected statistical measures indicated consistency in the qualitative and quantitative performances of σ schemes. The scheme which is based on standard deviation of wind velocity fluctuations and Lagrangian time scales exhibits a relatively better performance in predicting the peak as well as the lateral spread.
Revesz, Kinga M.; Landwehr, Jurate M.
2002-01-01
A new method was developed to analyze the stable carbon and oxygen isotope ratios of small samples (400 ± 20 µg) of calcium carbonate. This new method streamlines the classical phosphoric acid/calcium carbonate (H3PO4/CaCO3) reaction method by making use of a recently available Thermoquest-Finnigan GasBench II preparation device and a Delta Plus XL continuous flow isotope ratio mass spectrometer. Conditions for which the H3PO4/CaCO3 reaction produced reproducible and accurate results with minimal error had to be determined. When the acid/carbonate reaction temperature was kept at 26 °C and the reaction time was between 24 and 54 h, the precision of the carbon and oxygen isotope ratios for pooled samples from three reference standard materials was ≤0.1 and ≤0.2 per mill or ‰, respectively, although later analysis showed that materials from one specific standard required reaction time between 34 and 54 h for δ18O to achieve this level of precision. Aliquot screening methods were shown to further minimize the total error. The accuracy and precision of the new method were analyzed and confirmed by statistical analysis. The utility of the method was verified by analyzing calcite from Devils Hole, Nevada, for which isotope-ratio values had previously been obtained by the classical method. Devils Hole core DH-11 recently had been re-cut and re-sampled, and isotope-ratio values were obtained using the new method. The results were comparable with those obtained by the classical method with correlation = +0.96 for both isotope ratios. The consistency of the isotopic results is such that an alignment offset could be identified in the re-sampled core material, and two cutting errors that occurred during re-sampling then were confirmed independently. This result indicates that the new method is a viable alternative to the classical reaction method. In particular, the new method requires less sample material permitting finer resolution and allows automation of some processes resulting in considerable time savings.
2013-01-01
Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
Immersive volume rendering of blood vessels
NASA Astrophysics Data System (ADS)
Long, Gregory; Kim, Han Suk; Marsden, Alison; Bazilevs, Yuri; Schulze, Jürgen P.
2012-03-01
In this paper, we present a novel method of visualizing flow in blood vessels. Our approach reads unstructured tetrahedral data, resamples it, and uses slice based 3D texture volume rendering. Due to the sparse structure of blood vessels, we utilize an octree to efficiently store the resampled data by discarding empty regions of the volume. We use animation to convey time series data, wireframe surface to give structure, and utilize the StarCAVE, a 3D virtual reality environment, to add a fully immersive element to the visualization. Our tool has great value in interdisciplinary work, helping scientists collaborate with clinicians, by improving the understanding of blood flow simulations. Full immersion in the flow field allows for a more intuitive understanding of the flow phenomena, and can be a great help to medical experts for treatment planning.
Voice Conversion Using Pitch Shifting Algorithm by Time Stretching with PSOLA and Re-Sampling
NASA Astrophysics Data System (ADS)
Mousa, Allam
2010-01-01
Voice changing has many applications in the industry and commercial filed. This paper emphasizes voice conversion using a pitch shifting method which depends on detecting the pitch of the signal (fundamental frequency) using Simplified Inverse Filter Tracking (SIFT) and changing it according to the target pitch period using time stretching with Pitch Synchronous Over Lap Add Algorithm (PSOLA), then resampling the signal in order to have the same play rate. The same study was performed to see the effect of voice conversion when some Arabic speech signal is considered. Treatment of certain Arabic voiced vowels and the conversion between male and female speech has shown some expansion or compression in the resulting speech. Comparison in terms of pitch shifting is presented here. Analysis was performed for a single frame and a full segmentation of speech.
Tran, Viet-Thi; Porcher, Raphael; Falissard, Bruno; Ravaud, Philippe
2016-12-01
To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys. We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited. In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% themes (IQR = 91-93%) identified in the original study. In our study, data saturation was most certainly reached for samples >150 participants. Our method may be used to determine when to continue the study to find new themes or stop because of futility. Copyright © 2016 Elsevier Inc. All rights reserved.
Preprocessing the Nintendo Wii Board Signal to Derive More Accurate Descriptors of Statokinesigrams.
Audiffren, Julien; Contal, Emile
2016-08-01
During the past few years, the Nintendo Wii Balance Board (WBB) has been used in postural control research as an affordable but less reliable replacement for laboratory grade force platforms. However, the WBB suffers some limitations, such as a lower accuracy and an inconsistent sampling rate. In this study, we focus on the latter, namely the non uniform acquisition frequency. We show that this problem, combined with the poor signal to noise ratio of the WBB, can drastically decrease the quality of the obtained information if not handled properly. We propose a new resampling method, Sliding Window Average with Relevance Interval Interpolation (SWARII), specifically designed with the WBB in mind, for which we provide an open source implementation. We compare it with several existing methods commonly used in postural control, both on synthetic and experimental data. The results show that some methods, such as linear and piecewise constant interpolations should definitely be avoided, particularly when the resulting signal is differentiated, which is necessary to estimate speed, an important feature in postural control. Other methods, such as averaging on sliding windows or SWARII, perform significantly better on synthetic dataset, and produce results more similar to the laboratory-grade AMTI force plate (AFP) during experiments. Those methods should be preferred when resampling data collected from a WBB.
Preprocessing the Nintendo Wii Board Signal to Derive More Accurate Descriptors of Statokinesigrams
Audiffren, Julien; Contal, Emile
2016-01-01
During the past few years, the Nintendo Wii Balance Board (WBB) has been used in postural control research as an affordable but less reliable replacement for laboratory grade force platforms. However, the WBB suffers some limitations, such as a lower accuracy and an inconsistent sampling rate. In this study, we focus on the latter, namely the non uniform acquisition frequency. We show that this problem, combined with the poor signal to noise ratio of the WBB, can drastically decrease the quality of the obtained information if not handled properly. We propose a new resampling method, Sliding Window Average with Relevance Interval Interpolation (SWARII), specifically designed with the WBB in mind, for which we provide an open source implementation. We compare it with several existing methods commonly used in postural control, both on synthetic and experimental data. The results show that some methods, such as linear and piecewise constant interpolations should definitely be avoided, particularly when the resulting signal is differentiated, which is necessary to estimate speed, an important feature in postural control. Other methods, such as averaging on sliding windows or SWARII, perform significantly better on synthetic dataset, and produce results more similar to the laboratory-grade AMTI force plate (AFP) during experiments. Those methods should be preferred when resampling data collected from a WBB. PMID:27490545
Ter Braak, Cajo J F; Peres-Neto, Pedro; Dray, Stéphane
2017-01-01
Statistical testing of trait-environment association from data is a challenge as there is no common unit of observation: the trait is observed on species, the environment on sites and the mediating abundance on species-site combinations. A number of correlation-based methods, such as the community weighted trait means method (CWM), the fourth-corner correlation method and the multivariate method RLQ, have been proposed to estimate such trait-environment associations. In these methods, valid statistical testing proceeds by performing two separate resampling tests, one site-based and the other species-based and by assessing significance by the largest of the two p -values (the p max test). Recently, regression-based methods using generalized linear models (GLM) have been proposed as a promising alternative with statistical inference via site-based resampling. We investigated the performance of this new approach along with approaches that mimicked the p max test using GLM instead of fourth-corner. By simulation using models with additional random variation in the species response to the environment, the site-based resampling tests using GLM are shown to have severely inflated type I error, of up to 90%, when the nominal level is set as 5%. In addition, predictive modelling of such data using site-based cross-validation very often identified trait-environment interactions that had no predictive value. The problem that we identify is not an "omitted variable bias" problem as it occurs even when the additional random variation is independent of the observed trait and environment data. Instead, it is a problem of ignoring a random effect. In the same simulations, the GLM-based p max test controlled the type I error in all models proposed so far in this context, but still gave slightly inflated error in more complex models that included both missing (but important) traits and missing (but important) environmental variables. For screening the importance of single trait-environment combinations, the fourth-corner test is shown to give almost the same results as the GLM-based tests in far less computing time.
A CNN based Hybrid approach towards automatic image registration
NASA Astrophysics Data System (ADS)
Arun, Pattathal V.; Katiyar, Sunil K.
2013-06-01
Image registration is a key component of various image processing operations which involve the analysis of different image data sets. Automatic image registration domains have witnessed the application of many intelligent methodologies over the past decade; however inability to properly model object shape as well as contextual information had limited the attainable accuracy. In this paper, we propose a framework for accurate feature shape modeling and adaptive resampling using advanced techniques such as Vector Machines, Cellular Neural Network (CNN), SIFT, coreset, and Cellular Automata. CNN has found to be effective in improving feature matching as well as resampling stages of registration and complexity of the approach has been considerably reduced using corset optimization The salient features of this work are cellular neural network approach based SIFT feature point optimisation, adaptive resampling and intelligent object modelling. Developed methodology has been compared with contemporary methods using different statistical measures. Investigations over various satellite images revealed that considerable success was achieved with the approach. System has dynamically used spectral and spatial information for representing contextual knowledge using CNN-prolog approach. Methodology also illustrated to be effective in providing intelligent interpretation and adaptive resampling. Rejestracja obrazu jest kluczowym składnikiem różnych operacji jego przetwarzania. W ostatnich latach do automatycznej rejestracji obrazu wykorzystuje się metody sztucznej inteligencji, których największą wadą, obniżającą dokładność uzyskanych wyników jest brak możliwości dobrego wymodelowania kształtu i informacji kontekstowych. W niniejszej pracy zaproponowano zasady dokładnego modelowania kształtu oraz adaptacyjnego resamplingu z wykorzystaniem zaawansowanych technik, takich jak Vector Machines (VM), komórkowa sieć neuronowa (CNN), przesiewanie (SIFT), Coreset i automaty komórkowe. Stwierdzono, że za pomocą CNN można skutecznie poprawiać dopasowanie obiektów obrazowych oraz resampling kolejnych kroków rejestracji, zaś zastosowanie optymalizacji metodą Coreset znacznie redukuje złożoność podejścia. Zasadniczym przedmiotem pracy są: optymalizacja punktów metodą SIFT oparta na podejściu CNN, adaptacyjny resampling oraz inteligentne modelowanie obiektów. Opracowana metoda została porównana ze współcześnie stosowanymi metodami wykorzystującymi różne miary statystyczne. Badania nad różnymi obrazami satelitarnymi wykazały, że stosując opracowane podejście osiągnięto bardzo dobre wyniki. System stosując podejście CNN-prolog dynamicznie wykorzystuje informacje spektralne i przestrzenne dla reprezentacji wiedzy kontekstowej. Metoda okazała się również skuteczna w dostarczaniu inteligentnych interpretacji i w adaptacyjnym resamplingu.
Resampling probability values for weighted kappa with multiple raters.
Mielke, Paul W; Berry, Kenneth J; Johnston, Janis E
2008-04-01
A new procedure to compute weighted kappa with multiple raters is described. A resampling procedure to compute approximate probability values for weighted kappa with multiple raters is presented. Applications of weighted kappa are illustrated with an example analysis of classifications by three independent raters.
SOCIAL COMPETENCE AND PSYCHOLOGICAL VULNERABILITY: THE MEDIATING ROLE OF FLOURISHING.
Uysal, Recep
2015-10-01
This study examined whether flourishing mediated the social competence and psychological vulnerability. Participants were 259 university students (147 women, 112 men; M age = 21.3 yr., SD = 1.7) who completed the Turkish versions of the Perceived Social Competence Scale, the Flourishing Scale, and the Psychological Vulnerability Scale. Mediation models were tested using the bootstrapping method to examine indirect effects. Consistent with the hypotheses, the results indicated a positive relationship between social competence and flourishing, and a negative relationship between social competence and psychological vulnerability. Results of the bootstrapping method revealed that flourishing significantly mediated the relationship between social competence and psychological vulnerability. The significance and limitations of the results were discussed.
A bootstrap method for estimating uncertainty of water quality trends
Hirsch, Robert M.; Archfield, Stacey A.; DeCicco, Laura
2015-01-01
Estimation of the direction and magnitude of trends in surface water quality remains a problem of great scientific and practical interest. The Weighted Regressions on Time, Discharge, and Season (WRTDS) method was recently introduced as an exploratory data analysis tool to provide flexible and robust estimates of water quality trends. This paper enhances the WRTDS method through the introduction of the WRTDS Bootstrap Test (WBT), an extension of WRTDS that quantifies the uncertainty in WRTDS-estimates of water quality trends and offers various ways to visualize and communicate these uncertainties. Monte Carlo experiments are applied to estimate the Type I error probabilities for this method. WBT is compared to other water-quality trend-testing methods appropriate for data sets of one to three decades in length with sampling frequencies of 6–24 observations per year. The software to conduct the test is in the EGRETci R-package.
Ramírez-Prado, Dolores; Cortés, Ernesto; Aguilar-Segura, María Soledad; Gil-Guillén, Vicente Francisco
2016-01-01
In January 2012, a review of the cases of chromosome 15q24 microdeletion syndrome was published. However, this study did not include inferential statistics. The aims of the present study were to update the literature search and calculate confidence intervals for the prevalence of each phenotype using bootstrap methodology. Published case reports of patients with the syndrome that included detailed information about breakpoints and phenotype were sought and 36 were included. Deletions in megabase (Mb) pairs were determined to calculate the size of the interstitial deletion of the phenotypes studied in 2012. To determine confidence intervals for the prevalence of the phenotype and the interstitial loss, we used bootstrap methodology. Using the bootstrap percentiles method, we found wide variability in the prevalence of the different phenotypes (3–100%). The mean interstitial deletion size was 2.72 Mb (95% CI [2.35–3.10 Mb]). In comparison with our work, which expanded the literature search by 45 months, there were differences in the prevalence of 17% of the phenotypes, indicating that more studies are needed to analyze this rare disease. PMID:26925314
NASA Astrophysics Data System (ADS)
Richman, Michael B.; Gong, Xiaofeng
1999-06-01
When applying eigenanalysis, one decision analysts make is the determination of what magnitude an eigenvector coefficient (e.g., principal component (PC) loading) must achieve to be considered as physically important. Such coefficients can be displayed on maps or in a time series or tables to gain a fuller understanding of a large array of multivariate data. Previously, such a decision on what value of loading designates a useful signal (hereafter called the loading `cutoff') for each eigenvector has been purely subjective. The importance of selecting such a cutoff is apparent since those loading elements in the range of zero to the cutoff are ignored in the interpretation and naming of PCs since only the absolute values of loadings greater than the cutoff are physically analyzed. This research sets out to objectify the problem of best identifying the cutoff by application of matching between known correlation/covariance structures and their corresponding eigenpatterns, as this cutoff point (known as the hyperplane width) is varied.A Monte Carlo framework is used to resample at five sample sizes. Fourteen different hyperplane cutoff widths are tested, bootstrap resampled 50 times to obtain stable results. The key findings are that the location of an optimal hyperplane cutoff width (one which maximized the information content match between the eigenvector and the parent dispersion matrix from which it was derived) is a well-behaved unimodal function. On an individual eigenvector, this enables the unique determination of a hyperplane cutoff value to be used to separate those loadings that best reflect the relationships from those that do not. The effects of sample size on the matching accuracy are dramatic as the values for all solutions (i.e., unrotated, rotated) rose steadily from 25 through 250 observations and then weakly thereafter. The specific matching coefficients are useful to assess the penalties incurred when one analyzes eigenvector coefficients of a lower absolute value than the cutoff (termed coefficient in the hyperplane) or, alternatively, chooses not to analyze coefficients that contain useful physical signal outside of the hyperplane. Therefore, this study enables the analyst to make the best use of the information available in their PCs to shed light on complicated data structures.
A resampling procedure for generating conditioned daily weather sequences
Clark, Martyn P.; Gangopadhyay, Subhrendu; Brandon, David; Werner, Kevin; Hay, Lauren E.; Rajagopalan, Balaji; Yates, David
2004-01-01
A method is introduced to generate conditioned daily precipitation and temperature time series at multiple stations. The method resamples data from the historical record “nens” times for the period of interest (nens = number of ensemble members) and reorders the ensemble members to reconstruct the observed spatial (intersite) and temporal correlation statistics. The weather generator model is applied to 2307 stations in the contiguous United States and is shown to reproduce the observed spatial correlation between neighboring stations, the observed correlation between variables (e.g., between precipitation and temperature), and the observed temporal correlation between subsequent days in the generated weather sequence. The weather generator model is extended to produce sequences of weather that are conditioned on climate indices (in this case the Niño 3.4 index). Example illustrations of conditioned weather sequences are provided for a station in Arizona (Petrified Forest, 34.8°N, 109.9°W), where El Niño and La Niña conditions have a strong effect on winter precipitation. The conditioned weather sequences generated using the methods described in this paper are appropriate for use as input to hydrologic models to produce multiseason forecasts of streamflow.
NEAT: an efficient network enrichment analysis test.
Signorelli, Mirko; Vinciotti, Veronica; Wit, Ernst C
2016-09-05
Network enrichment analysis is a powerful method, which allows to integrate gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, they can be computationally slow and are based on normality assumptions. We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as the one of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN ( https://cran.r-project.org/package=neat ).
Copula based prediction models: an application to an aortic regurgitation study
Kumar, Pranesh; Shoukri, Mohamed M
2007-01-01
Background: An important issue in prediction modeling of multivariate data is the measure of dependence structure. The use of Pearson's correlation as a dependence measure has several pitfalls and hence application of regression prediction models based on this correlation may not be an appropriate methodology. As an alternative, a copula based methodology for prediction modeling and an algorithm to simulate data are proposed. Methods: The method consists of introducing copulas as an alternative to the correlation coefficient commonly used as a measure of dependence. An algorithm based on the marginal distributions of random variables is applied to construct the Archimedean copulas. Monte Carlo simulations are carried out to replicate datasets, estimate prediction model parameters and validate them using Lin's concordance measure. Results: We have carried out a correlation-based regression analysis on data from 20 patients aged 17–82 years on pre-operative and post-operative ejection fractions after surgery and estimated the prediction model: Post-operative ejection fraction = - 0.0658 + 0.8403 (Pre-operative ejection fraction); p = 0.0008; 95% confidence interval of the slope coefficient (0.3998, 1.2808). From the exploratory data analysis, it is noted that both the pre-operative and post-operative ejection fractions measurements have slight departures from symmetry and are skewed to the left. It is also noted that the measurements tend to be widely spread and have shorter tails compared to normal distribution. Therefore predictions made from the correlation-based model corresponding to the pre-operative ejection fraction measurements in the lower range may not be accurate. Further it is found that the best approximated marginal distributions of pre-operative and post-operative ejection fractions (using q-q plots) are gamma distributions. The copula based prediction model is estimated as: Post -operative ejection fraction = - 0.0933 + 0.8907 × (Pre-operative ejection fraction); p = 0.00008 ; 95% confidence interval for slope coefficient (0.4810, 1.3003). For both models differences in the predicted post-operative ejection fractions in the lower range of pre-operative ejection measurements are considerably different and prediction errors due to copula model are smaller. To validate the copula methodology we have re-sampled with replacement fifty independent bootstrap samples and have estimated concordance statistics 0.7722 (p = 0.0224) for the copula model and 0.7237 (p = 0.0604) for the correlation model. The predicted and observed measurements are concordant for both models. The estimates of accuracy components are 0.9233 and 0.8654 for copula and correlation models respectively. Conclusion: Copula-based prediction modeling is demonstrated to be an appropriate alternative to the conventional correlation-based prediction modeling since the correlation-based prediction models are not appropriate to model the dependence in populations with asymmetrical tails. Proposed copula-based prediction model has been validated using the independent bootstrap samples. PMID:17573974
Causality constraints in conformal field theory
Hartman, Thomas; Jain, Sachin; Kundu, Sandipan
2016-05-17
Causality places nontrivial constraints on QFT in Lorentzian signature, for example fixing the signs of certain terms in the low energy Lagrangian. In d dimensional conformal field theory, we show how such constraints are encoded in crossing symmetry of Euclidean correlators, and derive analogous constraints directly from the conformal bootstrap (analytically). The bootstrap setup is a Lorentzian four-point function corresponding to propagation through a shockwave. Crossing symmetry fixes the signs of certain log terms that appear in the conformal block expansion, which constrains the interactions of low-lying operators. As an application, we use the bootstrap to rederive the well knownmore » sign constraint on the (Φ) 4 coupling in effective field theory, from a dual CFT. We also find constraints on theories with higher spin conserved currents. As a result, our analysis is restricted to scalar correlators, but we argue that similar methods should also impose nontrivial constraints on the interactions of spinning operators« less
Conformal Bootstrap in Mellin Space
NASA Astrophysics Data System (ADS)
Gopakumar, Rajesh; Kaviraj, Apratim; Sen, Kallol; Sinha, Aninda
2017-02-01
We propose a new approach towards analytically solving for the dynamical content of conformal field theories (CFTs) using the bootstrap philosophy. This combines the original bootstrap idea of Polyakov with the modern technology of the Mellin representation of CFT amplitudes. We employ exchange Witten diagrams with built-in crossing symmetry as our basic building blocks rather than the conventional conformal blocks in a particular channel. Demanding consistency with the operator product expansion (OPE) implies an infinite set of constraints on operator dimensions and OPE coefficients. We illustrate the power of this method in the ɛ expansion of the Wilson-Fisher fixed point by reproducing anomalous dimensions and, strikingly, obtaining OPE coefficients to higher orders in ɛ than currently available using other analytic techniques (including Feynman diagram calculations). Our results enable us to get a somewhat better agreement between certain observables in the 3D Ising model and the precise numerical values that have been recently obtained.
Confidence intervals for distinguishing ordinal and disordinal interactions in multiple regression.
Lee, Sunbok; Lei, Man-Kit; Brody, Gene H
2015-06-01
Distinguishing between ordinal and disordinal interaction in multiple regression is useful in testing many interesting theoretical hypotheses. Because the distinction is made based on the location of a crossover point of 2 simple regression lines, confidence intervals of the crossover point can be used to distinguish ordinal and disordinal interactions. This study examined 2 factors that need to be considered in constructing confidence intervals of the crossover point: (a) the assumption about the sampling distribution of the crossover point, and (b) the possibility of abnormally wide confidence intervals for the crossover point. A Monte Carlo simulation study was conducted to compare 6 different methods for constructing confidence intervals of the crossover point in terms of the coverage rate, the proportion of true values that fall to the left or right of the confidence intervals, and the average width of the confidence intervals. The methods include the reparameterization, delta, Fieller, basic bootstrap, percentile bootstrap, and bias-corrected accelerated bootstrap methods. The results of our Monte Carlo simulation study suggest that statistical inference using confidence intervals to distinguish ordinal and disordinal interaction requires sample sizes more than 500 to be able to provide sufficiently narrow confidence intervals to identify the location of the crossover point. (c) 2015 APA, all rights reserved).
Gotelli, Nicholas J.; Dorazio, Robert M.; Ellison, Aaron M.; Grossman, Gary D.
2010-01-01
Quantifying patterns of temporal trends in species assemblages is an important analytical challenge in community ecology. We describe methods of analysis that can be applied to a matrix of counts of individuals that is organized by species (rows) and time-ordered sampling periods (columns). We first developed a bootstrapping procedure to test the null hypothesis of random sampling from a stationary species abundance distribution with temporally varying sampling probabilities. This procedure can be modified to account for undetected species. We next developed a hierarchical model to estimate species-specific trends in abundance while accounting for species-specific probabilities of detection. We analysed two long-term datasets on stream fishes and grassland insects to demonstrate these methods. For both assemblages, the bootstrap test indicated that temporal trends in abundance were more heterogeneous than expected under the null model. We used the hierarchical model to estimate trends in abundance and identified sets of species in each assemblage that were steadily increasing, decreasing or remaining constant in abundance over more than a decade of standardized annual surveys. Our methods of analysis are broadly applicable to other ecological datasets, and they represent an advance over most existing procedures, which do not incorporate effects of incomplete sampling and imperfect detection.
Coefficient Omega Bootstrap Confidence Intervals: Nonnormal Distributions
ERIC Educational Resources Information Center
Padilla, Miguel A.; Divers, Jasmin
2013-01-01
The performance of the normal theory bootstrap (NTB), the percentile bootstrap (PB), and the bias-corrected and accelerated (BCa) bootstrap confidence intervals (CIs) for coefficient omega was assessed through a Monte Carlo simulation under conditions not previously investigated. Of particular interests were nonnormal Likert-type and binary items.…
Crossing symmetry in alpha space
NASA Astrophysics Data System (ADS)
Hogervorst, Matthijs; van Rees, Balt C.
2017-11-01
We initiate the study of the conformal bootstrap using Sturm-Liouville theory, specializing to four-point functions in one-dimensional CFTs. We do so by decomposing conformal correlators using a basis of eigenfunctions of the Casimir which are labeled by a complex number α. This leads to a systematic method for computing conformal block decompositions. Analyzing bootstrap equations in alpha space turns crossing symmetry into an eigenvalue problem for an integral operator K. The operator K is closely related to the Wilson transform, and some of its eigenfunctions can be found in closed form.
2011-03-01
resampling a second time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 70 Plot of RSA bitgroup exponentiation with DAILMOM after a...14 DVFS Dynamic Voltage and Frequency Switching . . . . . . . . . . . . . . . . . . . 14 MDPL Masked Dual-Rail...algorithms to prevent whole-sale discovery of PINs and other simple methods to prevent employee tampering [5]. In time , cryptographic systems have
On the estimation of spread rate for a biological population
Jim Clark; Lajos Horváth; Mark Lewis
2001-01-01
We propose a nonparametric estimator for the rate of spread of an introduced population. We prove that the limit distribution of the estimator is normal or stable, depending on the behavior of the moment generating function. We show that resampling methods can also be used to approximate the distribution of the estimators.
Liu, Fang; Shen, Changqing; He, Qingbo; Zhang, Ao; Liu, Yongbin; Kong, Fanrang
2014-01-01
A fault diagnosis strategy based on the wayside acoustic monitoring technique is investigated for locomotive bearing fault diagnosis. Inspired by the transient modeling analysis method based on correlation filtering analysis, a so-called Parametric-Mother-Doppler-Wavelet (PMDW) is constructed with six parameters, including a center characteristic frequency and five kinematic model parameters. A Doppler effect eliminator containing a PMDW generator, a correlation filtering analysis module, and a signal resampler is invented to eliminate the Doppler effect embedded in the acoustic signal of the recorded bearing. Through the Doppler effect eliminator, the five kinematic model parameters can be identified based on the signal itself. Then, the signal resampler is applied to eliminate the Doppler effect using the identified parameters. With the ability to detect early bearing faults, the transient model analysis method is employed to detect localized bearing faults after the embedded Doppler effect is eliminated. The effectiveness of the proposed fault diagnosis strategy is verified via simulation studies and applications to diagnose locomotive roller bearing defects. PMID:24803197
Maximum likelihood resampling of noisy, spatially correlated data
NASA Astrophysics Data System (ADS)
Goff, J.; Jenkins, C.
2005-12-01
In any geologic application, noisy data are sources of consternation for researchers, inhibiting interpretability and marring images with unsightly and unrealistic artifacts. Filtering is the typical solution to dealing with noisy data. However, filtering commonly suffers from ad hoc (i.e., uncalibrated, ungoverned) application, which runs the risk of erasing high variability components of the field in addition to the noise components. We present here an alternative to filtering: a newly developed methodology for correcting noise in data by finding the "best" value given the data value, its uncertainty, and the data values and uncertainties at proximal locations. The motivating rationale is that data points that are close to each other in space cannot differ by "too much", where how much is "too much" is governed by the field correlation properties. Data with large uncertainties will frequently violate this condition, and in such cases need to be corrected, or "resampled." The best solution for resampling is determined by the maximum of the likelihood function defined by the intersection of two probability density functions (pdf): (1) the data pdf, with mean and variance determined by the data value and square uncertainty, respectively, and (2) the geostatistical pdf, whose mean and variance are determined by the kriging algorithm applied to proximal data values. A Monte Carlo sampling of the data probability space eliminates non-uniqueness, and weights the solution toward data values with lower uncertainties. A test with a synthetic data set sampled from a known field demonstrates quantitatively and qualitatively the improvement provided by the maximum likelihood resampling algorithm. The method is also applied to three marine geology/geophysics data examples: (1) three generations of bathymetric data on the New Jersey shelf with disparate data uncertainties; (2) mean grain size data from the Adriatic Sea, which is combination of both analytic (low uncertainty) and word-based (higher uncertainty) sources; and (3) sidescan backscatter data from the Martha's Vineyard Coastal Observatory which are, as is typical for such data, affected by speckly noise.
Multiple-instance ensemble learning for hyperspectral images
NASA Astrophysics Data System (ADS)
Ergul, Ugur; Bilgin, Gokhan
2017-10-01
An ensemble framework for multiple-instance (MI) learning (MIL) is introduced for use in hyperspectral images (HSIs) by inspiring the bagging (bootstrap aggregation) method in ensemble learning. Ensemble-based bagging is performed by a small percentage of training samples, and MI bags are formed by a local windowing process with variable window sizes on selected instances. In addition to bootstrap aggregation, random subspace is another method used to diversify base classifiers. The proposed method is implemented using four MIL classification algorithms. The classifier model learning phase is carried out with MI bags, and the estimation phase is performed over single-test instances. In the experimental part of the study, two different HSIs that have ground-truth information are used, and comparative results are demonstrated with state-of-the-art classification methods. In general, the MI ensemble approach produces more compact results in terms of both diversity and error compared to equipollent non-MIL algorithms.
Power in Bayesian Mediation Analysis for Small Sample Research
Miočević, Milica; MacKinnon, David P.; Levy, Roy
2018-01-01
It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results. PMID:29662296
Power in Bayesian Mediation Analysis for Small Sample Research.
Miočević, Milica; MacKinnon, David P; Levy, Roy
2017-01-01
It was suggested that Bayesian methods have potential for increasing power in mediation analysis (Koopman, Howe, Hollenbeck, & Sin, 2015; Yuan & MacKinnon, 2009). This paper compares the power of Bayesian credibility intervals for the mediated effect to the power of normal theory, distribution of the product, percentile, and bias-corrected bootstrap confidence intervals at N≤ 200. Bayesian methods with diffuse priors have power comparable to the distribution of the product and bootstrap methods, and Bayesian methods with informative priors had the most power. Varying degrees of precision of prior distributions were also examined. Increased precision led to greater power only when N≥ 100 and the effects were small, N < 60 and the effects were large, and N < 200 and the effects were medium. An empirical example from psychology illustrated a Bayesian analysis of the single mediator model from prior selection to interpreting results.
NASA Astrophysics Data System (ADS)
Yan, Hong; Song, Xiangzhong; Tian, Kuangda; Chen, Yilin; Xiong, Yanmei; Min, Shungeng
2018-02-01
A novel method, mid-infrared (MIR) spectroscopy, which enables the determination of Chlorantraniliprole in Abamectin within minutes, is proposed. We further evaluate the prediction ability of four wavelength selection methods, including bootstrapping soft shrinkage approach (BOSS), Monte Carlo uninformative variable elimination (MCUVE), genetic algorithm partial least squares (GA-PLS) and competitive adaptive reweighted sampling (CARS) respectively. The results showed that BOSS method obtained the lowest root mean squared error of cross validation (RMSECV) (0.0245) and root mean squared error of prediction (RMSEP) (0.0271), as well as the highest coefficient of determination of cross-validation (Qcv2) (0.9998) and the coefficient of determination of test set (Q2test) (0.9989), which demonstrated that the mid infrared spectroscopy can be used to detect Chlorantraniliprole in Abamectin conveniently. Meanwhile, a suitable wavelength selection method (BOSS) is essential to conducting a component spectral analysis.
Anomalous change detection in imagery
Theiler, James P [Los Alamos, NM; Perkins, Simon J [Santa Fe, NM
2011-05-31
A distribution-based anomaly detection platform is described that identifies a non-flat background that is specified in terms of the distribution of the data. A resampling approach is also disclosed employing scrambled resampling of the original data with one class specified by the data and the other by the explicit distribution, and solving using binary classification.
De-Dopplerization of Acoustic Measurements
2017-08-10
band energy obtained from fractional octave band digital filters generates a de-Dopplerized spectrum without complex resampling algorithms. An...energy obtained from fractional octave band digital filters generates a de-Dopplerized spectrum without complex resampling algorithms. An equation...fractional octave representation and smearing that occurs within the spectrum11, digital filtering techniques were not considered by these earlier
Thematic mapper design parameter investigation
NASA Technical Reports Server (NTRS)
Colby, C. P., Jr.; Wheeler, S. G.
1978-01-01
This study simulated the multispectral data sets to be expected from three different Thematic Mapper configurations, and the ground processing of these data sets by three different resampling techniques. The simulated data sets were then evaluated by processing them for multispectral classification, and the Thematic Mapper configuration, and resampling technique which provided the best classification accuracy were identified.
Effects of magnetic islands on bootstrap current in toroidal plasmas
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dong, G.; Lin, Z.
The effects of magnetic islands on electron bootstrap current in toroidal plasmas are studied using gyrokinetic simulations. The magnetic islands cause little changes of the bootstrap current level in the banana regime because of trapped electron effects. In the plateau regime, the bootstrap current is completely suppressed at the island centers due to the destruction of trapped electron orbits by collisions and the flattening of pressure profiles by the islands. In the collisional regime, small but finite bootstrap current can exist inside the islands because of the pressure gradients created by large collisional transport across the islands. Lastly, simulation resultsmore » show that the bootstrap current level increases near the island separatrix due to steeper local density gradients.« less
Effects of magnetic islands on bootstrap current in toroidal plasmas
Dong, G.; Lin, Z.
2016-12-19
The effects of magnetic islands on electron bootstrap current in toroidal plasmas are studied using gyrokinetic simulations. The magnetic islands cause little changes of the bootstrap current level in the banana regime because of trapped electron effects. In the plateau regime, the bootstrap current is completely suppressed at the island centers due to the destruction of trapped electron orbits by collisions and the flattening of pressure profiles by the islands. In the collisional regime, small but finite bootstrap current can exist inside the islands because of the pressure gradients created by large collisional transport across the islands. Lastly, simulation resultsmore » show that the bootstrap current level increases near the island separatrix due to steeper local density gradients.« less
Kappa statistic for the clustered dichotomous responses from physicians and patients
Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L.; Cai, Jianwen
2013-01-01
The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared to the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. An example of an application to a coronary heart disease prevention study is presented. PMID:23533082
NASA Astrophysics Data System (ADS)
Müller, H.; Haberlandt, U.
2018-01-01
Rainfall time series of high temporal resolution and spatial density are crucial for urban hydrology. The multiplicative random cascade model can be used for temporal rainfall disaggregation of daily data to generate such time series. Here, the uniform splitting approach with a branching number of 3 in the first disaggregation step is applied. To achieve a final resolution of 5 min, subsequent steps after disaggregation are necessary. Three modifications at different disaggregation levels are tested in this investigation (uniform splitting at Δt = 15 min, linear interpolation at Δt = 7.5 min and Δt = 3.75 min). Results are compared both with observations and an often used approach, based on the assumption that a time steps with Δt = 5.625 min, as resulting if a branching number of 2 is applied throughout, can be replaced with Δt = 5 min (called the 1280 min approach). Spatial consistence is implemented in the disaggregated time series using a resampling algorithm. In total, 24 recording stations in Lower Saxony, Northern Germany with a 5 min resolution have been used for the validation of the disaggregation procedure. The urban-hydrological suitability is tested with an artificial combined sewer system of about 170 hectares. The results show that all three variations outperform the 1280 min approach regarding reproduction of wet spell duration, average intensity, fraction of dry intervals and lag-1 autocorrelation. Extreme values with durations of 5 min are also better represented. For durations of 1 h, all approaches show only slight deviations from the observed extremes. The applied resampling algorithm is capable to achieve sufficient spatial consistence. The effects on the urban hydrological simulations are significant. Without spatial consistence, flood volumes of manholes and combined sewer overflow are strongly underestimated. After resampling, results using disaggregated time series as input are in the range of those using observed time series. Best overall performance regarding rainfall statistics are obtained by the method in which the disaggregation process ends at time steps with 7.5 min duration, deriving the 5 min time steps by linear interpolation. With subsequent resampling this method leads to a good representation of manhole flooding and combined sewer overflow volume in terms of hydrological simulations and outperforms the 1280 min approach.
Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico
2018-02-01
To show how to generate a consensus sequence from the information of massive parallel sequences data obtained from routine HIV anti-retroviral resistance studies, and that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap-value of 88% (IQR83.5-95.5). Association increased to 36/62 sequences, median bootstrap 94% (IQR85.5-98)], using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
Technical and scale efficiency in public and private Irish nursing homes - a bootstrap DEA approach.
Ni Luasa, Shiovan; Dineen, Declan; Zieba, Marta
2016-10-27
This article provides methodological and empirical insights into the estimation of technical efficiency in the nursing home sector. Focusing on long-stay care and using primary data, we examine technical and scale efficiency in 39 public and 73 private Irish nursing homes by applying an input-oriented data envelopment analysis (DEA). We employ robust bootstrap methods to validate our nonparametric DEA scores and to integrate the effects of potential determinants in estimating the efficiencies. Both the homogenous and two-stage double bootstrap procedures are used to obtain confidence intervals for the bias-corrected DEA scores. Importantly, the application of the double bootstrap approach affords true DEA technical efficiency scores after adjusting for the effects of ownership, size, case-mix, and other determinants such as location, and quality. Based on our DEA results for variable returns to scale technology, the average technical efficiency score is 62 %, and the mean scale efficiency is 88 %, with nearly all units operating on the increasing returns to scale part of the production frontier. Moreover, based on the double bootstrap results, Irish nursing homes are less technically efficient, and more scale efficient than the conventional DEA estimates suggest. Regarding the efficiency determinants, in terms of ownership, we find that private facilities are less efficient than the public units. Furthermore, the size of the nursing home has a positive effect, and this reinforces our finding that Irish homes produce at increasing returns to scale. Also, notably, we find that a tendency towards quality improvements can lead to poorer technical efficiency performance.
Methods for Estimating Uncertainty in Factor Analytic Solutions
The EPA PMF (Environmental Protection Agency positive matrix factorization) version 5.0 and the underlying multilinear engine-executable ME-2 contain three methods for estimating uncertainty in factor analytic models: classical bootstrap (BS), displacement of factor elements (DI...
Kaufmann, Esther; Wittmann, Werner W.
2016-01-01
The success of bootstrapping or replacing a human judge with a model (e.g., an equation) has been demonstrated in Paul Meehl’s (1954) seminal work and bolstered by the results of several meta-analyses. To date, however, analyses considering different types of meta-analyses as well as the potential dependence of bootstrapping success on the decision domain, the level of expertise of the human judge, and the criterion for what constitutes an accurate decision have been missing from the literature. In this study, we addressed these research gaps by conducting a meta-analysis of lens model studies. We compared the results of a traditional (bare-bones) meta-analysis with findings of a meta-analysis of the success of bootstrap models corrected for various methodological artifacts. In line with previous studies, we found that bootstrapping was more successful than human judgment. Furthermore, bootstrapping was more successful in studies with an objective decision criterion than in studies with subjective or test score criteria. We did not find clear evidence that the success of bootstrapping depended on the decision domain (e.g., education or medicine) or on the judge’s level of expertise (novice or expert). Correction of methodological artifacts increased the estimated success of bootstrapping, suggesting that previous analyses without artifact correction (i.e., traditional meta-analyses) may have underestimated the value of bootstrapping models. PMID:27327085
ERIC Educational Resources Information Center
Livingston, Samuel A.; Kim, Sooyeon
2010-01-01
A series of resampling studies investigated the accuracy of equating by four different methods in a random groups equating design with samples of 400, 200, 100, and 50 test takers taking each form. Six pairs of forms were constructed. Each pair was constructed by assigning items from an existing test taken by 9,000 or more test takers. The…
A Data Augmentation Approach to Short Text Classification
ERIC Educational Resources Information Center
Rosario, Ryan Robert
2017-01-01
Text classification typically performs best with large training sets, but short texts are very common on the World Wide Web. Can we use resampling and data augmentation to construct larger texts using similar terms? Several current methods exist for working with short text that rely on using external data and contexts, or workarounds. Our focus is…
Jeffrey T. Walton
2008-01-01
Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
Propagating probability distributions of stand variables using sequential Monte Carlo methods
Jeffrey H. Gove
2009-01-01
A general probabilistic approach to stand yield estimation is developed based on sequential Monte Carlo filters, also known as particle filters. The essential steps in the development of the sampling importance resampling (SIR) particle filter are presented. The SIR filter is then applied to simulated and observed data showing how the 'predictor - corrector'...
Collateral Information for Equating in Small Samples: A Preliminary Investigation
ERIC Educational Resources Information Center
Kim, Sooyeon; Livingston, Samuel A.; Lewis, Charles
2011-01-01
This article describes a preliminary investigation of an empirical Bayes (EB) procedure for using collateral information to improve equating of scores on test forms taken by small numbers of examinees. Resampling studies were done on two different forms of the same test. In each study, EB and non-EB versions of two equating methods--chained linear…
Lee, Kangjoo; Lina, Jean-Marc; Gotman, Jean; Grova, Christophe
2016-07-01
Functional hubs are defined as the specific brain regions with dense connections to other regions in a functional brain network. Among them, connector hubs are of great interests, as they are assumed to promote global and hierarchical communications between functionally specialized networks. Damage to connector hubs may have a more crucial effect on the system than does damage to other hubs. Hubs in graph theory are often identified from a correlation matrix, and classified as connector hubs when the hubs are more connected to regions in other networks than within the networks to which they belong. However, the identification of hubs from functional data is more complex than that from structural data, notably because of the inherent problem of multicollinearity between temporal dynamics within a functional network. In this context, we developed and validated a method to reliably identify connectors and corresponding overlapping network structure from resting-state fMRI. This new method is actually handling the multicollinearity issue, since it does not rely on counting the number of connections from a thresholded correlation matrix. The novelty of the proposed method is that besides counting the number of networks involved in each voxel, it allows us to identify which networks are actually involved in each voxel, using a data-driven sparse general linear model in order to identify brain regions involved in more than one network. Moreover, we added a bootstrap resampling strategy to assess statistically the reproducibility of our results at the single subject level. The unified framework is called SPARK, i.e. SParsity-based Analysis of Reliable k-hubness, where k-hubness denotes the number of networks overlapping in each voxel. The accuracy and robustness of SPARK were evaluated using two dimensional box simulations and realistic simulations that examined detection of artificial hubs generated on real data. Then, test/retest reliability of the method was assessed using the 1000 Functional Connectome Project database, which includes data obtained from 25 healthy subjects at three different occasions with long and short intervals between sessions. We demonstrated that SPARK provides an accurate and reliable estimation of k-hubness, suggesting a promising tool for understanding hub organization in resting-state fMRI. Copyright © 2016 Elsevier Inc. All rights reserved.
On critical exponents without Feynman diagrams
NASA Astrophysics Data System (ADS)
Sen, Kallol; Sinha, Aninda
2016-11-01
In order to achieve a better analytic handle on the modern conformal bootstrap program, we re-examine and extend the pioneering 1974 work of Polyakov’s, which was based on consistency between the operator product expansion and unitarity. As in the bootstrap approach, this method does not depend on evaluating Feynman diagrams. We show how this approach can be used to compute the anomalous dimensions of certain operators in the O(n) model at the Wilson-Fisher fixed point in 4-ɛ dimensions up to O({ɛ }2). AS dedicates this work to the loving memory of his mother.
Paixão, Paulo; Gouveia, Luís F; Silva, Nuno; Morais, José A G
2017-03-01
A simulation study is presented, evaluating the performance of the f 2 , the model-independent multivariate statistical distance and the f 2 bootstrap methods in the ability to conclude similarity between two dissolution profiles. Different dissolution profiles, based on the Noyes-Whitney equation and ranging from theoretical f 2 values between 100 and 40, were simulated. Variability was introduced in the dissolution model parameters in an increasing order, ranging from a situation complying with the European guidelines requirements for the use of the f 2 metric to several situations where the f 2 metric could not be used anymore. Results have shown that the f 2 is an acceptable metric when used according to the regulatory requirements, but loses its applicability when variability increases. The multivariate statistical distance presented contradictory results in several of the simulation scenarios, which makes it an unreliable metric for dissolution profile comparisons. The bootstrap f 2 , although conservative in its conclusions is an alternative suitable method. Overall, as variability increases, all of the discussed methods reveal problems that can only be solved by increasing the number of dosage form units used in the comparison, which is usually not practical or feasible. Additionally, experimental corrective measures may be undertaken in order to reduce the overall variability, particularly when it is shown that it is mainly due to the dissolution assessment instead of being intrinsic to the dosage form. Copyright © 2016. Published by Elsevier B.V.
Boiret, Mathieu; Meunier, Loïc; Ginot, Yves-Michel
2011-02-20
A near infrared (NIR) method was developed for determination of tablet potency of active pharmaceutical ingredient (API) in a complex coated tablet matrix. The calibration set contained samples from laboratory and production scale batches. The reference values were obtained by high performance liquid chromatography (HPLC) and partial least squares (PLS) regression was used to establish a model. The model was challenged by calculating tablet potency of two external test sets. Root mean square errors of prediction were respectively equal to 2.0% and 2.7%. To use this model with a second spectrometer from the production field, a calibration transfer method called piecewise direct standardisation (PDS) was used. After the transfer, the root mean square error of prediction of the first test set was 2.4% compared to 4.0% without transferring the spectra. A statistical technique using bootstrap of PLS residuals was used to estimate confidence intervals of tablet potency calculations. This method requires an optimised PLS model, selection of the bootstrap number and determination of the risk. In the case of a chemical analysis, the tablet potency value will be included within the confidence interval calculated by the bootstrap method. An easy to use graphical interface was developed to easily determine if the predictions, surrounded by minimum and maximum values, are within the specifications defined by the regulatory organisation. Copyright © 2010 Elsevier B.V. All rights reserved.
Korenromp, Eline L; Mahiané, Guy; Rowley, Jane; Nagelkerke, Nico; Abu-Raddad, Laith; Ndowa, Francis; El-Kettani, Amina; El-Rhilani, Houssine; Mayaud, Philippe; Chico, R Matthew; Pretorius, Carel; Hecht, Kendall; Wi, Teodora
2017-01-01
Objective To develop a tool for estimating national trends in adult prevalence of sexually transmitted infections by low- and middle-income countries, using standardised, routinely collected programme indicator data. Methods The Spectrum-STI model fits time trends in the prevalence of active syphilis through logistic regression on prevalence data from antenatal clinic-based surveys, routine antenatal screening and general population surveys where available, weighting data by their national coverage and representativeness. Gonorrhoea prevalence was fitted as a moving average on population surveys (from the country, neighbouring countries and historic regional estimates), with trends informed additionally by urethral discharge case reports, where these were considered to have reasonably stable completeness. Prevalence data were adjusted for diagnostic test performance, high-risk populations not sampled, urban/rural and male/female prevalence ratios, using WHO's assumptions from latest global and regional-level estimations. Uncertainty intervals were obtained by bootstrap resampling. Results Estimated syphilis prevalence (in men and women) declined from 1.9% (95% CI 1.1% to 3.4%) in 2000 to 1.5% (1.3% to 1.8%) in 2016 in Zimbabwe, and from 1.5% (0.76% to 1.9%) to 0.55% (0.30% to 0.93%) in Morocco. At these time points, gonorrhoea estimates for women aged 15–49 years were 2.5% (95% CI 1.1% to 4.6%) and 3.8% (1.8% to 6.7%) in Zimbabwe; and 0.6% (0.3% to 1.1%) and 0.36% (0.1% to 1.0%) in Morocco, with male gonorrhoea prevalences 14% lower than female prevalence. Conclusions This epidemiological framework facilitates data review, validation and strategic analysis, prioritisation of data collection needs and surveillance strengthening by national experts. We estimated ongoing syphilis declines in both Zimbabwe and Morocco. For gonorrhoea, time trends were less certain, lacking recent population-based surveys. PMID:28325771
Beukinga, Roelof J; Hulshoff, Jan Binne; Mul, Véronique E M; Noordzij, Walter; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Plukker, John T M
2018-06-01
Purpose To assess the value of baseline and restaging fluorine 18 ( 18 F) fluorodeoxyglucose (FDG) positron emission tomography (PET) radiomics in predicting pathologic complete response to neoadjuvant chemotherapy and radiation therapy (NCRT) in patients with locally advanced esophageal cancer. Materials and Methods In this retrospective study, 73 patients with histologic analysis-confirmed T1/N1-3/M0 or T2-4a/N0-3/M0 esophageal cancer were treated with NCRT followed by surgery (Chemoradiotherapy for Esophageal Cancer followed by Surgery Study regimen) between October 2014 and August 2017. Clinical variables and radiomic features from baseline and restaging 18 F-FDG PET were selected by univariable logistic regression and least absolute shrinkage and selection operator. The selected variables were used to fit a multivariable logistic regression model, which was internally validated by using bootstrap resampling with 20 000 replicates. The performance of this model was compared with reference prediction models composed of maximum standardized uptake value metrics, clinical variables, and maximum standardized uptake value at baseline NCRT radiomic features. Outcome was defined as complete versus incomplete pathologic response (tumor regression grade 1 vs 2-5 according to the Mandard classification). Results Pathologic response was complete in 16 patients (21.9%) and incomplete in 57 patients (78.1%). A prediction model combining clinical T-stage and restaging NCRT (post-NCRT) joint maximum (quantifying image orderliness) yielded an optimism-corrected area under the receiver operating characteristics curve of 0.81. Post-NCRT joint maximum was replaceable with five other redundant post-NCRT radiomic features that provided equal model performance. All reference prediction models exhibited substantially lower discriminatory accuracy. Conclusion The combination of clinical T-staging and quantitative assessment of post-NCRT 18 F-FDG PET orderliness (joint maximum) provided high discriminatory accuracy in predicting pathologic complete response in patients with esophageal cancer. © RSNA, 2018 Online supplemental material is available for this article.
Are Study and Journal Characteristics Reliable Indicators of "Truth" in Imaging Research?
Frank, Robert A; McInnes, Matthew D F; Levine, Deborah; Kressel, Herbert Y; Jesurum, Julia S; Petrcich, William; McGrath, Trevor A; Bossuyt, Patrick M
2018-04-01
Purpose To evaluate whether journal-level variables (impact factor, cited half-life, and Standards for Reporting of Diagnostic Accuracy Studies [STARD] endorsement) and study-level variables (citation rate, timing of publication, and order of publication) are associated with the distance between primary study results and summary estimates from meta-analyses. Materials and Methods MEDLINE was searched for meta-analyses of imaging diagnostic accuracy studies, published from January 2005 to April 2016. Data on journal-level and primary-study variables were extracted for each meta-analysis. Primary studies were dichotomized by variable as first versus subsequent publication, publication before versus after STARD introduction, STARD endorsement, or by median split. The mean absolute deviation of primary study estimates from the corresponding summary estimates for sensitivity and specificity was compared between groups. Means and confidence intervals were obtained by using bootstrap resampling; P values were calculated by using a t test. Results Ninety-eight meta-analyses summarizing 1458 primary studies met the inclusion criteria. There was substantial variability, but no significant differences, in deviations from the summary estimate between paired groups (P > .0041 in all comparisons). The largest difference found was in mean deviation for sensitivity, which was observed for publication timing, where studies published first on a topic demonstrated a mean deviation that was 2.5 percentage points smaller than subsequently published studies (P = .005). For journal-level factors, the greatest difference found (1.8 percentage points; P = .088) was in mean deviation for sensitivity in journals with impact factors above the median compared with those below the median. Conclusion Journal- and study-level variables considered important when evaluating diagnostic accuracy information to guide clinical decisions are not systematically associated with distance from the truth; critical appraisal of individual articles is recommended. © RSNA, 2017 Online supplemental material is available for this article.
NASA Astrophysics Data System (ADS)
Fengler, Felipe; Ribeiro, Admilson; Longo, Regina; Merides, Marcela; Soares, Herlon; Melo, Wanderley
2017-04-01
Although reclamation techniques for forest ecosystems recovery have been developed over the past decades, there is still a great difficulty in the establishment on environment assessment, especially when compared to the non-disturbed ecosystems. This work evaluated the results and limitations on cassiterite-mined areas in reclamation, at Brazilian Amazônia. Floristic variables from 29 plots located on 15-year-old native species reforestation sites and two plots from preserved open/closed canopy forests were analyzed in a chronosequece way (2010-2015). Regeneration density, species richness, average girth, and average height were evaluated every year, by means of cluster analysis (Euclidian distance, Ward method) and submitted to multiscale bootstrap resampling (a=5%). It was conduced the regression analysis for each identified group in 2015 in order to verify differences between the chronosequece development. The results showed the existence of two main groups in 2010, one witch all mined plots were allocated and other with open/closed canopy plots. After 2011 some mined areas became allocated in the open/closed canopy plots group. From 2013 and on open/closed canopy plots appeared shuffled in the formed groups, indicating the reclamation sites conditions became similar to natural areas. Finally, in 2015 three main groups were formed. The regression analysis showed that group three had a higher trend of development for regeneration density, with higher angular coefficient and higher values. For species richness all the groups had a similar trend, with values lower than open/closed canopy forest. In average girth higher trends were observed in group one and all values were near to open canopy forest in 2015. Average height showed better trends and higher values in group two. It was concluded that all mined sites had a forest recovery process. However, different responses to reclamation process were observed due to the differences in the degraded soils characteristics. Keywords: Recovery, Restoration, Forest, Chronosequece, Cassiterite.
Garziera, Marica; Bidoli, Ettore; Cecchin, Erika; Mini, Enrico; Nobili, Stefania; Lonardi, Sara; Buonadonna, Angela; Errante, Domenico; Pella, Nicoletta; D'Andrea, Mario; De Marchi, Francesco; De Paoli, Antonino; Zanusso, Chiara; De Mattia, Elena; Tassi, Renato; Toffoli, Giuseppe
2015-01-01
An important hallmark of CRC is the evasion of immune surveillance. HLA-G is a negative regulator of host's immune response. Overexpression of HLA-G protein in primary tumour CRC tissues has already been associated to worse prognosis; however a definition of the role of immunogenetic host background is still lacking. Germline polymorphisms in the 3'UTR region of HLA-G influence the magnitude of the protein by modulating HLA-G mRNA stability. Soluble HLA-G has been associated to 3'UTR +2960 Ins/Ins and +3035 C/T (lower levels) and +3187 G/G (high levels) genotypes. HLA-G 3'UTR SNPs have never been explored in CRC outcome. The purpose of this study was to investigate if common HLA-G 3'UTR polymorphisms have an impact on DFS and OS of 253 stage II-III CRC patients, after primary surgery and ADJ-CT based on FL. The 3'UTR was sequenced and SNPs were analyzed for their association with survival by Kaplan-Meier and multivariate Cox models; results underwent internal validation using a resampling method (bootstrap analysis). In a multivariate analysis, we estimated an association with improved DFS in Ins allele (Ins/Del +Ins/Ins) carriers (HR 0.60, 95% CI 0.38-0.93, P = 0.023) and in patients with +3035 C/T genotype (HR 0.51, 95% CI 0.26-0.99, P = 0.045). The +3187 G/G mutated carriers (G/G vs A/A+A/G) were associated to a worst prognosis in both DFS (HR 2.46, 95% CI 1.19-5.05, P = 0.015) and OS (HR 2.71, 95% CI 1.16-6.63, P = 0.022). Our study shows a prognostic and independent role of 3 HLA-G 3'UTR SNPs, +2960 14-bp INDEL, +3035 C>T, and +3187 A>G.
Girija, S; Manjunath, A P; Salahudin, A; Jeyaseelan, L; Gowri, V; Abu-Heija, A; Al Kharusi, L
2017-08-01
To validate whether change in serum HCG levels between days 0 and 4 confer any prognostic value during methotrexate therapy and to quantify its change. This is a retrospective study of 48 tubal ectopic pregnancies treated with single dose methotrexate protocol at University Hospital, Muscat, Oman from January 2012 to December 2013. The clinical outcome was analyzed based on the complete resolution of HCG levels or need for additional doses of methotrexate or recourse to surgery. The percentage change in HCG levels between days 0 and 4 (HCG index) of methotrexate were calculated and receiver operator characteristics curve was plotted to identify the best cutoff levels. In order to get a robust 95% confidence interval, bootstrap method using R software was done using 1000 re-sampling. ROC curve and the predictive values were estimated using MEDCALC software. The mean HCG level on day 4 is significantly higher in treatment failure group (4254±4095 IU/L vs. 2109±3646 IU/L, P=0.008). The HCG levels between day 0 and 4 decreased in 42.7% (21/48) of cases and 80.9% of these cases had treatment success. The HCG levels increased in 57.4% (27/48) of cases and 33.3% of these cases had treatment success. (P=0.001). A 10 percent decline in day 4 HCG levels predict the treatment success with sensitivity of 77% and Specificity 81%. The area under the ROC curve was 0.82 (95% CI: 0.67-0.92), (P<0.001). The success with single dose of methotrexate therapy for tubal ectopic pregnancies was predicted early in the course of treatment by following three key findings: the absolute mean HCG values on day 4, decrease in HCG level from day 0 to 4 and 10% or more fall in day 4 HCG levels. Copyright © 2017 Elsevier B.V. All rights reserved.
Automation and Evaluation of the SOWH Test with SOWHAT.
Church, Samuel H; Ryan, Joseph F; Dunn, Casey W
2015-11-01
The Swofford-Olsen-Waddell-Hillis (SOWH) test evaluates statistical support for incongruent phylogenetic topologies. It is commonly applied to determine if the maximum likelihood tree in a phylogenetic analysis is significantly different than an alternative hypothesis. The SOWH test compares the observed difference in log-likelihood between two topologies to a null distribution of differences in log-likelihood generated by parametric resampling. The test is a well-established phylogenetic method for topology testing, but it is sensitive to model misspecification, it is computationally burdensome to perform, and its implementation requires the investigator to make several decisions that each have the potential to affect the outcome of the test. We analyzed the effects of multiple factors using seven data sets to which the SOWH test was previously applied. These factors include a number of sample replicates, likelihood software, the introduction of gaps to simulated data, the use of distinct models of evolution for data simulation and likelihood inference, and a suggested test correction wherein an unresolved "zero-constrained" tree is used to simulate sequence data. To facilitate these analyses and future applications of the SOWH test, we wrote SOWHAT, a program that automates the SOWH test. We find that inadequate bootstrap sampling can change the outcome of the SOWH test. The results also show that using a zero-constrained tree for data simulation can result in a wider null distribution and higher p-values, but does not change the outcome of the SOWH test for most of the data sets tested here. These results will help others implement and evaluate the SOWH test and allow us to provide recommendations for future applications of the SOWH test. SOWHAT is available for download from https://github.com/josephryan/SOWHAT. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
The effects of normal aging on multiple aspects of financial decision-making
Bangma, Dorien F.; Fuermaier, Anselm B. M.; Tucha, Lara; Tucha, Oliver; Koerts, Janneke
2017-01-01
Objectives Financial decision-making (FDM) is crucial for independent living. Due to cognitive decline that accompanies normal aging, older adults might have difficulties in some aspects of FDM. However, an improved knowledge, personal experience and affective decision-making, which are also related to normal aging, may lead to a stable or even improved age-related performance in some other aspects of FDM. Therefore, the present explorative study examines the effects of normal aging on multiple aspects of FDM. Methods One-hundred and eighty participants (range 18–87 years) were assessed with eight FDM tests and several standard neuropsychological tests. Age effects were evaluated using hierarchical multiple regression analyses. The validity of the prediction models was examined by internal validation (i.e. bootstrap resampling procedure) as well as external validation on another, independent, sample of participants (n = 124). Multiple regression and correlation analyses were applied to investigate the mediation effect of standard measures of cognition on the observed effects of age on FDM. Results On a relatively basic level of FDM (e.g., paying bills or using FDM styles) no significant effects of aging were found. However more complex FDM, such as making decisions in accordance with specific rules, becomes more difficult with advancing age. Furthermore, an older age was found to be related to a decreased sensitivity for impulsive buying. These results were confirmed by the internal and external validation analyses. Mediation effects of numeracy and planning were found to explain parts of the association between one aspect of FDM (i.e. Competence in decision rules) and age; however, these cognitive domains were not able to completely explain the relation between age and FDM. Conclusion Normal aging has a negative influence on a complex aspect of FDM, however, other aspects appear to be unaffected by normal aging or improve. PMID:28792973
DuBay, Derek A; Redden, David T; Bryant, Mary K; Dorn, David P; Fouad, Mona N; Gray, Stephen H; White, Jared A; Locke, Jayme E; Meeks, Christopher B; Taylor, Garry C; Kilgore, Meredith L; Eckhoff, Devin E
2014-05-27
The strategy of evaluating every donation opportunity warrants an investigation into the financial feasibility of this practice. The purpose of this investigation is to measure resource utilization required for procurement of transplantable organs in an organ procurement organization (OPO). Donors were stratified into those that met OPTN-defined eligible death criteria (ED donors, n=589) and those that did not (NED donors, n=703). Variable direct costs and time utilization by OPO staff for organ procurement were measured and amortized per organ transplanted using permutation methods and statistical bootstrapping/resampling approaches. More organs per donor were procured (3.66±1.2 vs. 2.34±0.8, P<0.0001) and transplanted (3.51±1.2 vs. 2.08±0.8, P<0.0001) in ED donors compared with NED donors. The variable direct costs were significantly lower in the NED donors ($29,879.4±11590.1 vs. $19,019.6±7599.60, P<0.0001). In contrast, the amortized variable direct costs per organ transplanted were significantly higher in the NED donors ($8,414.5±138.29 vs. $9,272.04±344.56, P<0.0001). The ED donors where thoracic organ procurement occurred were 67% more expensive than in abdominal-only organ procurement. The total time allocated per donor was significantly shorter in the NED donors (91.2±44.9 hr vs. 86.8±78.6 hr, P=0.01). In contrast, the amortized time per organ transplanted was significantly longer in the NED donors (23.1±0.8 hr vs. 36.9±3.2 hr, P<0.001). The variable direct costs and time allocated per organ transplanted is significantly higher in donors that do not meet the eligible death criteria.
Assessing Mediational Models: Testing and Interval Estimation for Indirect Effects.
Biesanz, Jeremy C; Falk, Carl F; Savalei, Victoria
2010-08-06
Theoretical models specifying indirect or mediated effects are common in the social sciences. An indirect effect exists when an independent variable's influence on the dependent variable is mediated through an intervening variable. Classic approaches to assessing such mediational hypotheses ( Baron & Kenny, 1986 ; Sobel, 1982 ) have in recent years been supplemented by computationally intensive methods such as bootstrapping, the distribution of the product methods, and hierarchical Bayesian Markov chain Monte Carlo (MCMC) methods. These different approaches for assessing mediation are illustrated using data from Dunn, Biesanz, Human, and Finn (2007). However, little is known about how these methods perform relative to each other, particularly in more challenging situations, such as with data that are incomplete and/or nonnormal. This article presents an extensive Monte Carlo simulation evaluating a host of approaches for assessing mediation. We examine Type I error rates, power, and coverage. We study normal and nonnormal data as well as complete and incomplete data. In addition, we adapt a method, recently proposed in statistical literature, that does not rely on confidence intervals (CIs) to test the null hypothesis of no indirect effect. The results suggest that the new inferential method-the partial posterior p value-slightly outperforms existing ones in terms of maintaining Type I error rates while maximizing power, especially with incomplete data. Among confidence interval approaches, the bias-corrected accelerated (BC a ) bootstrapping approach often has inflated Type I error rates and inconsistent coverage and is not recommended; In contrast, the bootstrapped percentile confidence interval and the hierarchical Bayesian MCMC method perform best overall, maintaining Type I error rates, exhibiting reasonable power, and producing stable and accurate coverage rates.
Laudet, V
1997-12-01
From a database containing the published nuclear hormone receptor (NR) sequences I constructed an alignment of the C, D and E domains of these molecules. Using this alignment, I have performed tree reconstruction using both distance matrix and parsimony analysis. The robustness of each branch was estimated using bootstrap resampling methods. The trees constructed by these two methods gave congruent topologies. From these analyses I defined six NR subfamilies: (i) a large one clustering thyroid hormone receptors (TRs), retinoic acid receptors (RARs), peroxisome proliferator-activated receptors (PPARs), vitamin D receptors (VDRs) and ecdysone receptors (EcRs) as well as numerous orphan receptors such as RORs or Rev-erbs; (ii) one containing retinoid X receptors (RXRs) together with COUP, HNF4, tailless, TR2 and TR4 orphan receptors; (iii) one containing steroid receptors; (iv) one containing the NGFIB orphan receptors; (v) one containing FTZ-F1 orphan receptors; and finally (vi) one containing to date only one gene, the GCNF1 orphan receptor. The relationships between the six subfamilies are not known except for subfamilies I and IV which appear to be related. Interestingly, most of the liganded receptors appear to be derived when compared with orphan receptors. This suggests that the ligand-binding ability of NRs has been gained by orphan receptors during the course of evolution to give rise to the presently known receptors. The distribution into six subfamilies correlates with the known abilities of the various NRs to bind to DNA as homo- or heterodimers. For example, receptors heterodimerizing efficiently with RXR belong to the first or the fourth subfamilies. I suggest that the ability to heterodimerize evolved once, just before the separation of subfamilies I and IV and that the first NR was able to bind to DNA as a homodimer. From the study of NR sequences existing in vertebrates, arthropods and nematodes, I define two major steps of NR diversification: one that took place very early, probably during the multicellularization event leading to all the metazoan phyla, and a second occurring later on, corresponding to the advent of vertebrates. Finally, I show that in vertebrate species the various groups of NRs accumulated mutations at very different rates.
Locations and magnitudes of historical earthquakes in the Sierra of Ecuador (1587-1996)
NASA Astrophysics Data System (ADS)
Beauval, Céline; Yepes, Hugo; Bakun, William H.; Egred, José; Alvarado, Alexandra; Singaucho, Juan-Carlos
2010-06-01
The whole territory of Ecuador is exposed to seismic hazard. Great earthquakes can occur in the subduction zone (e.g. Esmeraldas, 1906, Mw 8.8), whereas lower magnitude but shallower and potentially more destructive earthquakes can occur in the highlands. This study focuses on the historical crustal earthquakes of the Andean Cordillera. Several large cities are located in the Interandean Valley, among them Quito, the capital (~2.5 millions inhabitants). A total population of ~6 millions inhabitants currently live in the highlands, raising the seismic risk. At present, precise instrumental data for the Ecuadorian territory is not available for periods earlier than 1990 (beginning date of the revised instrumental Ecuadorian seismic catalogue); therefore historical data are of utmost importance for assessing seismic hazard. In this study, the Bakun & Wentworth method is applied in order to determine magnitudes, locations, and associated uncertainties for historical earthquakes of the Sierra over the period 1587-1976. An intensity-magnitude equation is derived from the four most reliable instrumental earthquakes (Mw between 5.3 and 7.1). Intensity data available per historical earthquake vary between 10 (Quito, 1587, Intensity >=VI) and 117 (Riobamba, 1797, Intensity >=III). The bootstrap resampling technique is coupled to the B&W method for deriving geographical confidence contours for the intensity centre depending on the data set of each earthquake, as well as confidence intervals for the magnitude. The extension of the area delineating the intensity centre location at the 67 per cent confidence level (+/-1σ) depends on the amount of intensity data, on their internal coherence, on the number of intensity degrees available, and on their spatial distribution. Special attention is dedicated to the few earthquakes described by intensities reaching IX, X and XI degrees. Twenty-five events are studied, and nineteen new epicentral locations are obtained, yielding equivalent moment magnitudes between 5.0 and 7.6. Large earthquakes seem to be related to strike slip faults between the North Andean Block and stable South America to the east, while moderate earthquakes (Mw <= 6) seem to be associated with to thrust faults located on the western internal slopes of the Interandean Valley.
Locations and magnitudes of historical earthquakes in the Sierra of Ecuador (1587–1996)
Beauval, Celine; Yepes, Hugo; Bakun, William H.; Egred, Jose; Alvarado, Alexandra; Singaucho, Juan-Carlos
2010-01-01
The whole territory of Ecuador is exposed to seismic hazard. Great earthquakes can occur in the subduction zone (e.g. Esmeraldas, 1906, Mw8.8), whereas lower magnitude but shallower and potentially more destructive earthquakes can occur in the highlands. This study focuses on the historical crustal earthquakes of the Andean Cordillera. Several large cities are located in the Interandean Valley, among them Quito, the capital (∼2.5 millions inhabitants). A total population of ∼6 millions inhabitants currently live in the highlands, raising the seismic risk. At present, precise instrumental data for the Ecuadorian territory is not available for periods earlier than 1990 (beginning date of the revised instrumental Ecuadorian seismic catalogue); therefore historical data are of utmost importance for assessing seismic hazard. In this study, the Bakun & Wentworth method is applied in order to determine magnitudes, locations, and associated uncertainties for historical earthquakes of the Sierra over the period 1587–1976. An intensity-magnitude equation is derived from the four most reliable instrumental earthquakes (Mwbetween 5.3 and 7.1). Intensity data available per historical earthquake vary between 10 (Quito, 1587, Intensity ≥VI) and 117 (Riobamba, 1797, Intensity ≥III). The bootstrap resampling technique is coupled to the B&W method for deriving geographical confidence contours for the intensity centre depending on the data set of each earthquake, as well as confidence intervals for the magnitude. The extension of the area delineating the intensity centre location at the 67 per cent confidence level (±1σ) depends on the amount of intensity data, on their internal coherence, on the number of intensity degrees available, and on their spatial distribution. Special attention is dedicated to the few earthquakes described by intensities reaching IX, X and XI degrees. Twenty-five events are studied, and nineteen new epicentral locations are obtained, yielding equivalent moment magnitudes between 5.0 and 7.6. Large earthquakes seem to be related to strike slip faults between the North Andean Block and stable South America to the east, while moderate earthquakes (Mw≤ 6) seem to be associated with to thrust faults located on the western internal slopes of the Interandean Valley.
Information-Based Analysis of Data Assimilation (Invited)
NASA Astrophysics Data System (ADS)
Nearing, G. S.; Gupta, H. V.; Crow, W. T.; Gong, W.
2013-12-01
Data assimilation is defined as the Bayesian conditioning of uncertain model simulations on observations for the purpose of reducing uncertainty about model states. Practical data assimilation methods make the application of Bayes' law tractable either by employing assumptions about the prior, posterior and likelihood distributions (e.g., the Kalman family of filters) or by using resampling methods (e.g., bootstrap filter). We propose to quantify the efficiency of these approximations in an OSSE setting using information theory and, in an OSSE or real-world validation setting, to measure the amount - and more importantly, the quality - of information extracted from observations during data assimilation. To analyze DA assumptions, uncertainty is quantified as the Shannon-type entropy of a discretized probability distribution. The maximum amount of information that can be extracted from observations about model states is the mutual information between states and observations, which is equal to the reduction in entropy in our estimate of the state due to Bayesian filtering. The difference between this potential and the actual reduction in entropy due to Kalman (or other type of) filtering measures the inefficiency of the filter assumptions. Residual uncertainty in DA posterior state estimates can be attributed to three sources: (i) non-injectivity of the observation operator, (ii) noise in the observations, and (iii) filter approximations. The contribution of each of these sources is measurable in an OSSE setting. The amount of information extracted from observations by data assimilation (or system identification, including parameter estimation) can also be measured by Shannon's theory. Since practical filters are approximations of Bayes' law, it is important to know whether the information that is extracted form observations by a filter is reliable. We define information as either good or bad, and propose to measure these two types of information using partial Kullback-Leibler divergences. Defined this way, good and bad information sum to total information. This segregation of information into good and bad components requires a validation target distribution; in a DA OSSE setting, this can be the true Bayesian posterior, but in a real-world setting the validation target might be determined by a set of in situ observations.
NASA Astrophysics Data System (ADS)
Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander
2017-04-01
Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalent. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation): The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and calls interactively (depending on the platform) parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix-Systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. A new way of partitioning the data in sperrorest is provided by partition.factor.cv(). This function gives the user the possibility to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial sampling resampling strategies are already available and can be extended by the user.
Complex relationship between seasonal streamflow forecast skill and value in reservoir operations
NASA Astrophysics Data System (ADS)
Turner, Sean W. D.; Bennett, James C.; Robertson, David E.; Galelli, Stefano
2017-09-01
Considerable research effort has recently been directed at improving and operationalising ensemble seasonal streamflow forecasts. Whilst this creates new opportunities for improving the performance of water resources systems, there may also be associated risks. Here, we explore these potential risks by examining the sensitivity of forecast value (improvement in system performance brought about by adopting forecasts) to changes in the forecast skill for a range of hypothetical reservoir designs with contrasting operating objectives. Forecast-informed operations are simulated using rolling horizon, adaptive control and then benchmarked against optimised control rules to assess performance improvements. Results show that there exists a strong relationship between forecast skill and value for systems operated to maintain a target water level. But this relationship breaks down when the reservoir is operated to satisfy a target demand for water; good forecast accuracy does not necessarily translate into performance improvement. We show that the primary cause of this behaviour is the buffering role played by storage in water supply reservoirs, which renders the forecast superfluous for long periods of the operation. System performance depends primarily on forecast accuracy when critical decisions are made - namely during severe drought. As it is not possible to know in advance if a forecast will perform well at such moments, we advocate measuring the consistency of forecast performance, through bootstrap resampling, to indicate potential usefulness in storage operations. Our results highlight the need for sensitivity assessment in value-of-forecast studies involving reservoirs with supply objectives.
Duwe, Grant; Freske, Pamela J
2012-08-01
This study presents the results from efforts to revise the Minnesota Sex Offender Screening Tool-Revised (MnSOST-R), one of the most widely used sex offender risk-assessment tools. The updated instrument, the MnSOST-3, contains nine individual items, six of which are new. The population for this study consisted of the cross-validation sample for the MnSOST-R (N = 220) and a contemporary sample of 2,315 sex offenders released from Minnesota prisons between 2003 and 2006. To score and select items for the MnSOST-3, we used predicted probabilities generated from a multiple logistic regression model. We used bootstrap resampling to not only refine our selection of predictors but also internally validate the model. The results indicate the MnSOST-3 has a relatively high level of predictive discrimination, as evidenced by an apparent AUC of .821 and an optimism-corrected AUC of .796. The findings show the MnSOST-3 is well calibrated with actual recidivism rates for all but the highest risk offenders. Although estimating a penalized maximum likelihood model did not improve the overall calibration, the results suggest the MnSOST-3 may still be useful in helping identify high-risk offenders whose sexual recidivism risk exceeds 50%. Results from an interrater reliability assessment indicate the instrument, which is scored in a Microsoft Excel application, has an adequate degree of consistency across raters (ICC = .83 for both consistency and absolute agreement).
Complex relationship between seasonal streamflow forecast skill and value in reservoir operations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Turner, Sean W. D.; Bennett, James C.; Robertson, David E.
Considerable research effort has recently been directed at improving and operationalising ensemble seasonal streamflow forecasts. Whilst this creates new opportunities for improving the performance of water resources systems, there may also be associated risks. Here, we explore these potential risks by examining the sensitivity of forecast value (improvement in system performance brought about by adopting forecasts) to changes in the forecast skill for a range of hypothetical reservoir designs with contrasting operating objectives. Forecast-informed operations are simulated using rolling horizon, adaptive control and then benchmarked against optimised control rules to assess performance improvements. Results show that there exists a strongmore » relationship between forecast skill and value for systems operated to maintain a target water level. But this relationship breaks down when the reservoir is operated to satisfy a target demand for water; good forecast accuracy does not necessarily translate into performance improvement. We show that the primary cause of this behaviour is the buffering role played by storage in water supply reservoirs, which renders the forecast superfluous for long periods of the operation. System performance depends primarily on forecast accuracy when critical decisions are made – namely during severe drought. As it is not possible to know in advance if a forecast will perform well at such moments, we advocate measuring the consistency of forecast performance, through bootstrap resampling, to indicate potential usefulness in storage operations. Our results highlight the need for sensitivity assessment in value-of-forecast studies involving reservoirs with supply objectives.« less
Yoon, Jung-Hoon; Kang, So-Jung; Jung, Yong-Taek; Oh, Tae-Kwang
2008-09-01
A Gram-negative, non-motile, pleomorphic bacterial strain, designated SMK-142(T), was isolated from a tidal flat of the Yellow Sea, Korea, and was subjected to a polyphasic taxonomic study. Strain SMK-142(T) grew optimally at pH 7.0-8.0, 25 degrees C and in the presence of 2% (w/v) NaCl. Phylogenetic analyses based on 16S rRNA gene sequences showed that strain SMK-142(T) clustered with Lutibacter litoralis with which it exhibited a 16S rRNA gene sequence similarity value of 91.2%. This cluster joined the clade comprising the genera Tenacibaculum and Polaribacter at a high bootstrap resampling value. Strain SMK-142(T) contained MK-6 as the predominant menaquinone and iso-C(15:0), iso-C(15:1) and iso-C(17:0) 3-OH as the major fatty acids. The DNA G+C content was 37.2 mol%. Strain SMK-142(T) was differentiated from three phylogenetically related genera, Lutibacter, Tenacibaculum and Polaribacter, on the basis of low 16S rRNA gene sequence similarity values and differences in fatty acid profiles and in some phenotypic properties. On the basis of phenotypic, chemotaxonomic and phylogenetic data, strain SMK-142(T) represents a novel genus and species for which the name Aestuariicola saemankumensis gen. nov., sp. nov. is proposed (phylum Bacteroidetes, family Flavobacteriaceae). The type strain of the type species, Aestuariicola saemankumensis sp. nov., is SMK-142(T) (=KCTC 22171(T)=CCUG 55329(T)).
Dutcher, Christina D; Vujanovic, Anka A; Paulus, Daniel J; Bartlett, Brooke A
2017-09-01
Emotion regulation difficulties are a potentially key mechanism underlying the association between childhood maltreatment and alcohol use in adulthood. The current study examined the mediating role of emotion regulation difficulties in the association between childhood maltreatment severity (i.e., Childhood Trauma Questionnaire total score) and past-month alcohol use severity, including alcohol consumption frequency and alcohol-related problems (i.e., number of days of alcohol problems, ratings of "bother" caused by alcohol problems, ratings of treatment importance for alcohol problems). Participants included 111 acute-care psychiatric inpatients (45.0% female; Mage=33.5, SD=10.6), who reported at least one DSM-5 posttraumatic stress disorder Criterion A traumatic event, indexed via the Life Events Checklist for DSM-5. Participants completed questionnaires regarding childhood maltreatment, emotion regulation difficulties, and alcohol use. A significant indirect effect of childhood maltreatment severity via emotion regulation difficulties in relation to alcohol use severity (β=0.07, SE=0.04, 99% CI [0.01, 0.21]) was documented. Specifically, significant indirect effects were found for childhood maltreatment severity via emotion regulation difficulties in relation to alcohol problems (β's between 0.05 and 0.12; all 99% bootstrapped CIs with 10,000 resamples did not include 0) but not alcohol consumption. Emotion regulation difficulties may play a significant role in the association between childhood maltreatment severity and alcohol outcomes. Clinical implications are discussed. Copyright © 2017 Elsevier Inc. All rights reserved.
Khalaila, Rabia; Vitman-Schorr, Adi
2018-02-01
The increase in longevity of people on one hand, and on the other hand the fact that the social networks in later life become increasingly narrower, highlights the importance of Internet use to enhance quality of life (QoL). However, whether Internet use increases or decreases social networks, loneliness, and quality of life is not clear-cut. To explore the direct and/or indirect effects of Internet use on QoL, and to examine whether ethnicity and time the elderly spent with family moderate the mediation effect of Internet use on quality of life throughout loneliness. This descriptive-correlational study was carried out in 2016 by structured interviews with a convenience sample of 502 respondents aged 50 and older, living in northern Israel. Bootstrapping with resampling strategies was used for testing mediation a model. Use of the Internet was found to be positively associated with QoL. However, this relationship was mediated by loneliness, and moderated by the time the elderly spent with family members. In addition, respondents' ethnicity significantly moderated the mediation effect between Internet use and loneliness. Internet use can enhance QoL of older adults directly or indirectly by reducing loneliness. However, these effects are conditional on other variables. The indirect effect moderated by ethnicity, and the direct effect moderated by the time the elderly spend with their families. Researchers and practitioners should be aware of these interactions which can impact loneliness and quality of life of older persons differently.
Complex relationship between seasonal streamflow forecast skill and value in reservoir operations
Turner, Sean W. D.; Bennett, James C.; Robertson, David E.; ...
2017-09-28
Considerable research effort has recently been directed at improving and operationalising ensemble seasonal streamflow forecasts. Whilst this creates new opportunities for improving the performance of water resources systems, there may also be associated risks. Here, we explore these potential risks by examining the sensitivity of forecast value (improvement in system performance brought about by adopting forecasts) to changes in the forecast skill for a range of hypothetical reservoir designs with contrasting operating objectives. Forecast-informed operations are simulated using rolling horizon, adaptive control and then benchmarked against optimised control rules to assess performance improvements. Results show that there exists a strongmore » relationship between forecast skill and value for systems operated to maintain a target water level. But this relationship breaks down when the reservoir is operated to satisfy a target demand for water; good forecast accuracy does not necessarily translate into performance improvement. We show that the primary cause of this behaviour is the buffering role played by storage in water supply reservoirs, which renders the forecast superfluous for long periods of the operation. System performance depends primarily on forecast accuracy when critical decisions are made – namely during severe drought. As it is not possible to know in advance if a forecast will perform well at such moments, we advocate measuring the consistency of forecast performance, through bootstrap resampling, to indicate potential usefulness in storage operations. Our results highlight the need for sensitivity assessment in value-of-forecast studies involving reservoirs with supply objectives.« less
Environmental and Hydroclimatic Sensitivities of Greenhouse Gas (GHG) Fluxes from Coastal Wetlands
NASA Astrophysics Data System (ADS)
Abdul-Aziz, O. I.; Ishtiaq, K. S.
2016-12-01
We computed the reference environmental and hydroclimatic sensitivities of the greenhouse gas (GHG) fluxes (CO2 and CH4) from coastal salt marshes. Non-linear partial least squares regression models of CO2 (net uptake) and CH4 (net emissions) fluxes were developed with a bootstrap resampling approach using the photosynthetically active radiation (PAR), air and soil temperatures, water height, soil moisture, porewater salinity, and pH as predictors. Analytical sensitivity coefficients of different predictors were then analytically derived from the estimated models. The numerical sensitivities of the dominant drivers were determined by perturbing the variables individually and simultaneously to compute their individual and combined (respectively) effects on the GHG fluxes. Four tidal wetlands of Waquoit Bay, MA — incorporating a gradient in land-use, salinity and hydrology — were considered as the case study sites. The wetlands were dominated by native Spartina Alterniflora, and characterized by high salinity and frequent flooding. Results indicated a high sensitivity of CO2 fluxes to temperature and PAR, a moderate sensitivity to soil salinity and water height, and a weak sensitivity to pH and soil moisture. In contrast, the CH4 fluxes were more sensitive to temperature and salinity, compared to that of PAR, pH, and hydrologic variables. The estimated sensitivities and mechanistic insights can aid the management of coastal carbon under a changing climate and environment. The sensitivity coefficients also indicated the most dominant drivers of GHG fluxes for the development of a parsimonious predictive model.
Brain size growth in wild and captive chimpanzees (Pan troglodytes).
Cofran, Zachary
2018-05-24
Despite many studies of chimpanzee brain size growth, intraspecific variation is under-explored. Brain size data from chimpanzees of the Taï Forest and the Yerkes Primate Research Center enable a unique glimpse into brain growth variation as age at death is known for individuals, allowing cross-sectional growth curves to be estimated. Because Taï chimpanzees are from the wild but Yerkes apes are captive, potential environmental effects on neural development can also be explored. Previous research has revealed differences in growth and health between wild and captive primates, but such habitat effects have yet to be investigated for brain growth. Here, I use an iterative curve fitting procedure to estimate brain growth and regression parameters for each population, statistically comparing growth models using bootstrapped confidence intervals. Yerkes and Taï brain sizes overlap at all ages, although the sole Taï newborn is at the low end of captive neonatal variation. Growth rate and duration are statistically indistinguishable between the two populations. Resampling the Yerkes sample to match the Taï sample size and age group composition shows that ontogenetic variation in the two groups are remarkably similar despite the latter's limited size. Best fit growth curves for each sample indicate cessation of brain size growth at around 2 years, earlier than has previously been reported. The overall similarity between wild and captive chimpanzees points to the canalization of brain growth in this species. © 2018 Wiley Periodicals, Inc.
Mir, Taskia; Dirks, Peter; Mason, Warren P; Bernstein, Mark
2014-10-01
This is a qualitative study designed to examine patient acceptability of re-sampling surgery for glioblastoma multiforme (GBM) electively post-therapy or at asymptomatic relapse. Thirty patients were selected using the convenience sampling method and interviewed. Patients were presented with hypothetical scenarios including a scenario in which the surgery was offered to them routinely and a scenario in which the surgery was in a clinical trial. The results of the study suggest that about two thirds of the patients offered the surgery on a routine basis would be interested, and half of the patients would agree to the surgery as part of a clinical trial. Several overarching themes emerged, some of which include: patients expressed ethical concerns about offering financial incentives or compensation to the patients or surgeons involved in the study; patients were concerned about appropriate communication and full disclosure about the procedures involved, the legalities of tumor ownership and the use of the tumor post-surgery; patients may feel alone or vulnerable when they are approached about the surgery; patients and their families expressed immense trust in their surgeon and indicated that this trust is a major determinant of their agreeing to surgery. The overall positive response to re-sampling surgery suggests that this procedure, if designed with all the ethical concerns attended to, would be welcomed by most patients. This approach of asking patients beforehand if a treatment innovation is acceptable would appear to be more practical and ethically desirable than previous practice.
Reduced ion bootstrap current drive on NTM instability
NASA Astrophysics Data System (ADS)
Qu, Hongpeng; Wang, Feng; Wang, Aike; Peng, Xiaodong; Li, Jiquan
2018-05-01
The loss of bootstrap current inside magnetic island plays a dominant role in driving the neoclassical tearing mode (NTM) instability in tokamak plasmas. In this work, we investigate the finite-banana-width (FBW) effect on the profile of ion bootstrap current in the island vicinity via an analytical approach. The results show that even if the pressure gradient vanishes inside the island, the ion bootstrap current can partly survive due to the FBW effect. The efficiency of the FBW effect is higher when the island width becomes smaller. Nevertheless, even when the island width is comparable to the ion FBW, the unperturbed ion bootstrap current inside the island cannot be largely recovered by the FBW effect, and thus the current loss still exists. This suggests that FBW effect alone cannot dramatically reduce the ion bootstrap current drive on NTMs.
Bootstrap Percolation on Homogeneous Trees Has 2 Phase Transitions
NASA Astrophysics Data System (ADS)
Fontes, L. R. G.; Schonmann, R. H.
2008-09-01
We study the threshold θ bootstrap percolation model on the homogeneous tree with degree b+1, 2≤ θ≤ b, and initial density p. It is known that there exists a nontrivial critical value for p, which we call p f , such that a) for p> p f , the final bootstrapped configuration is fully occupied for almost every initial configuration, and b) if p< p f , then for almost every initial configuration, the final bootstrapped configuration has density of occupied vertices less than 1. In this paper, we establish the existence of a distinct critical value for p, p c , such that 0< p c < p f , with the following properties: 1) if p≤ p c , then for almost every initial configuration there is no infinite cluster of occupied vertices in the final bootstrapped configuration; 2) if p> p c , then for almost every initial configuration there are infinite clusters of occupied vertices in the final bootstrapped configuration. Moreover, we show that 3) for p< p c , the distribution of the occupied cluster size in the final bootstrapped configuration has an exponential tail; 4) at p= p c , the expected occupied cluster size in the final bootstrapped configuration is infinite; 5) the probability of percolation of occupied vertices in the final bootstrapped configuration is continuous on [0, p f ] and analytic on ( p c , p f ), admitting an analytic continuation from the right at p c and, only in the case θ= b, also from the left at p f .
Approximate median regression for complex survey data with skewed response.
Fraser, Raphael André; Lipsitz, Stuart R; Sinha, Debajyoti; Fitzmaurice, Garrett M; Pan, Yi
2016-12-01
The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling, and weighting. In this article, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS)'based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. © 2016, The International Biometric Society.
Approximate Median Regression for Complex Survey Data with Skewed Response
Fraser, Raphael André; Lipsitz, Stuart R.; Sinha, Debajyoti; Fitzmaurice, Garrett M.; Pan, Yi
2016-01-01
Summary The ready availability of public-use data from various large national complex surveys has immense potential for the assessment of population characteristics using regression models. Complex surveys can be used to identify risk factors for important diseases such as cancer. Existing statistical methods based on estimating equations and/or utilizing resampling methods are often not valid with survey data due to complex survey design features. That is, stratification, multistage sampling and weighting. In this paper, we accommodate these design features in the analysis of highly skewed response variables arising from large complex surveys. Specifically, we propose a double-transform-both-sides (DTBS) based estimating equations approach to estimate the median regression parameters of the highly skewed response; the DTBS approach applies the same Box-Cox type transformation twice to both the outcome and regression function. The usual sandwich variance estimate can be used in our approach, whereas a resampling approach would be needed for a pseudo-likelihood based on minimizing absolute deviations (MAD). Furthermore, the approach is relatively robust to the true underlying distribution, and has much smaller mean square error than a MAD approach. The method is motivated by an analysis of laboratory data on urinary iodine (UI) concentration from the National Health and Nutrition Examination Survey. PMID:27062562
Im, Subin; Min, Soonhong
2013-04-01
Exploratory factor analyses of the Kirton Adaption-Innovation Inventory (KAI), which serves to measure individual cognitive styles, generally indicate three factors: sufficiency of originality, efficiency, and rule/group conformity. In contrast, a 2005 study by Im and Hu using confirmatory factor analysis supported a four-factor structure, dividing the sufficiency of originality dimension into two subdimensions, idea generation and preference for change. This study extends Im and Hu's (2005) study of a derived version of the KAI by providing additional evidence of the four-factor structure. Specifically, the authors test the robustness of the parameter estimates to the violation of normality assumptions in the sample using bootstrap methods. A bias-corrected confidence interval bootstrapping procedure conducted among a sample of 356 participants--members of the Arkansas Household Research Panel, with middle SES and average age of 55.6 yr. (SD = 13.9)--showed that the four-factor model with two subdimensions of sufficiency of originality fits the data significantly better than the three-factor model in non-normality conditions.
Measuring and Benchmarking Technical Efficiency of Public Hospitals in Tianjin, China
Li, Hao; Dong, Siping
2015-01-01
China has long been stuck in applying traditional data envelopment analysis (DEA) models to measure technical efficiency of public hospitals without bias correction of efficiency scores. In this article, we have introduced the Bootstrap-DEA approach from the international literature to analyze the technical efficiency of public hospitals in Tianjin (China) and tried to improve the application of this method for benchmarking and inter-organizational learning. It is found that the bias corrected efficiency scores of Bootstrap-DEA differ significantly from those of the traditional Banker, Charnes, and Cooper (BCC) model, which means that Chinese researchers need to update their DEA models for more scientific calculation of hospital efficiency scores. Our research has helped shorten the gap between China and the international world in relative efficiency measurement and improvement of hospitals. It is suggested that Bootstrap-DEA be widely applied into afterward research to measure relative efficiency and productivity of Chinese hospitals so as to better serve for efficiency improvement and related decision making. PMID:26396090
Kanefuji, Tsutomu; Takano, Toru; Suda, Takeshi; Akazawa, Kouhei; Yokoo, Takeshi; Kamimura, Hiroteru; Kamimura, Kenya; Tsuchiya, Atsunori; Takamura, Masaaki; Kawai, Hirokazu; Yamagiwa, Satoshi; Aoyama, Hidefumi; Nomoto, Minoru; Terai, Shuji
2015-04-21
To establish a prognostic formula that distinguishes non-hypervascular hepatic nodules (NHNs) with higher aggressiveness from less hazardous one. Seventy-three NHNs were detected in gadolinium ethoxybenzyl diethylene-triamine-pentaacetic-acid magnetic resonance imaging (Gd-EOB-DTPA-MRI) study and confirmed to change 2 mm or more in size and/or to gain hypervascularity. All images were interpreted independently by an experienced, board-certified abdominal radiologist and hepatologist; both knew that the patients were at risk for hepatocellular carcinoma development but were blinded to the clinical information. A formula predicting NHN destiny was developed using a generalized estimating equation model with thirteen explanatory variables: age, gender, background liver diseases, Child-Pugh class, NHN diameter, T1-weighted imaging/T2-weighted imaging detectability, fat deposition, lower signal intensity in arterial phase, lower signal intensity in equilibrium phase, α-fetoprotein, des-γ-carboxy prothrombin, α-fetoprotein-L3, and coexistence of classical hepatocellular carcinoma. The accuracy of the formula was validated in bootstrap samples that were created by resampling of 1000 iterations. During a median follow-up period of 504 d, 73 NHNs with a median diameter of 9 mm (interquartile range: 8-12 mm) grew or shrank by 68.5% (fifty nodules) or 20.5% (fifteen nodules), respectively, whereas hypervascularity developed in 38.4% (twenty eight nodules). In the fifteen shrank nodules, twelve nodules disappeared, while 11.0% (eight nodules) were stable in size but acquired vascularity. A generalized estimating equation analysis selected five explanatories from the thirteen variables as significant factors to predict NHN progression. The estimated regression coefficients were 0.36 for age, 6.51 for lower signal intensity in arterial phase, 8.70 or 6.03 for positivity of hepatitis B virus or hepatitis C virus, 9.37 for des-γ-carboxy prothrombin, and -4.05 for fat deposition. A formula incorporating the five coefficients revealed sensitivity, specificity, and accuracy of 88.0%, 86.7%, and 87.7% in the formulating cohort, whereas these of 87.2% ± 5.7%, 83.8% ± 13.6%, and 87.3% ± 4.5% in the bootstrap samples. These data suggest that the formula helps Gd-EOB-DTPA-MRI detect a trend toward hepatocyte transformation by predicting NHN destiny.
Kappa statistic for clustered dichotomous responses from physicians and patients.
Kang, Chaeryon; Qaqish, Bahjat; Monaco, Jane; Sheridan, Stacey L; Cai, Jianwen
2013-09-20
The bootstrap method for estimating the standard error of the kappa statistic in the presence of clustered data is evaluated. Such data arise, for example, in assessing agreement between physicians and their patients regarding their understanding of the physician-patient interaction and discussions. We propose a computationally efficient procedure for generating correlated dichotomous responses for physicians and assigned patients for simulation studies. The simulation result demonstrates that the proposed bootstrap method produces better estimate of the standard error and better coverage performance compared with the asymptotic standard error estimate that ignores dependence among patients within physicians with at least a moderately large number of clusters. We present an example of an application to a coronary heart disease prevention study. Copyright © 2013 John Wiley & Sons, Ltd.
Accelerated spike resampling for accurate multiple testing controls.
Harrison, Matthew T
2013-02-01
Controlling for multiple hypothesis tests using standard spike resampling techniques often requires prohibitive amounts of computation. Importance sampling techniques can be used to accelerate the computation. The general theory is presented, along with specific examples for testing differences across conditions using permutation tests and for testing pairwise synchrony and precise lagged-correlation between many simultaneously recorded spike trains using interval jitter.
Exact and Monte carlo resampling procedures for the Wilcoxon-Mann-Whitney and Kruskal-Wallis tests.
Berry, K J; Mielke, P W
2000-12-01
Exact and Monte Carlo resampling FORTRAN programs are described for the Wilcoxon-Mann-Whitney rank sum test and the Kruskal-Wallis one-way analysis of variance for ranks test. The program algorithms compensate for tied values and do not depend on asymptotic approximations for probability values, unlike most algorithms contained in PC-based statistical software packages.
Darling, Stephen; Parker, Mary-Jane; Goodall, Karen E; Havelka, Jelena; Allen, Richard J
2014-03-01
When participants carry out visually presented digit serial recall, their performance is better if they are given the opportunity to encode extra visuospatial information at encoding-a phenomenon that has been termed visuospatial bootstrapping. This bootstrapping is the result of integration of information from different modality-specific short-term memory systems and visuospatial knowledge in long term memory, and it can be understood in the context of recent models of working memory that address multimodal binding (e.g., models incorporating an episodic buffer). Here we report a cross-sectional developmental study that demonstrated visuospatial bootstrapping in adults (n=18) and 9-year-old children (n=15) but not in 6-year-old children (n=18). This is the first developmental study addressing visuospatial bootstrapping, and results demonstrate that the developmental trajectory of bootstrapping is different from that of basic verbal and visuospatial working memory. This pattern suggests that bootstrapping (and hence integrative functions such as those associated with the episodic buffer) emerge independent of the development of basic working memory slave systems during childhood. Copyright © 2013 Elsevier Inc. All rights reserved.
Pesticides in Wyoming Groundwater, 2008-10
Eddy-Miller, Cheryl A.; Bartos, Timothy T.; Taylor, Michelle L.
2013-01-01
Groundwater samples were collected from 296 wells during 1995-2006 as part of a baseline study of pesticides in Wyoming groundwater. In 2009, a previous report summarized the results of the baseline sampling and the statistical evaluation of the occurrence of pesticides in relation to selected natural and anthropogenic (human-related) characteristics. During 2008-10, the U.S. Geological Survey, in cooperation with the Wyoming Department of Agriculture, resampled a subset (52) of the 296 wells sampled during 1995-2006 baseline study in order to compare detected compounds and respective concentrations between the two sampling periods and to evaluate the detections of new compounds. The 52 wells were distributed similarly to sites used in the 1995-2006 baseline study with respect to geographic area and land use within the geographic area of interest. Because of the use of different types of reporting levels and variability in reporting-level values during both the 1995-2006 baseline study and the 2008-10 resampling study, analytical results received from the laboratory were recensored. Two levels of recensoring were used to compare pesticides—a compound-specific assessment level (CSAL) that differed by compound and a common assessment level (CAL) of 0.07 microgram per liter. The recensoring techniques and values used for both studies, with the exception of the pesticide 2,4-D methyl ester, were the same. Twenty-eight different pesticides were detected in samples from the 52 wells during the 2008-10 resampling study. Pesticide concentrations were compared with several U.S. Environmental Protection Agency drinking-water standards or health advisories for finished (treated) water established under the Safe Drinking Water Act. All detected pesticides were measured at concentrations smaller than U.S. Environmental Protection Agency drinking-water standards or health advisories where applicable (many pesticides did not have standards or advisories). One or more pesticides were detected at concentrations greater than the CAL in water from 16 of 52 wells sampled (about 31 percent) during the resampling study. Detected pesticides were classified into one of six types: herbicides, herbicide degradates, insecticides, insecticide degradates, fungicides, or fungicide degradates. At least 95 percent of detected pesticides were classified as herbicides or herbicide degradates. The number of different pesticides detected in samples from the 52 wells was similar between the 1995-2006 baseline study (30 different pesticides) and 2008-2010 resampling study (28 different pesticides). Thirteen pesticides were detected during both studies. The change in the number of pesticides detected (without regard to which pesticide was detected) in groundwater samples from each of the 52 wells was evaluated and the number of pesticides detected in groundwater did not change for most of the wells (32). Of those that did have a difference between the two studies, 17 wells had more pesticide detections in groundwater during the 1995-2006 baseline study, whereas only 3 wells had more detections during the 2008-2010 resampling study. The difference in pesticide concentrations in groundwater samples from each of the 52 wells was determined. Few changes in concentration between the 1995-2006 baseline study and the 2008-2010 resampling study were seen for most detected pesticides. Seven pesticides had a greater concentration detected in the groundwater from the same well during the baseline sampling compared to the resampling study. Concentrations of prometon, which was detected in 17 wells, were greater in the baseline study sample compared to the resampling study sample from the same well 100 percent of the time. The change in the number of pesticides detected (without regard to which pesticide was detected) in groundwater samples from each of the 52 wells with respect to land use and geographic area was calculated. All wells with land use classified as agricultural had the same or a smaller number of pesticides detected in the resampling study compared to the baseline study. All wells in the Bighorn Basin geographic area also had the same or a smaller number of pesticides detected in the resampling study compared to the baseline study.
NASA Astrophysics Data System (ADS)
Wani, Omar; Beckers, Joost V. L.; Weerts, Albrecht H.; Solomatine, Dimitri P.
2017-08-01
A non-parametric method is applied to quantify residual uncertainty in hydrologic streamflow forecasting. This method acts as a post-processor on deterministic model forecasts and generates a residual uncertainty distribution. Based on instance-based learning, it uses a k nearest-neighbour search for similar historical hydrometeorological conditions to determine uncertainty intervals from a set of historical errors, i.e. discrepancies between past forecast and observation. The performance of this method is assessed using test cases of hydrologic forecasting in two UK rivers: the Severn and Brue. Forecasts in retrospect were made and their uncertainties were estimated using kNN resampling and two alternative uncertainty estimators: quantile regression (QR) and uncertainty estimation based on local errors and clustering (UNEEC). Results show that kNN uncertainty estimation produces accurate and narrow uncertainty intervals with good probability coverage. Analysis also shows that the performance of this technique depends on the choice of search space. Nevertheless, the accuracy and reliability of uncertainty intervals generated using kNN resampling are at least comparable to those produced by QR and UNEEC. It is concluded that kNN uncertainty estimation is an interesting alternative to other post-processors, like QR and UNEEC, for estimating forecast uncertainty. Apart from its concept being simple and well understood, an advantage of this method is that it is relatively easy to implement.
NASA Astrophysics Data System (ADS)
Carpenter, Matthew H.; Jernigan, J. G.
2007-05-01
We present examples of an analysis progression consisting of a synthesis of the Photon Clean Method (Carpenter, Jernigan, Brown, Beiersdorfer 2007) and bootstrap methods to quantify errors and variations in many-parameter models. The Photon Clean Method (PCM) works well for model spaces with large numbers of parameters proportional to the number of photons, therefore a Monte Carlo paradigm is a natural numerical approach. Consequently, PCM, an "inverse Monte-Carlo" method, requires a new approach for quantifying errors as compared to common analysis methods for fitting models of low dimensionality. This presentation will explore the methodology and presentation of analysis results derived from a variety of public data sets, including observations with XMM-Newton, Chandra, and other NASA missions. Special attention is given to the visualization of both data and models including dynamic interactive presentations. This work was performed under the auspices of the Department of Energy under contract No. W-7405-Eng-48. We thank Peter Beiersdorfer and Greg Brown for their support of this technical portion of a larger program related to science with the LLNL EBIT program.
Reliability of dose volume constraint inference from clinical data.
Lutz, C M; Møller, D S; Hoffmann, L; Knap, M M; Alber, M
2017-04-21
Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an 'ideal' cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a 'non-ideal' cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >[Formula: see text] were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.
Reliability of dose volume constraint inference from clinical data
NASA Astrophysics Data System (ADS)
Lutz, C. M.; Møller, D. S.; Hoffmann, L.; Knap, M. M.; Alber, M.
2017-04-01
Dose volume histogram points (DVHPs) frequently serve as dose constraints in radiotherapy treatment planning. An experiment was designed to investigate the reliability of DVHP inference from clinical data for multiple cohort sizes and complication incidence rates. The experimental background was radiation pneumonitis in non-small cell lung cancer and the DVHP inference method was based on logistic regression. From 102 NSCLC real-life dose distributions and a postulated DVHP model, an ‘ideal’ cohort was generated where the most predictive model was equal to the postulated model. A bootstrap and a Cohort Replication Monte Carlo (CoRepMC) approach were applied to create 1000 equally sized populations each. The cohorts were then analyzed to establish inference frequency distributions. This was applied to nine scenarios for cohort sizes of 102 (1), 500 (2) to 2000 (3) patients (by sampling with replacement) and three postulated DVHP models. The Bootstrap was repeated for a ‘non-ideal’ cohort, where the most predictive model did not coincide with the postulated model. The Bootstrap produced chaotic results for all models of cohort size 1 for both the ideal and non-ideal cohorts. For cohort size 2 and 3, the distributions for all populations were more concentrated around the postulated DVHP. For the CoRepMC, the inference frequency increased with cohort size and incidence rate. Correct inference rates >85 % were only achieved by cohorts with more than 500 patients. Both Bootstrap and CoRepMC indicate that inference of the correct or approximate DVHP for typical cohort sizes is highly uncertain. CoRepMC results were less spurious than Bootstrap results, demonstrating the large influence that randomness in dose-response has on the statistical analysis.
Experimental study of digital image processing techniques for LANDSAT data
NASA Technical Reports Server (NTRS)
Rifman, S. S. (Principal Investigator); Allendoerfer, W. B.; Caron, R. H.; Pemberton, L. J.; Mckinnon, D. M.; Polanski, G.; Simon, K. W.
1976-01-01
The author has identified the following significant results. Results are reported for: (1) subscene registration, (2) full scene rectification and registration, (3) resampling techniques, (4) and ground control point (GCP) extraction. Subscenes (354 pixels x 234 lines) were registered to approximately 1/4 pixel accuracy and evaluated by change detection imagery for three cases: (1) bulk data registration, (2) precision correction of a reference subscene using GCP data, and (3) independently precision processed subscenes. Full scene rectification and registration results were evaluated by using a correlation technique to measure registration errors of 0.3 pixel rms thoughout the full scene. Resampling evaluations of nearest neighbor and TRW cubic convolution processed data included change detection imagery and feature classification. Resampled data were also evaluated for an MSS scene containing specular solar reflections.
ERIC Educational Resources Information Center
Enders, Craig K.
2005-01-01
The Bollen-Stine bootstrap can be used to correct for standard error and fit statistic bias that occurs in structural equation modeling (SEM) applications due to nonnormal data. The purpose of this article is to demonstrate the use of a custom SAS macro program that can be used to implement the Bollen-Stine bootstrap with existing SEM software.…
NASA Astrophysics Data System (ADS)
Monticello, D. A.; Reiman, A. H.; Watanabe, K. Y.; Nakajima, N.; Okamoto, M.
1997-11-01
The existence of bootstrap currents in both tokamaks and stellarators was confirmed, experimentally, more than ten years ago. Such currents can have significant effects on the equilibrium and stability of these MHD devices. In addition, stellarators, with the notable exception of W7-X, are predicted to have such large bootstrap currents that reliable equilibrium calculations require the self-consistent evaluation of bootstrap currents. Modeling of discharges which contain islands requires an algorithm that does not assume good surfaces. Only one of the two 3-D equilibrium codes that exist, PIES( Reiman, A. H., Greenside, H. S., Compt. Phys. Commun. 43), (1986)., can easily be modified to handle bootstrap current. Here we report on the coupling of the PIES 3-D equilibrium code and NIFS bootstrap code(Watanabe, K., et al., Nuclear Fusion 35) (1995), 335.
Synchronizing data from irregularly sampled sensors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uluyol, Onder
A system and method include receiving a set of sampled measurements for each of multiple sensors, wherein the sampled measurements are at irregular intervals or different rates, re-sampling the sampled measurements of each of the multiple sensors at a higher rate than one of the sensor's set of sampled measurements, and synchronizing the sampled measurements of each of the multiple sensors.
J. Travis Swaim; Daniel C. Dey; Michael R. Saunders; Dale R. Weigel; Christopher D. Thornton; John M. Kabrick; Michael A. Jenkins
2016-01-01
We resampled plots from a repeated measures study implemented on the Hoosier National Forest (HNF) in southern Indiana in 1988 to investigate the influence of site and seedling physical attributes on height growth and establishment success of oak species (Quercus spp.) reproduction in stands regenerated by the clearcut method. Before harvest, an...
Pham-The, Hai; Casañola-Martin, Gerardo; Garrigues, Teresa; Bermejo, Marival; González-Álvarez, Isabel; Nguyen-Hai, Nam; Cabrera-Pérez, Miguel Ángel; Le-Thi-Thu, Huong
2016-02-01
In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. Results showed that the models developed on the basis of resampling strategies displayed better performance than the cost-sensitive classification models, especially in the case of oversampling data where misclassification rates for minority class have values of 0.11 and 0.14 for training and test set, respectively. A consensus model with enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to predict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and displays the effectiveness of oversampling methods to deal with imbalanced permeability data problems.
Deep learning ensemble with asymptotic techniques for oscillometric blood pressure estimation.
Lee, Soojeong; Chang, Joon-Hyuk
2017-11-01
This paper proposes a deep learning based ensemble regression estimator with asymptotic techniques, and offers a method that can decrease uncertainty for oscillometric blood pressure (BP) measurements using the bootstrap and Monte-Carlo approach. While the former is used to estimate SBP and DBP, the latter attempts to determine confidence intervals (CIs) for SBP and DBP based on oscillometric BP measurements. This work originally employs deep belief networks (DBN)-deep neural networks (DNN) to effectively estimate BPs based on oscillometric measurements. However, there are some inherent problems with these methods. First, it is not easy to determine the best DBN-DNN estimator, and worthy information might be omitted when selecting one DBN-DNN estimator and discarding the others. Additionally, our input feature vectors, obtained from only five measurements per subject, represent a very small sample size; this is a critical weakness when using the DBN-DNN technique and can cause overfitting or underfitting, depending on the structure of the algorithm. To address these problems, an ensemble with an asymptotic approach (based on combining the bootstrap with the DBN-DNN technique) is utilized to generate the pseudo features needed to estimate the SBP and DBP. In the first stage, the bootstrap-aggregation technique is used to create ensemble parameters. Afterward, the AdaBoost approach is employed for the second-stage SBP and DBP estimation. We then use the bootstrap and Monte-Carlo techniques in order to determine the CIs based on the target BP estimated using the DBN-DNN ensemble regression estimator with the asymptotic technique in the third stage. The proposed method can mitigate the estimation uncertainty such as large the standard deviation of error (SDE) on comparing the proposed DBN-DNN ensemble regression estimator with the DBN-DNN single regression estimator, we identify that the SDEs of the SBP and DBP are reduced by 0.58 and 0.57 mmHg, respectively. These indicate that the proposed method actually enhances the performance by 9.18% and 10.88% compared with the DBN-DNN single estimator. The proposed methodology improves the accuracy of BP estimation and reduces the uncertainty for BP estimation. Copyright © 2017 Elsevier B.V. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Taylor, Ronald C.; Sanfilippo, Antonio P.; McDermott, Jason E.
2011-02-18
Transcriptional regulatory networks are being determined using “reverse engineering” methods that infer connections based on correlations in gene state. Corroboration of such networks through independent means such as evidence from the biomedical literature is desirable. Here, we explore a novel approach, a bootstrapping version of our previous Cross-Ontological Analytic method (XOA) that can be used for semi-automated annotation and verification of inferred regulatory connections, as well as for discovery of additional functional relationships between the genes. First, we use our annotation and network expansion method on a biological network learned entirely from the literature. We show how new relevant linksmore » between genes can be iteratively derived using a gene similarity measure based on the Gene Ontology that is optimized on the input network at each iteration. Second, we apply our method to annotation, verification, and expansion of a set of regulatory connections found by the Context Likelihood of Relatedness algorithm.« less
Resampling approach for anomalous change detection
NASA Astrophysics Data System (ADS)
Theiler, James; Perkins, Simon
2007-04-01
We investigate the problem of identifying pixels in pairs of co-registered images that correspond to real changes on the ground. Changes that are due to environmental differences (illumination, atmospheric distortion, etc.) or sensor differences (focus, contrast, etc.) will be widespread throughout the image, and the aim is to avoid these changes in favor of changes that occur in only one or a few pixels. Formal outlier detection schemes (such as the one-class support vector machine) can identify rare occurrences, but will be confounded by pixels that are "equally rare" in both images: they may be anomalous, but they are not changes. We describe a resampling scheme we have developed that formally addresses both of these issues, and reduces the problem to a binary classification, a problem for which a large variety of machine learning tools have been developed. In principle, the effects of misregistration will manifest themselves as pervasive changes, and our method will be robust against them - but in practice, misregistration remains a serious issue.
Why do workaholics experience depression? A study with Chinese University teachers.
Nie, Yingzhi; Sun, Haitao
2016-10-01
This study focuses on the relationships of workaholism to job burnout and depression of university teachers. The direct and indirect (via job burnout) effects of workaholism on depression were investigated in 412 Chinese university teachers. Structural equation modeling and bootstrap method were used. Results revealed that workaholism, job burnout, and depression significantly correlated with each other. Structural equation modeling and bootstrap test indicated the partial mediation role of job burnout on the relationship between workaholism and depression. The findings shed some light on how workaholism influenced depression and provided valuable evidence for prevention of depression in work. © The Author(s) 2015.
Blank, Jos L T; van Hulst, Bart Laurents
2011-10-01
This paper describes the efficiency of Dutch hospitals using the Data Envelopment Analysis (DEA) method with bootstrapping. In particular, the analysis focuses on accounting for cost inefficiency measures on the part of hospital corporate governance. We use bootstrap techniques, as introduced by Simar and Wilson (J. Econom. 136(1):31-64, 2007), in order to obtain more efficient estimates of the effects of governance on the efficiency. The results show that part of the cost efficiency can be explained with governance. In particular we find that a higher remuneration of the board as well as a higher remuneration of the supervisory board does not implicate better performance.
TU-CD-BRA-01: A Novel 3D Registration Method for Multiparametric Radiological Images
DOE Office of Scientific and Technical Information (OSTI.GOV)
Akhbardeh, A; Parekth, VS; Jacobs, MA
2015-06-15
Purpose: Multiparametric and multimodality radiological imaging methods, such as, magnetic resonance imaging(MRI), computed tomography(CT), and positron emission tomography(PET), provide multiple types of tissue contrast and anatomical information for clinical diagnosis. However, these radiological modalities are acquired using very different technical parameters, e.g.,field of view(FOV), matrix size, and scan planes, which, can lead to challenges in registering the different data sets. Therefore, we developed a hybrid registration method based on 3D wavelet transformation and 3D interpolations that performs 3D resampling and rotation of the target radiological images without loss of information Methods: T1-weighted, T2-weighted, diffusion-weighted-imaging(DWI), dynamic-contrast-enhanced(DCE) MRI and PET/CT were usedmore » in the registration algorithm from breast and prostate data at 3T MRI and multimodality(PET/CT) cases. The hybrid registration scheme consists of several steps to reslice and match each modality using a combination of 3D wavelets, interpolations, and affine registration steps. First, orthogonal reslicing is performed to equalize FOV, matrix sizes and the number of slices using wavelet transformation. Second, angular resampling of the target data is performed to match the reference data. Finally, using optimized angles from resampling, 3D registration is performed using similarity transformation(scaling and translation) between the reference and resliced target volume is performed. After registration, the mean-square-error(MSE) and Dice Similarity(DS) between the reference and registered target volumes were calculated. Results: The 3D registration method registered synthetic and clinical data with significant improvement(p<0.05) of overlap between anatomical structures. After transforming and deforming the synthetic data, the MSE and Dice similarity were 0.12 and 0.99. The average improvement of the MSE in breast was 62%(0.27 to 0.10) and prostate was 63%(0.13 to 0.04;p<0.05). The Dice similarity was in breast 8%(0.91 to 0.99) and for prostate was 89%(0.01 to 0.90;p<0.05) Conclusion: Our 3D wavelet hybrid registration approach registered diverse breast and prostate data of different radiological images(MR/PET/CT) with a high accuracy.« less
NASA Astrophysics Data System (ADS)
Tweedie, C. E.; Ebert-May, D.; Hollister, R. D.; Johnson, D. R.; Lara, M. J.; Villarreal, S.; Spasojevic, M.; Webber, P.
2010-12-01
The International Polar Year-Back to the Future (IPY-BTF) is an endorsed International Polar Year project (IPY project #214). The overarching goal of this program is to determine how key structural and functional characteristics of high latitude/altitude terrestrial ecosystems have changed over the past 25 or more years and assess if such trajectories of change are likely to continue in the future. By rescuing data, revisiting, re-sampling historic research sites and assessing environmental change over time, we aim to provide greater understanding of how tundra is changing and what the possible drivers of these changes are. Resampling of sites established by Patrick J. Webber between 1964 and 1975 in northern Baffin Island, Northern Alaska and in the Rocky Mountains form a key contribution to the BTF project. Here we report on resampling efforts at each of these locations and initial results of a synthesis effort that finds similarities and differences in change between sites. Results suggest that although shifts in plant community composition are detectable at each location, the magnitude and direction of change differ among locations. Vegetation shifts along soil moisture gradients is apparent at most of the sites resampled. Interestingly, however, wet communities seem to have changed more than dry communities in the Arctic locations, while plant communities at the alpine site appear to be becoming more distinct regardless of soil moisture status. Ecosystem function studies performed in conjunction with plant community change suggest that there has been an increase in plant productivity at most sites resampled, especially in wet and mesic land cover types.
Travelogue--a newcomer encounters statistics and the computer.
Bruce, Peter
2011-11-01
Computer-intensive methods have revolutionized statistics, giving rise to new areas of analysis and expertise in predictive analytics, image processing, pattern recognition, machine learning, genomic analysis, and more. Interest naturally centers on the new capabilities the computer allows the analyst to bring to the table. This article, instead, focuses on the account of how computer-based resampling methods, with their relative simplicity and transparency, enticed one individual, untutored in statistics or mathematics, on a long journey into learning statistics, then teaching it, then starting an education institution.
Kong, Tianzhu; He, Yini; Auerbach, Randy P.; McWhinnie, Chad M.; Xiao, Jing
2015-01-01
Objective In this study, we examined the mediator effects of overgeneral autobiographical memory (OGM) on the relationship between rumination and depression in 323 Chinese university students. Method 323 undergraduates completed the questionnaires measuring OGM (Autobiographical Memory Test), rumination (Ruminative Response Scale) and depression (Center for Epidemiologic Studies Depression Scale). Results Results using structural equation modeling showed that OGM partially-mediated the relationship between rumination and depression (χ2 = 88.61, p < .01; RMSEA = .051; SRMR = .040; and CFI = .91). Bootstrap methods were used to assess the magnitude of the indirect effects. The results of the bootstrap estimation procedure and subsequent analyses indicated that the indirect effects of OGM on the relationship between rumination and depressive symptoms were significant. Conclusion The results indicated that rumination and depression were partially mediated by OGM. PMID:25977594
Porter, Teresita M; Gibson, Joel F; Shokralla, Shadi; Baird, Donald J; Golding, G Brian; Hajibabaei, Mehrdad
2014-01-01
Current methods to identify unknown insect (class Insecta) cytochrome c oxidase (COI barcode) sequences often rely on thresholds of distances that can be difficult to define, sequence similarity cut-offs, or monophyly. Some of the most commonly used metagenomic classification methods do not provide a measure of confidence for the taxonomic assignments they provide. The aim of this study was to use a naïve Bayesian classifier (Wang et al. Applied and Environmental Microbiology, 2007; 73: 5261) to automate taxonomic assignments for large batches of insect COI sequences such as data obtained from high-throughput environmental sequencing. This method provides rank-flexible taxonomic assignments with an associated bootstrap support value, and it is faster than the blast-based methods commonly used in environmental sequence surveys. We have developed and rigorously tested the performance of three different training sets using leave-one-out cross-validation, two field data sets, and targeted testing of Lepidoptera, Diptera and Mantodea sequences obtained from the Barcode of Life Data system. We found that type I error rates, incorrect taxonomic assignments with a high bootstrap support, were already relatively low but could be lowered further by ensuring that all query taxa are actually present in the reference database. Choosing bootstrap support cut-offs according to query length and summarizing taxonomic assignments to more inclusive ranks can also help to reduce error while retaining the maximum number of assignments. Additionally, we highlight gaps in the taxonomic and geographic representation of insects in public sequence databases that will require further work by taxonomists to improve the quality of assignments generated using any method.
Carnegie, Nicole Bohme
2011-04-15
The incidence of new infections is a key measure of the status of the HIV epidemic, but accurate measurement of incidence is often constrained by limited data. Karon et al. (Statist. Med. 2008; 27:4617–4633) developed a model to estimate the incidence of HIV infection from surveillance data with biologic testing for recent infection for newly diagnosed cases. This method has been implemented by public health departments across the United States and is behind the new national incidence estimates, which are about 40 per cent higher than previous estimates. We show that the delta method approximation given for the variance of the estimator is incomplete, leading to an inflated variance estimate. This contributes to the generation of overly conservative confidence intervals, potentially obscuring important differences between populations. We demonstrate via simulation that an innovative model-based bootstrap method using the specified model for the infection and surveillance process improves confidence interval coverage and adjusts for the bias in the point estimate. Confidence interval coverage is about 94–97 per cent after correction, compared with 96–99 per cent before. The simulated bias in the estimate of incidence ranges from −6.3 to +14.6 per cent under the original model but is consistently under 1 per cent after correction by the model-based bootstrap. In an application to data from King County, Washington in 2007 we observe correction of 7.2 per cent relative bias in the incidence estimate and a 66 per cent reduction in the width of the 95 per cent confidence interval using this method. We provide open-source software to implement the method that can also be extended for alternate models.
A Bootstrap Metropolis-Hastings Algorithm for Bayesian Analysis of Big Data.
Liang, Faming; Kim, Jinsu; Song, Qifan
2016-01-01
Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively.
A Bootstrap Metropolis–Hastings Algorithm for Bayesian Analysis of Big Data
Kim, Jinsu; Song, Qifan
2016-01-01
Markov chain Monte Carlo (MCMC) methods have proven to be a very powerful tool for analyzing data of complex structures. However, their computer-intensive nature, which typically require a large number of iterations and a complete scan of the full dataset for each iteration, precludes their use for big data analysis. In this paper, we propose the so-called bootstrap Metropolis-Hastings (BMH) algorithm, which provides a general framework for how to tame powerful MCMC methods to be used for big data analysis; that is to replace the full data log-likelihood by a Monte Carlo average of the log-likelihoods that are calculated in parallel from multiple bootstrap samples. The BMH algorithm possesses an embarrassingly parallel structure and avoids repeated scans of the full dataset in iterations, and is thus feasible for big data problems. Compared to the popular divide-and-combine method, BMH can be generally more efficient as it can asymptotically integrate the whole data information into a single simulation run. The BMH algorithm is very flexible. Like the Metropolis-Hastings algorithm, it can serve as a basic building block for developing advanced MCMC algorithms that are feasible for big data problems. This is illustrated in the paper by the tempering BMH algorithm, which can be viewed as a combination of parallel tempering and the BMH algorithm. BMH can also be used for model selection and optimization by combining with reversible jump MCMC and simulated annealing, respectively. PMID:29033469
Asymptotic confidence intervals for the Pearson correlation via skewness and kurtosis.
Bishara, Anthony J; Li, Jiexiang; Nash, Thomas
2018-02-01
When bivariate normality is violated, the default confidence interval of the Pearson correlation can be inaccurate. Two new methods were developed based on the asymptotic sampling distribution of Fisher's z' under the general case where bivariate normality need not be assumed. In Monte Carlo simulations, the most successful of these methods relied on the (Vale & Maurelli, 1983, Psychometrika, 48, 465) family to approximate a distribution via the marginal skewness and kurtosis of the sample data. In Simulation 1, this method provided more accurate confidence intervals of the correlation in non-normal data, at least as compared to no adjustment of the Fisher z' interval, or to adjustment via the sample joint moments. In Simulation 2, this approximate distribution method performed favourably relative to common non-parametric bootstrap methods, but its performance was mixed relative to an observed imposed bootstrap and two other robust methods (PM1 and HC4). No method was completely satisfactory. An advantage of the approximate distribution method, though, is that it can be implemented even without access to raw data if sample skewness and kurtosis are reported, making the method particularly useful for meta-analysis. Supporting information includes R code. © 2017 The British Psychological Society.