Rasch fit statistics and sample size considerations for polytomous data.
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-05-29
Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire - 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges.
Rasch fit statistics and sample size considerations for polytomous data
Smith, Adam B; Rush, Robert; Fallowfield, Lesley J; Velikova, Galina; Sharpe, Michael
2008-01-01
Background: Previous research on educational data has demonstrated that Rasch fit statistics (mean squares and t-statistics) are highly susceptible to sample size variation for dichotomously scored rating data, although little is known about this relationship for polytomous data. These statistics help inform researchers about how well items fit to a unidimensional latent trait, and are an important adjunct to modern psychometrics. Given the increasing use of Rasch models in health research, the purpose of this study was therefore to explore the relationship between fit statistics and sample size for polytomous data. Methods: Data were collated from a heterogeneous sample of cancer patients (n = 4072) who had completed both the Patient Health Questionnaire – 9 and the Hospital Anxiety and Depression Scale. Ten samples were drawn with replacement for each of eight sample sizes (n = 25 to n = 3200). The Rating and Partial Credit Models were applied and the mean square and t-fit statistics (infit/outfit) derived for each model. Results: The results demonstrated that t-statistics were highly sensitive to sample size, whereas mean square statistics remained relatively stable for polytomous data. Conclusion: It was concluded that mean square statistics were relatively independent of sample size for polytomous data and that misfit to the model could be identified using published recommended ranges. PMID:18510722
Currens, J.C.
1999-01-01
Analytical data for nitrate and triazines from 566 samples collected over a 3-year period at Pleasant Grove Spring, Logan County, KY, were statistically analyzed to determine the minimum data set needed to calculate meaningful yearly averages for a conduit-flow karst spring. Results indicate that a biweekly sampling schedule augmented with bihourly samples from high-flow events will provide meaningful suspended-constituent and dissolved-constituent statistics. Unless collected over an extensive period of time, daily samples may not be representative and may also be autocorrelated. All high-flow events resulting in a significant deflection of a constituent from base-line concentrations should be sampled. Either the geometric mean or the flow-weighted average of the suspended constituents should be used. If automatic samplers are used, then they may be programmed to collect storm samples as frequently as every few minutes to provide details on the arrival time of constituents of interest. However, only samples collected bihourly should be used to calculate averages. By adopting a biweekly sampling schedule augmented with high-flow samples, the need to continuously monitor discharge, or to search for and analyze existing data to develop a statistically valid monitoring plan, is lessened.
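The two averages recommended in this abstract can be sketched as follows. This is a minimal illustration only: the function names and the sample values are hypothetical, and the abstract does not say how the author computed either quantity.

```python
import math

def geometric_mean(concentrations):
    """Geometric mean of strictly positive concentrations."""
    if not concentrations or any(c <= 0 for c in concentrations):
        raise ValueError("concentrations must be non-empty and strictly positive")
    return math.exp(sum(math.log(c) for c in concentrations) / len(concentrations))

def flow_weighted_average(concentrations, discharges):
    """Average concentration weighted by the discharge at each sample."""
    if len(concentrations) != len(discharges):
        raise ValueError("need one discharge value per concentration")
    total_flow = sum(discharges)
    if total_flow <= 0:
        raise ValueError("total discharge must be positive")
    return sum(c * q for c, q in zip(concentrations, discharges)) / total_flow

# Hypothetical values: the last sample is a high-flow event.
conc = [1.0, 4.0, 16.0]
flow = [1.0, 1.0, 2.0]
print(geometric_mean(conc))
print(flow_weighted_average(conc, flow))
```

The flow-weighted average gives the high-flow sample more influence than the geometric mean does, which is why the choice between them matters for storm-dominated records.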
NASA Astrophysics Data System (ADS)
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
In this study, we introduced an alternative approach for analysis of hydrological flow time series, using an adaptive sampling framework based on inter-amount times (IATs). The main difference with conventional flow time series is the rate at which low and high flows are sampled: the unit of analysis for IATs is a fixed flow amount, instead of a fixed time window. We analysed statistical distributions of flows and IATs across a wide range of sampling scales to investigate sensitivity of statistical properties such as quantiles, variance, skewness, scaling parameters and flashiness indicators to the sampling scale. We did this based on streamflow time series for 17 (semi)urbanised basins in North Carolina, US, ranging from 13 km² to 238 km² in size. Results showed that adaptive sampling of flow time series based on inter-amounts leads to a more balanced representation of low flow and peak flow values in the statistical distribution. While conventional sampling gives a lot of weight to low flows, as these are most ubiquitous in flow time series, IAT sampling gives relatively more weight to high flow values, when given flow amounts are accumulated in shorter time. As a consequence, IAT sampling gives more information about the tail of the distribution associated with high flows, while conventional sampling gives relatively more information about low flow periods. We will present results of statistical analyses across a range of subdaily to seasonal scales and will highlight some interesting insights that can be derived from IAT statistics with respect to basin flashiness and the impact of urbanisation on hydrological response.
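The idea of sampling by fixed flow amount rather than fixed time window can be sketched as below. This is an illustrative reading of the abstract, not the authors' implementation: the function name is hypothetical, and it assumes the flow rate is constant within each time step so that crossing times can be interpolated linearly.

```python
def inter_amount_times(flows, dt, amount):
    """Times needed to accumulate successive fixed flow amounts.

    flows  : flow rate in each time step (volume per unit time)
    dt     : length of one time step
    amount : fixed flow volume that defines one sampling unit
    Assumes the flow rate is constant within a step (an assumption of this sketch).
    """
    if dt <= 0 or amount <= 0:
        raise ValueError("dt and amount must be positive")
    crossing_times = []
    cumulative = 0.0
    next_target = amount
    for i, q in enumerate(flows):
        if q < 0:
            raise ValueError("flow rates must be non-negative")
        step_volume = q * dt
        # One step may cross several targets when the flow is high.
        while q > 0 and cumulative + step_volume >= next_target - 1e-12:
            crossing_times.append(i * dt + (next_target - cumulative) / q)
            next_target += amount
        cumulative += step_volume
    # Inter-amount times are the gaps between successive crossings.
    iats, previous = [], 0.0
    for t in crossing_times:
        iats.append(t - previous)
        previous = t
    return iats

# Hypothetical series: low flow, a short peak, then low flow again.
print(inter_amount_times([1, 1, 4, 4, 1, 1], dt=1.0, amount=2.0))
```

In this toy series the peak produces four short inter-amount times while each low-flow period produces one long one, which matches the abstract's point that IAT sampling puts relatively more weight on high flows.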
Moore, B.L.; Evaldi, R.D.
1995-01-01
Bottom sediments from 25 stream sites in Jefferson County, Ky., were analyzed for percent volatile solids and concentrations of nutrients, major metals, trace elements, miscellaneous inorganic compounds, and selected organic compounds. Statistical high outliers of the constituent concentrations analyzed for in the bottom sediments were defined as a measure of possible elevated concentrations. Statistical high outliers were determined for at least 1 constituent at each of 12 sampling sites in Jefferson County. Of the 10 stream basins sampled in Jefferson County, the Middle Fork Beargrass Basin, Cedar Creek Basin, and Harrods Creek Basin were the only three basins where a statistical high outlier was not found for any of the measured constituents. In the Pennsylvania Run Basin, total volatile solids, nitrate plus nitrite, and endrin constituents were statistical high outliers. Pond Creek was the only basin where five constituents were statistical high outliers: barium, beryllium, cadmium, chromium, and silver. Nitrate plus nitrite and copper constituents were the only statistical high outliers found in the Mill Creek Basin. In the Floyds Fork Basin, nitrate plus nitrite, phosphorus, mercury, and silver constituents were the only statistical high outliers. Ammonia was the only statistical high outlier found in the South Fork Beargrass Basin. In the Goose Creek Basin, mercury and silver constituents were the only statistical high outliers. Cyanide was the only statistical high outlier in the Muddy Fork Basin.
Statistical Symbolic Execution with Informed Sampling
NASA Technical Reports Server (NTRS)
Filieri, Antonio; Pasareanu, Corina S.; Visser, Willem; Geldenhuys, Jaco
2014-01-01
Symbolic execution techniques have been proposed recently for the probabilistic analysis of programs. These techniques seek to quantify the likelihood of reaching program events of interest, e.g., assert violations. They have many promising applications but have scalability issues due to high computational demand. To address this challenge, we propose a statistical symbolic execution technique that performs Monte Carlo sampling of the symbolic program paths and uses the obtained information for Bayesian estimation and hypothesis testing with respect to the probability of reaching the target events. To speed up the convergence of the statistical analysis, we propose Informed Sampling, an iterative symbolic execution that first explores the paths that have high statistical significance, prunes them from the state space and guides the execution towards less likely paths. The technique combines Bayesian estimation with a partial exact analysis for the pruned paths, leading to provably improved convergence of the statistical analysis. We have implemented statistical symbolic execution with informed sampling in the Symbolic PathFinder tool. We show experimentally that the informed sampling obtains more precise results and converges faster than a purely statistical analysis and may also be more efficient than an exact symbolic analysis. When the latter does not terminate, symbolic execution with informed sampling can give meaningful results under the same time and memory limits.
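The combination of Monte Carlo sampling and Bayesian estimation mentioned above can be illustrated with a much simpler stand-in. The sketch below samples concrete inputs rather than symbolic paths, uses a Beta prior with a Bernoulli likelihood, and does not reproduce Informed Sampling or anything from Symbolic PathFinder. The predicate `reaches_target` and all parameter names are hypothetical.

```python
import random

def reaches_target(x):
    """Hypothetical stand-in for 'this input drives the program to the target event'."""
    return x > 0.9

def estimate_reach_probability(num_samples, seed=0, prior_a=1.0, prior_b=1.0):
    """Beta-Bernoulli estimate of the probability of reaching the target.

    Inputs are drawn uniformly from [0, 1); the prior is Beta(prior_a, prior_b).
    Returns (hits, posterior mean, posterior variance).
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_samples) if reaches_target(rng.random()))
    a = prior_a + hits
    b = prior_b + (num_samples - hits)
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return hits, mean, variance

hits, mean, variance = estimate_reach_probability(2000)
print(hits, mean, variance)
```

The posterior variance shrinks as samples accumulate, which is the convergence behaviour that the abstract says informed sampling is designed to speed up.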
Radar error statistics for the space shuttle
NASA Technical Reports Server (NTRS)
Lear, W. M.
1979-01-01
Radar error statistics of C-band and S-band that are recommended for use with the groundtracking programs to process space shuttle tracking data are presented. The statistics are divided into two parts: bias error statistics, using the subscript B, and high frequency error statistics, using the subscript q. Bias errors may be slowly varying to constant. High frequency random errors (noise) are rapidly varying and may or may not be correlated from sample to sample. Bias errors were mainly due to hardware defects and to errors in correction for atmospheric refraction effects. High frequency noise was mainly due to hardware and due to atmospheric scintillation. Three types of atmospheric scintillation were identified: horizontal, vertical, and line of sight. This was the first time that horizontal and line of sight scintillations were identified.
Kim, Sung-Min; Choi, Yosoon
2017-01-01
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
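A z-score of the Getis-Ord Gi* kind can be sketched as follows, using the commonly published form of the statistic with binary distance-band weights that include the point itself. The function name, the weighting scheme, the radius, and the sample values are assumptions of this sketch; the abstract does not state which weights or software the authors used.

```python
import math

def getis_ord_gi_star(points, values, radius):
    """Gi* z-score for each point, with binary weights inside a distance band.

    points : list of (x, y) coordinates
    values : measured content at each point
    radius : neighbours within this distance (including the point itself) get weight 1
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(v * v for v in values) / n - mean ** 2))
    z_scores = []
    for xi, yi in points:
        w = [1.0 if math.hypot(xi - xj, yi - yj) <= radius else 0.0
             for xj, yj in points]
        sum_w = sum(w)
        sum_w2 = sum(wi * wi for wi in w)
        numerator = sum(wi * v for wi, v in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt(max(0.0, (n * sum_w2 - sum_w ** 2) / (n - 1)))
        z_scores.append(numerator / denominator if denominator > 0 else 0.0)
    return z_scores

# Hypothetical transect: a cluster of high values at the far end.
pts = [(float(i), 0.0) for i in range(10)]
vals = [1, 1, 1, 1, 1, 1, 1, 9, 10, 9]
print(getis_ord_gi_star(pts, vals, radius=1.0))
```

A point scores high only when its neighbours are also high, which is the condition for a statistically significant hot spot described in the abstract.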
Kim, Sung-Min; Choi, Yosoon
2017-06-18
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
Statistics for characterizing data on the periphery
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theiler, James P; Hush, Donald R
2010-01-01
We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
Olives, Casey; Valadez, Joseph J; Brooker, Simon J; Pagano, Marcello
2012-01-01
Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n=15 and n=25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n=15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools.
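An operating characteristic curve for a three-category design can be sketched with plain binomial probabilities. The decision thresholds `d1` and `d2` below are hypothetical and are not taken from the paper. The sketch assumes a fixed sample of n children with no curtailment, classifying a school as low if the number of positives is at most `d1`, high if it exceeds `d2`, and medium otherwise.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k positives among n independent children."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def classification_probabilities(n, d1, d2, p):
    """Probability of classifying a school as low, medium, or high at true prevalence p."""
    low = sum(binom_pmf(k, n, p) for k in range(0, d1 + 1))
    medium = sum(binom_pmf(k, n, p) for k in range(d1 + 1, d2 + 1))
    high = sum(binom_pmf(k, n, p) for k in range(d2 + 1, n + 1))
    return low, medium, high

# Hypothetical design: n = 15 with thresholds d1 = 2 and d2 = 7.
for prevalence in (0.05, 0.10, 0.30, 0.50, 0.70):
    low, medium, high = classification_probabilities(15, 2, 7, prevalence)
    print(f"p={prevalence:.2f}  low={low:.3f}  medium={medium:.3f}  high={high:.3f}")
```

Plotting the three probabilities against prevalence gives the operating characteristic curves; the regions near the category boundaries are where misclassification is most likely.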
NASA Astrophysics Data System (ADS)
Adams, T.; Batra, P.; Bugel, L.; Camilleri, L.; Conrad, J. M.; de Gouvêa, A.; Fisher, P. H.; Formaggio, J. A.; Jenkins, J.; Karagiorgi, G.; Kobilarcik, T. R.; Kopp, S.; Kyle, G.; Loinaz, W. A.; Mason, D. A.; Milner, R.; Moore, R.; Morfín, J. G.; Nakamura, M.; Naples, D.; Nienaber, P.; Olness, F. I.; Owens, J. F.; Pate, S. F.; Pronin, A.; Seligman, W. G.; Shaevitz, M. H.; Schellman, H.; Schienbein, I.; Syphers, M. J.; Tait, T. M. P.; Takeuchi, T.; Tan, C. Y.; van de Water, R. G.; Yamamoto, R. K.; Yu, J. Y.
We extend the physics case for a new high-energy, ultra-high statistics neutrino scattering experiment, NuSOnG (Neutrino Scattering On Glass) to address a variety of issues including precision QCD measurements, extraction of structure functions, and the derived Parton Distribution Functions (PDFs). This experiment uses a Tevatron-based neutrino beam to obtain a sample of Deep Inelastic Scattering (DIS) events which is over two orders of magnitude larger than past samples. We outline an innovative method for fitting the structure functions using a parametrized energy shift which yields reduced systematic uncertainties. High statistics measurements, in combination with improved systematics, will enable NuSOnG to perform discerning tests of fundamental Standard Model parameters as we search for deviations which may hint at "Beyond the Standard Model" physics.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
The prior statistics of object colors.
Koenderink, Jan J
2010-02-01
The prior statistics of object colors is of much interest because extensive statistical investigations of reflectance spectra reveal highly non-uniform structure in color space common to several very different databases. This common structure is due to the visual system rather than to the statistics of environmental structure. Analysis involves an investigation of the proper sample space of spectral reflectance factors and of the statistical consequences of the projection of spectral reflectances on the color solid. Even in the case of reflectance statistics that are translationally invariant with respect to the wavelength dimension, the statistics of object colors is highly non-uniform. The qualitative nature of this non-uniformity is due to trichromacy.
ERIC Educational Resources Information Center
Bailey, Thomas; Jenkins, Davis; Leinbach, Timothy
2005-01-01
This report summarizes statistics on access and attainment in higher education, focusing particularly on community college students, using data from the National Education Longitudinal Study of 1988 (NELS:88), which follows a nationally representative sample of individuals who were eighth graders in the spring of 1988. A sample of these…
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.
Dazard, Jean-Eudes; Rao, J Sunil
2012-07-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derive regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis
Lin, Johnny; Bentler, Peter M.
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511
Steganalysis of recorded speech
NASA Astrophysics Data System (ADS)
Johnson, Micah K.; Lyu, Siwei; Farid, Hany
2005-03-01
Digital audio provides a suitable cover for high-throughput steganography. At 16 bits per sample and sampled at a rate of 44,100 Hz, digital audio has the bit-rate to support large messages. In addition, audio is often transient and unpredictable, facilitating the hiding of messages. Using an approach similar to our universal image steganalysis, we show that hidden messages alter the underlying statistics of audio signals. Our statistical model begins by building a linear basis that captures certain statistical properties of audio signals. A low-dimensional statistical feature vector is extracted from this basis representation and used by a non-linear support vector machine for classification. We show the efficacy of this approach on LSB embedding and Hide4PGP. While no explicit assumptions about the content of the audio are made, our technique has been developed and tested on high-quality recorded speech.
Olives, Casey; Valadez, Joseph J.; Brooker, Simon J.; Pagano, Marcello
2012-01-01
Background: Originally a binary classifier, Lot Quality Assurance Sampling (LQAS) has proven to be a useful tool for classification of the prevalence of Schistosoma mansoni into multiple categories (≤10%, >10 and <50%, ≥50%), and semi-curtailed sampling has been shown to effectively reduce the number of observations needed to reach a decision. To date the statistical underpinnings for Multiple Category-LQAS (MC-LQAS) have not received full treatment. We explore the analytical properties of MC-LQAS, and validate its use for the classification of S. mansoni prevalence in multiple settings in East Africa. Methodology: We outline MC-LQAS design principles and formulae for operating characteristic curves. In addition, we derive the average sample number for MC-LQAS when utilizing semi-curtailed sampling and introduce curtailed sampling in this setting. We also assess the performance of MC-LQAS designs with maximum sample sizes of n = 15 and n = 25 via a weighted kappa-statistic using S. mansoni data collected in 388 schools from four studies in East Africa. Principal Findings: Overall performance of MC-LQAS classification was high (kappa-statistic of 0.87). In three of the studies, the kappa-statistic for a design with n = 15 was greater than 0.75. In the fourth study, where these designs performed poorly (kappa-statistic less than 0.50), the majority of observations fell in regions where potential error is known to be high. Employment of semi-curtailed and curtailed sampling further reduced the sample size by as many as 0.5 and 3.5 observations per school, respectively, without increasing classification error. Conclusion/Significance: This work provides the needed analytics to understand the properties of MC-LQAS for assessing the prevalence of S. mansoni and shows that in most settings a sample size of 15 children provides a reliable classification of schools. PMID:22970333
Hyatt, M.W.; Hubert, W.A.
2001-01-01
We assessed relative weight (Wr) distributions among 291 samples of stock-to-quality-length brook trout Salvelinus fontinalis, brown trout Salmo trutta, rainbow trout Oncorhynchus mykiss, and cutthroat trout O. clarki from lentic and lotic habitats. Statistics describing Wr sample distributions varied slightly among species and habitat types. The average sample was leptokurtic and slightly skewed to the right with a standard deviation of about 10, but the shapes of Wr distributions varied widely among samples. Twenty-two percent of the samples had nonnormal distributions, suggesting the need to evaluate sample distributions before applying statistical tests to determine whether assumptions are met. In general, our findings indicate that samples of about 100 stock-to-quality-length fish are needed to obtain confidence interval widths of four Wr units around the mean. Power analysis revealed that samples of about 50 stock-to-quality-length fish are needed to detect a 2% change in mean Wr at a relatively high level of power (beta = 0.01, alpha = 0.05).
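The figure of about 100 fish for a confidence interval four Wr units wide can be checked with a rough normal-approximation calculation. This is a back-of-the-envelope sketch, not the authors' procedure: it assumes a 95% interval (z = 1.96), a standard deviation of 10 as quoted in the abstract, and that "width" means the full width of the interval.

```python
import math

def sample_size_for_ci_width(sd, width, z=1.96):
    """Smallest n for which a normal-approximation CI of the mean has the given full width.

    Solves width = 2 * z * sd / sqrt(n) for n and rounds up.
    """
    if sd <= 0 or width <= 0:
        raise ValueError("sd and width must be positive")
    return math.ceil((2 * z * sd / width) ** 2)

print(sample_size_for_ci_width(sd=10, width=4))
```

Under these assumptions the result is 97, in line with the "about 100" reported.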
Systematic sampling for suspended sediment
Robert B. Thomas
1991-01-01
Abstract - Because of high costs or complex logistics, scientific populations cannot be measured entirely and must be sampled. Accepted scientific practice holds that sample selection be based on statistical principles to assure objectivity when estimating totals and variances. Probability sampling--obtaining samples with known probabilities--is the only method that...
Replicating studies in which samples of participants respond to samples of stimuli.
Westfall, Jacob; Judd, Charles M; Kenny, David A
2015-05-01
In a direct replication, the typical goal is to reproduce a prior experimental result with a new but comparable sample of participants in a high-powered replication study. Often in psychology, the research to be replicated involves a sample of participants responding to a sample of stimuli. In replicating such studies, we argue that the same criteria should be used in sampling stimuli as are used in sampling participants. Namely, a new but comparable sample of stimuli should be used to ensure that the original results are not due to idiosyncrasies of the original stimulus sample, and the stimulus sample must often be enlarged to ensure high statistical power. In support of the latter point, we discuss the fact that in experiments involving samples of stimuli, statistical power typically does not approach 1 as the number of participants goes to infinity. As an example of the importance of sampling new stimuli, we discuss the bygone literature on the risky shift phenomenon, which was almost entirely based on a single stimulus sample that was later discovered to be highly unrepresentative. We discuss the use of both resampled and expanded stimulus sets, that is, stimulus samples that include the original stimuli plus new stimuli. © The Author(s) 2015.
Nomogram for sample size calculation on a straightforward basis for the kappa statistic.
Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo
2014-09-01
Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations such as sample size calculation due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation that the level of agreement under a certain marginal prevalence is considered in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. The sample size formulae using a simple proportion of agreement instead of a kappa statistic and nomograms to eliminate the inconvenience of using a mathematical formula were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
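The kappa paradox can be seen in a small worked example. The sketch below uses Cohen's kappa for two raters and a binary outcome, which is an assumption made for illustration; the paper itself works under the common correlation model. The table counts are hypothetical.

```python
def agreement_and_kappa(table):
    """Observed agreement and Cohen's kappa for a 2x2 table of counts.

    table[i][j] = number of subjects rated category i by rater 1 and j by rater 2.
    """
    total = sum(sum(row) for row in table)
    observed = (table[0][0] + table[1][1]) / total
    # Chance agreement from the raters' marginal proportions.
    r1_yes = (table[0][0] + table[0][1]) / total
    r2_yes = (table[0][0] + table[1][0]) / total
    expected = r1_yes * r2_yes + (1 - r1_yes) * (1 - r2_yes)
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical ratings with a very common category: 90 agree on "yes", none agree on "no".
observed, kappa = agreement_and_kappa([[90, 5], [5, 0]])
print(observed, kappa)
```

Here the raters agree on 90% of subjects, yet kappa is slightly negative because the skewed marginal prevalence makes chance agreement almost as high as observed agreement.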
Use Trends Indicated by Statistically Calibrated Recreational Sites in the National Forest System
Gary L. Tyre
1971-01-01
Trends in statistically sampled use of developed sites in the National Forest system indicate an average annual increase of 6.0 percent in the period 1966-69. The high variability of the measure precludes its use for projecting expected future use, but it can be important in gauging the credibility of annual use changes at both sampled and unsampled locations.
The (mis)reporting of statistical results in psychology journals.
Bakker, Marjan; Wicherts, Jelte M
2011-09-01
In order to study the prevalence, nature (direction), and causes of reporting errors in psychology, we checked the consistency of reported test statistics, degrees of freedom, and p values in a random sample of high- and low-impact psychology journals. In a second study, we established the generality of reporting errors in a random sample of recent psychological articles. Our results, on the basis of 281 articles, indicate that around 18% of statistical results in the psychological literature are incorrectly reported. Inconsistencies were more common in low-impact journals than in high-impact journals. Moreover, around 15% of the articles contained at least one statistical conclusion that proved, upon recalculation, to be incorrect; that is, recalculation rendered the previously significant result insignificant, or vice versa. These errors were often in line with researchers' expectations. We classified the most common errors and contacted authors to shed light on the origins of the errors.
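The kind of consistency check described in this abstract can be sketched as follows. This is a simplified illustration that assumes a two-sided z test so that only the standard library is needed; the study itself also covered statistics with degrees of freedom. The function names, the rounding tolerance, and the example values are assumptions made for this sketch.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def check_reported_result(z, reported_p, decimals=2, alpha=0.05):
    """Compare a reported p value with the one recomputed from z.

    Returns (consistent, decision_error):
    consistent      -- reported p matches the recomputed p up to rounding
    decision_error  -- the significance decision at `alpha` flips on recalculation
    """
    recomputed = two_sided_p_from_z(z)
    consistent = abs(recomputed - reported_p) <= 0.5 * 10 ** (-decimals)
    decision_error = (recomputed < alpha) != (reported_p < alpha)
    return consistent, decision_error


# Hypothetical reported results, not taken from any article.
ok = check_reported_result(z=1.96, reported_p=0.05)
bad = check_reported_result(z=1.50, reported_p=0.03)
print(ok, bad)
```

The second flag corresponds to the more serious class of error in the abstract: a result reported as significant that is not significant after recalculation, or vice versa.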
NASA Technical Reports Server (NTRS)
Colarco, P. R.; Kahn, R. A.; Remer, L. A.; Levy, R. C.
2014-01-01
We use the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite aerosol optical thickness (AOT) product to assess the impact of reduced swath width on global and regional AOT statistics and trends. Along-track and across-track sampling strategies are employed, in which the full MODIS data set is sub-sampled with various narrow-swath (approximately 400-800 km) and single pixel width (approximately 10 km) configurations. Although view-angle artifacts in the MODIS AOT retrieval confound direct comparisons between averages derived from different sub-samples, careful analysis shows that with many portions of the Earth essentially unobserved, spatial sampling introduces uncertainty in the derived seasonal-regional mean AOT. These AOT spatial sampling artifacts comprise up to 60% of the full-swath AOT value under moderate aerosol loading, and can be as large as 0.1 in some regions under high aerosol loading. Compared to full-swath observations, narrower swath and single pixel width sampling exhibits a reduced ability to detect AOT trends with statistical significance. On the other hand, estimates of the global, annual mean AOT do not vary significantly from the full-swath values as spatial sampling is reduced. Aggregation of the MODIS data at coarse grid scales (10 deg) shows consistency in the aerosol trends across sampling strategies, with increased statistical confidence, but quantitative errors in the derived trends are found even for the full-swath data when compared to high spatial resolution (0.5 deg) aggregations. Using results of a model-derived aerosol reanalysis, we find consistency in our conclusions about a seasonal-regional spatial sampling artifact in AOT. Furthermore, the model shows that reduced spatial sampling can amount to uncertainty in computed shortwave top-of-atmosphere aerosol radiative forcing of 2-3 W m^-2. These artifacts are lower bounds, as possibly other unconsidered sampling strategies would perform less well.
These results suggest that future aerosol satellite missions having significantly less than full-swath viewing are unlikely to sample the true AOT distribution well enough to obtain the statistics needed to reduce uncertainty in aerosol direct forcing of climate.
The large sample size fallacy.
Lantz, Björn
2013-06-01
Significance in the statistical sense has little to do with significance in the common practical sense. Statistical significance is a necessary but not a sufficient condition for practical significance. Hence, results that are extremely statistically significant may be highly nonsignificant in practice. The degree of practical significance is generally determined by the size of the observed effect, not the p-value. The results of studies based on large samples are often characterized by extreme statistical significance despite small or even trivial effect sizes. Interpreting such results as significant in practice without further analysis is referred to as the large sample size fallacy in this article. The aim of this article is to explore the relevance of the large sample size fallacy in contemporary nursing research. Relatively few nursing articles display explicit measures of observed effect sizes or include a qualitative discussion of observed effect sizes. Statistical significance is often treated as an end in itself. Effect sizes should generally be calculated and presented along with p-values for statistically significant results, and observed effect sizes should be discussed qualitatively through direct and explicit comparisons with the effects in related literature. © 2012 Nordic College of Caring Science.
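A minimal numerical sketch of the fallacy described in this abstract: for a fixed, trivially small standardized effect, the p value shrinks toward zero as the sample grows. The sketch assumes a two-sample z test with known unit variance and equal group sizes; the effect size of 0.02 and the group sizes are arbitrary illustrative values, and `two_sample_z_p` is a hypothetical helper.

```python
import math


def two_sample_z_p(effect_size_d, n_per_group):
    """Two-sided p value for a two-sample z test.

    Assumes known unit variance in both groups, so the standardized mean
    difference d gives z = d * sqrt(n / 2) for n observations per group.
    """
    z = effect_size_d * math.sqrt(n_per_group / 2)
    return math.erfc(abs(z) / math.sqrt(2))


d = 0.02  # a trivial standardized effect (illustrative value)
p_small = two_sample_z_p(d, n_per_group=100)
p_large = two_sample_z_p(d, n_per_group=1_000_000)
print(f"n=100 per group:       p = {p_small:.3f}")
print(f"n=1,000,000 per group: p = {p_large:.3g}")
```

The effect is identical in both runs; only the sample size changes. This is why the abstract recommends reporting and discussing the observed effect size alongside the p value.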
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
A statistical approach to selecting and confirming validation targets in -omics experiments
2012-01-01
Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
Breaking Free of Sample Size Dogma to Perform Innovative Translational Research
Bacchetti, Peter; Deeks, Steven G.; McCune, Joseph M.
2011-01-01
Innovative clinical and translational research is often delayed or prevented by reviewers’ expectations that any study performed in humans must be shown in advance to have high statistical power. This supposed requirement is not justifiable and is contradicted by the reality that increasing sample size produces diminishing marginal returns. Studies of new ideas often must start small (sometimes even with an N of 1) because of cost and feasibility concerns, and recent statistical work shows that small sample sizes for such research can produce more projected scientific value per dollar spent than larger sample sizes. Renouncing false dogma about sample size would remove a serious barrier to innovation and translation. PMID:21677197
ERIC Educational Resources Information Center
Chang, Chun-Yen; Cheng, Wei-Ying
2008-01-01
The interrelationship between senior high school students' science achievement (SA) and their self-confidence and interest in science (SCIS) was explored with a representative sample of approximately 1,044 11th-grade students from 30 classes attending four high schools throughout Taiwan. Statistical analyses indicated that a statistically…
Robust functional statistics applied to Probability Density Function shape screening of sEMG data.
Boudaoud, S; Rix, H; Al Harrach, M; Marin, F
2014-01-01
Recent studies pointed out possible shape modifications of the Probability Density Function (PDF) of surface electromyographical (sEMG) data according to several contexts like fatigue and muscle force increase. Following this idea, criteria have been proposed to monitor these shape modifications mainly using High Order Statistics (HOS) parameters like skewness and kurtosis. In experimental conditions, these parameters are confronted with small sample size in the estimation process. This small sample size induces errors in the estimated HOS parameters restraining real-time and precise sEMG PDF shape monitoring. Recently, a functional formalism, the Core Shape Model (CSM), has been used to analyse shape modifications of PDF curves. In this work, taking inspiration from CSM method, robust functional statistics are proposed to emulate both skewness and kurtosis behaviors. These functional statistics combine both kernel density estimation and PDF shape distances to evaluate shape modifications even in presence of small sample size. Then, the proposed statistics are tested, using Monte Carlo simulations, on both normal and Log-normal PDFs that mimic observed sEMG PDF shape behavior during muscle contraction. According to the obtained results, the functional statistics seem to be more robust than HOS parameters to small sample size effect and more accurate in sEMG PDF shape screening applications.
Visual Sample Plan Version 7.0 User's Guide
DOE Office of Scientific and Technical Information (OSTI.GOV)
Matzke, Brett D.; Newburn, Lisa LN; Hathaway, John E.
2014-03-01
User's guide for VSP 7.0. This user's guide describes Visual Sample Plan (VSP) Version 7.0 and provides instructions for using the software. VSP selects the appropriate number and location of environmental samples to ensure that the results of statistical tests performed to provide input to risk decisions have the required confidence and performance. VSP Version 7.0 provides sample-size equations or algorithms needed by specific statistical tests appropriate for specific environmental sampling objectives. It also provides data quality assessment and statistical analysis functions to support evaluation of the data and determine whether the data support decisions regarding sites suspected of contamination. The easy-to-use program is highly visual and graphic. VSP runs on personal computers with Microsoft Windows operating systems (XP, Vista, Windows 7, and Windows 8). Designed primarily for project managers and users without expertise in statistics, VSP is applicable to two- and three-dimensional populations to be sampled (e.g., rooms and buildings, surface soil, a defined layer of subsurface soil, water bodies, and other similar applications) for studies of environmental quality. VSP is also applicable for designing sampling plans for assessing chem/rad/bio threat and hazard identification within rooms and buildings, and for designing geophysical surveys for unexploded ordnance (UXO) identification.
Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data
Dazard, Jean-Eudes; Rao, J. Sunil
2012-01-01
The paper addresses a common problem in the analysis of high-dimensional high-throughput “omics” data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise test statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that: (i) it employs a novel “similarity statistic”-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that the usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called ‘MVR’ (‘Mean-Variance Regularization’), downloadable from the CRAN website. PMID:22711950
Simulation of Wind Profile Perturbations for Launch Vehicle Design
NASA Technical Reports Server (NTRS)
Adelfang, S. I.
2004-01-01
Ideally, a statistically representative sample of measured high-resolution wind profiles with wavelengths as small as tens of meters is required in design studies to establish aerodynamic load indicator dispersions and vehicle control system capability. At most potential launch sites, high-resolution wind profiles may not exist. Representative samples of Rawinsonde wind profiles to altitudes of 30 km are more likely to be available from the extensive network of measurement sites established for routine sampling in support of weather observing and forecasting activity. Such a sample, large enough to be statistically representative of relatively large wavelength perturbations, would be inadequate for launch vehicle design assessments because the Rawinsonde system accurately measures wind perturbations with wavelengths no smaller than 2000 m (1000 m altitude increment). The Kennedy Space Center (KSC) Jimsphere wind profiles (150/month and seasonal 2 and 3.5-hr pairs) are the only adequate samples of high-resolution profiles (approx. 150 to 300 m effective resolution, but over-sampled at 25 m intervals) that have been used extensively for launch vehicle design assessments. Therefore, a simulation process has been developed for enhancement of measured low-resolution Rawinsonde profiles that would be applicable in preliminary launch vehicle design studies at launch sites other than KSC.
Fernee, Christianne; Browne, Martin; Zakrzewski, Sonia
2017-01-01
This paper introduces statistical shape modelling (SSM) for use in osteoarchaeology research. SSM is a full field, multi-material analytical technique, and is presented as a supplementary geometric morphometric (GM) tool. Lower mandibular canines from two archaeological populations and one modern population were sampled, digitised using micro-CT, aligned, registered to a baseline and statistically modelled using principal component analysis (PCA). Sample material properties were incorporated as a binary enamel/dentin parameter. Results were assessed qualitatively and quantitatively using anatomical landmarks. Finally, the technique’s application was demonstrated for inter-sample comparison through analysis of the principal component (PC) weights. It was found that SSM could provide high detail qualitative and quantitative insight with respect to archaeological inter- and intra-sample variability. This technique has value for archaeological, biomechanical and forensic applications including identification, finite element analysis (FEA) and reconstruction from partial datasets. PMID:29216199
Characterizing and Improving Distributed Intrusion Detection Systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hurd, Steven A; Proebstel, Elliot P.
2007-11-01
Due to ever-increasing quantities of information traversing networks, network administrators are developing greater reliance upon statistically sampled packet information as the source for their intrusion detection systems (IDS). Our research is aimed at understanding IDS performance when statistical packet sampling is used. Using the Snort IDS and a variety of data sets, we compared IDS results when an entire data set is used to the results when a statistically sampled subset of the data set is used. Generally speaking, IDS performance with statistically sampled information was shown to drop considerably even under fairly high sampling rates (such as 1:5). Acknowledgements: The authors wish to extend our gratitude to Matt Bishop and Chen-Nee Chuah of UC Davis for their guidance and support on this work. Our thanks are also extended to Jianning Mai of UC Davis and Tao Ye of Sprint Advanced Technology Labs for their generous assistance. We would also like to acknowledge our dataset sources, CRAWDAD and CAIDA, without which this work would not have been possible. Support for OC48 data collection is provided by DARPA, NSF, DHS, Cisco and CAIDA members.
Properties of different selection signature statistics and a new strategy for combining them.
Ma, Y; Ding, X; Qanbari, S; Weigend, S; Zhang, Q; Simianer, H
2015-11-01
Identifying signatures of recent or ongoing selection is of high relevance in livestock population genomics. From a statistical perspective, determining a proper testing procedure and combining various test statistics is challenging. On the basis of extensive simulations in this study, we discuss the statistical properties of eight different established selection signature statistics. In the considered scenario, we show that a reasonable power to detect selection signatures is achieved with high marker density (>1 SNP/kb) as obtained from sequencing, while rather small sample sizes (~15 diploid individuals) appear to be sufficient. Most selection signature statistics such as composite likelihood ratio and cross population extended haplotype homozygosity have the highest power when fixation of the selected allele is reached, while integrated haplotype score has the highest power when selection is ongoing. We suggest a novel strategy, called de-correlated composite of multiple signals (DCMS), to combine different statistics for detecting selection signatures while accounting for the correlation between the different selection signature statistics. When examined with simulated data, DCMS consistently has a higher power than most of the single statistics and shows a reliable positional resolution. We illustrate the new statistic on the established selective sweep around the lactase gene in human HapMap data, providing further evidence of the reliability of this new statistic. Then, we apply it to scan selection signatures in two chicken samples with diverse skin color. Our analysis suggests that a set of well-known genes such as BCO2, MC1R, ASIP and TYR were involved in the divergent selection for this trait.
Local image statistics: maximum-entropy constructions and perceptual salience
Victor, Jonathan D.; Conte, Mary M.
2012-01-01
The space of visual signals is high-dimensional and natural visual images have a highly complex statistical structure. While many studies suggest that only a limited number of image statistics are used for perceptual judgments, a full understanding of visual function requires analysis not only of the impact of individual image statistics, but also, how they interact. In natural images, these statistical elements (luminance distributions, correlations of low and high order, edges, occlusions, etc.) are intermixed, and their effects are difficult to disentangle. Thus, there is a need for construction of stimuli in which one or more statistical elements are introduced in a controlled fashion, so that their individual and joint contributions can be analyzed. With this as motivation, we present algorithms to construct synthetic images in which local image statistics—including luminance distributions, pair-wise correlations, and higher-order correlations—are explicitly specified and all other statistics are determined implicitly by maximum-entropy. We then apply this approach to measure the sensitivity of the human visual system to local image statistics and to sample their interactions. PMID:22751397
Zhu, Wensheng; Yuan, Ying; Zhang, Jingwen; Zhou, Fan; Knickmeyer, Rebecca C; Zhu, Hongtu
2017-02-01
The aim of this paper is to systematically evaluate a biased sampling issue associated with genome-wide association analysis (GWAS) of imaging phenotypes for most imaging genetic studies, including the Alzheimer's Disease Neuroimaging Initiative (ADNI). Specifically, the original sampling scheme of these imaging genetic studies is primarily the retrospective case-control design, whereas most existing statistical analyses of these studies ignore such sampling scheme by directly correlating imaging phenotypes (called the secondary traits) with genotype. Although it has been well documented in genetic epidemiology that ignoring the case-control sampling scheme can produce highly biased estimates, and subsequently lead to misleading results and suspicious associations, such findings are not well documented in imaging genetics. We use extensive simulations and a large-scale imaging genetic data analysis of the Alzheimer's Disease Neuroimaging Initiative (ADNI) data to evaluate the effects of the case-control sampling scheme on GWAS results based on some standard statistical methods, such as linear regression methods, while comparing it with several advanced statistical methods that appropriately adjust for the case-control sampling scheme. Copyright © 2016 Elsevier Inc. All rights reserved.
In vitro corrosion behaviour and microhardness of high-copper amalgams with platinum and indium.
Ilikli, B G; Aydin, A; Işimer, A; Alpaslan, G
1999-02-01
Samples prepared from Luxalloy, GS-80, Permite-C and Logic and polished after 24 h by traditional methods were stored in polypropylene tubes containing phosphate-buffered saline solutions (pH 3.5 and 6.5) and distilled water. The amounts of mercury, silver, tin, copper, zinc, platinum and indium in the test solutions were determined at the first, second, eighth, 52nd and 78th week by atomic absorption spectrometry. At the end of the eighth week the amalgam samples were removed from the solutions and evaluated with a Rockwell Superficial microhardness tester. Statistically significantly lower amounts of metal ions were measured for Permite-C, containing indium, and Logic, containing platinum. The microhardness test results showed that there were statistically significant increases in the microhardness of Permite-C and Logic. As a result, it was shown that the amalgam samples were affected by the corrosion conditions to different degrees. The sample of the Logic group that was stored in distilled water showed smoother surface properties than the other high-copper amalgam samples. However, it was observed that samples of the Permite-C group had the smoothest surface properties.
NASA Technical Reports Server (NTRS)
Gardner, Adrian
2010-01-01
National Aeronautics and Space Administration (NASA) weather and atmospheric environmental organizations are insatiable consumers of geophysical, hydrometeorological and solar weather statistics. The expanding array of internet-worked sensors producing targeted physical measurements has generated an almost factorial explosion of near real-time inputs to topical statistical datasets. Normalizing and value-based parsing of such statistical datasets in support of time-constrained weather and environmental alerts and warnings is essential, even with dedicated high-performance computational capabilities. What are the optimal indicators for advanced decision making? How do we recognize the line between sufficient statistical sampling and excessive, mission-destructive sampling? How do we assure that the normalization and parsing process, when interpolated through numerical models, yields accurate and actionable alerts and warnings? This presentation will address the integrated means and methods to achieve desired outputs for NASA and consumers of its data.
General statistical considerations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eberhardt, L L; Gilbert, R O
From NAEG plutonium environmental studies program meeting; Las Vegas, Nevada, USA (2 Oct 1973). The high sampling variability encountered in environmental plutonium studies along with high analytical costs makes it very important that efficient soil sampling plans be used. However, efficient sampling depends on explicit and simple statements of the objectives of the study. When there are multiple objectives it may be difficult to devise a wholly suitable sampling scheme. Sampling for long-term changes in plutonium concentration in soils may also be complex and expensive. Further attention to problems associated with compositing samples is recommended, as is the consistent use of random sampling as a basic technique. (auth)
H.E. Anderson; J. Breidenbach
2007-01-01
Airborne laser scanning (LIDAR) can be a valuable tool in double-sampling forest survey designs. LIDAR-derived forest structure metrics are often highly correlated with important forest inventory variables, such as mean stand biomass, and LIDAR-based synthetic regression estimators have the potential to be highly efficient compared to single-stage estimators, which...
VizieR Online Data Catalog: The ESO DIBs Large Exploration Survey (Cox+, 2017)
NASA Astrophysics Data System (ADS)
Cox, N. L. J.; Cami, J.; Farhang, A.; Smoker, J.; Monreal-Ibero, A.; Lallement, R.; Sarre, P. J.; Marshall, C. C. M.; Smith, K. T.; Evans, C. J.; Royer, P.; Linnartz, H.; Cordiner, M. A.; Joblin, C.; van Loon, J. T.; Foing, B. H.; Bhatt, N. H.; Bron, E.; Elyajouri, M.; de Koter, A.; Ehrenfreund, P.; Javadi, A.; Kaper, L.; Khosroshadi, H. G.; Laverick, M.; Le Petit, F.; Mulas, G.; Roueff, E.; Salama, F.; Spaans, M.
2018-01-01
We constructed a statistically representative survey sample that probes a wide range of interstellar environment parameters including reddening E(B-V), visual extinction A_V, total-to-selective extinction ratio R_V, and molecular hydrogen fraction f_H2. EDIBLES provides the community with optical (~305-1042nm) spectra at high spectral resolution (R~70000 in the blue arm and 100000 in the red arm) and high signal-to-noise (S/N; median value ~500-1000), for a statistically significant sample of interstellar sightlines. Many of the >100 sightlines included in the survey already have auxiliary ultraviolet, infrared and/or polarisation data available on the dust and gas components. (2 data files).
42 CFR 402.109 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 2 2011-10-01 2011-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...
42 CFR 402.109 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false Statistical sampling. 402.109 Section 402.109... Statistical sampling. (a) Purpose. CMS or OIG may introduce the results of a statistical sampling study to... or caused to be presented. (b) Prima facie evidence. The results of the statistical sampling study...
ERIC Educational Resources Information Center
Rojas-Kramer, Carlos; Limón-Suárez, Enrique; Moreno-García, Elena; García-Santillán, Arturo
2018-01-01
The aim of this paper was to analyze attitude towards statistics in high-school students using the SATS scale designed by Auzmendi (1992). The sample was 200 students from the sixth semester of the afternoon shift, who were enrolled in technical careers from the Technological Study Center of the Sea (Centro de Estudios Tecnológicos del Mar 07…
Influence of light curing and sample thickness on microhardness of a composite resin
Aguiar, Flávio HB; Andrade, Kelly RM; Leite Lima, Débora AN; Ambrosano, Gláucia MB; Lovadino, José R
2009-01-01
The aim of this in vitro study was to evaluate the influence of light-curing units and different sample thicknesses on the microhardness of a composite resin. Composite resin specimens were randomly prepared and assigned to nine experimental groups (n = 5): considering three light-curing units (conventional quartz tungsten halogen [QTH]: 550 mW/cm2 – 20 s; high irradiance QTH: 1160 mW/cm2 – 10 s; and light-emitting diode [LED]: 360 mW/cm2 – 40 s) and three sample thicknesses (0.5 mm, 1 mm, and 2 mm). All samples were polymerized with the light tip 8 mm away from the specimen. Knoop microhardness was then measured on the top and bottom surfaces of each sample. The top surfaces, with some exceptions, were similar; however, for the bottom surfaces, statistical differences were found between curing units and thicknesses. In all experimental groups, the 0.5-mm-thick increments showed microhardness values statistically higher than those observed for 1- and 2-mm increments. The conventional and LED units showed higher hardness mean values and were statistically different from the high irradiance unit. In all experimental groups, microhardness mean values obtained for the top surface were higher than those observed for the bottom surface. In conclusion, higher levels of irradiance or thinner increments would help improve hybrid composite resin polymerization. PMID:23674901
Drug Use and Crime. Bureau of Justice Statistics Special Report.
ERIC Educational Resources Information Center
Innes, Christopher A.
In 1974, 1979, and 1986, the Bureau of Justice Statistics sponsored surveys of nationally representative samples of inmates of state correctional facilities. Results from the 1986 Survey of Inmates of State Correctional Facilities which included 13,711 inmates, indicated that inmates reported high levels of drug use prior to the commission of the…
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
NASA Astrophysics Data System (ADS)
Gill, Andrew B.; Black, Richard T.; Bowden, David J.; Priest, Andrew N.; Graves, Martin J.; Lomas, David J.
2014-06-01
This study investigated the effect of temporal resolution on the dual-input pharmacokinetic (PK) modelling of dynamic contrast-enhanced MRI (DCE-MRI) data from normal volunteer livers and from patients with hepatocellular carcinoma. Eleven volunteers and five patients were examined at 3 T. Two sections, one optimized for the vascular input functions (VIF) and one for the tissue, were imaged within a single heart-beat (HB) using a saturation-recovery fast gradient echo sequence. The data was analysed using a dual-input single-compartment PK model. The VIFs and/or uptake curves were then temporally sub-sampled (at interval ▵t = [2-20] s) before being subject to the same PK analysis. Statistical comparisons of tumour and normal tissue PK parameter values using a 5% significance level gave rise to the same study results when temporally sub-sampling the VIFs to HB < ▵t <4 s. However, sub-sampling to ▵t > 4 s did adversely affect the statistical comparisons. Temporal sub-sampling of just the liver/tumour tissue uptake curves at ▵t ≤ 20 s, whilst using high temporal resolution VIFs, did not substantially affect PK parameter statistical comparisons. In conclusion, there is no practical advantage to be gained from acquiring very high temporal resolution hepatic DCE-MRI data. Instead the high temporal resolution could be usefully traded for increased spatial resolution or SNR.
Abercrombie, M L; Jewell, J S
1986-01-01
Results of EMIT, Abuscreen RIA, and GC/MS tests for THC metabolites in a high volume random urinalysis program are compared. Samples were field tested by non-laboratory personnel with an EMIT system using a 100 ng/mL cutoff. Samples were then sent to the Army Forensic Toxicology Drug Testing Laboratory (WRAMC) at Fort Meade, Maryland, where they were tested by RIA (Abuscreen) using a statistical 100 ng/mL cutoff. Confirmations of all RIA positives were accomplished using a GC/MS procedure. EMIT and RIA results agreed for 91% of samples. Data indicated a 4% false positive rate and a 10% false negative rate for EMIT field testing. In a related study, results for samples which tested positive by RIA for THC metabolites using a statistical 100 ng/mL cutoff were compared with results by GC/MS utilizing a 20 ng/mL cutoff for the THCA metabolite. Presence of THCA metabolite was detected in 99.7% of RIA positive samples. No relationship between quantitations determined by the two tests was found.
Use of the Analysis of the Volatile Faecal Metabolome in Screening for Colorectal Cancer
2015-01-01
Diagnosis of colorectal cancer requires an invasive and expensive colonoscopy, which is usually carried out after a positive screening test. Unfortunately, existing screening tests lack specificity and sensitivity, hence many unnecessary colonoscopies are performed. Here we report on a potential new screening test for colorectal cancer based on the analysis of volatile organic compounds (VOCs) in the headspace of faecal samples. Faecal samples were obtained from subjects who had a positive faecal occult blood test (FOBT). Subjects subsequently had colonoscopies performed to classify them into low risk (non-cancer) and high risk (colorectal cancer) groups. Volatile organic compounds were analysed by selected ion flow tube mass spectrometry (SIFT-MS) and then data were analysed using both univariate and multivariate statistical methods. Ions most likely from hydrogen sulphide, dimethyl sulphide and dimethyl disulphide are statistically significantly higher in samples from high risk rather than low risk subjects. Results using multivariate methods show that the test gives a correct classification of 75% with 78% specificity and 72% sensitivity on FOBT positive samples, offering a potentially effective alternative to FOBT. PMID:26086914
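To make the reported classification figures concrete, the following sketch computes sensitivity, specificity and overall correct classification from a confusion matrix. The abstract does not give the underlying counts, so the counts below are hypothetical and were chosen only so that they reproduce the quoted percentages; the function name is also an assumption.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) for a binary screening test.

    tp/fn: high-risk subjects classified correctly / incorrectly
    tn/fp: low-risk subjects classified correctly / incorrectly
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both groups must contain at least one subject")
    sensitivity = tp / (tp + fn)   # fraction of high-risk subjects detected
    specificity = tn / (tn + fp)   # fraction of low-risk subjects cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts (not from the study): 100 high-risk and 100 low-risk subjects.
sens, spec, acc = screening_metrics(tp=72, fn=28, tn=78, fp=22)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

With equal group sizes the overall accuracy is simply the mean of sensitivity and specificity; with unequal groups it is weighted toward the larger group.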
Rapp, J.B.
1991-01-01
Q-mode factor analysis was used to quantitate the distribution of the major aliphatic hydrocarbon (n-alkanes, pristane, phytane) systems in sediments from a variety of marine environments. The compositions of the pure end members of the systems were obtained from factor scores and the distribution of the systems within each sample was obtained from factor loadings. All the data, from the diverse environments sampled (estuarine (San Francisco Bay), fresh-water (San Francisco Peninsula), polar-marine (Antarctica) and geothermal-marine (Gorda Ridge) sediments), were reduced to three major systems: a terrestrial system (mostly high molecular weight aliphatics with odd-numbered-carbon predominance), a mature system (mostly low molecular weight aliphatics without predominance) and a system containing mostly high molecular weight aliphatics with even-numbered-carbon predominance. With this statistical approach, it is possible to assign the percentage contribution from various sources to the observed distribution of aliphatic hydrocarbons in each sediment sample. © 1991.
A Science and Risk-Based Pragmatic Methodology for Blend and Content Uniformity Assessment.
Sayeed-Desta, Naheed; Pazhayattil, Ajay Babu; Collins, Jordan; Doshi, Chetan
2018-04-01
This paper describes a pragmatic approach that can be applied in assessing powder blend and unit dosage uniformity of solid dose products at Process Design, Process Performance Qualification, and Continued/Ongoing Process Verification stages of the Process Validation lifecycle. The statistically based sampling, testing, and assessment plan was developed due to the withdrawal of the FDA draft guidance for industry "Powder Blends and Finished Dosage Units-Stratified In-Process Dosage Unit Sampling and Assessment." This paper compares the proposed Grouped Area Variance Estimate (GAVE) method with an alternate approach outlining the practicality and statistical rationalization using traditional sampling and analytical methods. The approach is designed to fit solid dose processes assuring high statistical confidence in both powder blend uniformity and dosage unit uniformity during all three stages of the lifecycle complying with ASTM standards as recommended by the US FDA.
NASA Astrophysics Data System (ADS)
Naylor, M.; Main, I. G.; Greenhough, J.; Bell, A. F.; McCloskey, J.
2009-04-01
The Sumatran Boxing Day earthquake and subsequent large events provide an opportunity to re-evaluate the statistical evidence for characteristic earthquake events in frequency-magnitude distributions. Our aims are to (i) improve intuition regarding the properties of samples drawn from power laws, (ii) illustrate using random samples how appropriate Poisson confidence intervals can both aid the eye and provide an appropriate statistical evaluation of data drawn from power-law distributions, and (iii) apply these confidence intervals to test for evidence of characteristic earthquakes in subduction-zone frequency-magnitude distributions. We find no need for a characteristic model to describe frequency magnitude distributions in any of the investigated subduction zones, including Sumatra, due to an emergent skew in residuals of power law count data at high magnitudes combined with a sample bias for examining large earthquakes as candidate characteristic events.
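The role of Poisson confidence intervals for binned event counts can be sketched as follows. This is a minimal, standard-library illustration of the exact (Garwood) interval for a single observed count, found by bisection on the Poisson cumulative distribution; it is not the authors' code, and the function names are assumptions. It is intended for the small counts typical of high-magnitude bins.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); suitable for small to moderate mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def _mu_for_cdf(k, target, iters=200):
    """Find mu with poisson_cdf(k, mu) == target (the CDF decreases in mu)."""
    lo, hi = 0.0, float(k + 1)
    while poisson_cdf(k, hi) > target:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def poisson_interval(k, conf=0.95):
    """Exact two-sided confidence interval for the mean, given an observed count k."""
    alpha = 1.0 - conf
    # Lower limit: mean at which seeing k or more events has probability alpha/2.
    lower = 0.0 if k == 0 else _mu_for_cdf(k - 1, 1.0 - alpha / 2.0)
    # Upper limit: mean at which seeing k or fewer events has probability alpha/2.
    upper = _mu_for_cdf(k, alpha / 2.0)
    return lower, upper


for count in (0, 1, 10):
    low, high = poisson_interval(count)
    print(count, round(low, 3), round(high, 3))
```

The intervals are strongly asymmetric for small counts, which is one reason sparse high-magnitude bins can appear to deviate from a power law without the deviation being statistically meaningful.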
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu
2015-01-01
Flow cytometry (FCM) is a fluorescence‐based single‐cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap‐FR, a novel method for cell population mapping across FCM samples. FlowMap‐FR is based on the Friedman–Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap‐FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap‐FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap‐FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap‐FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap‐FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback–Leibler divergence measure used in a previous population matching method with both simulated and real data.
The FR statistic outperforms the symmetric version of KL‐distance in distinguishing equivalent from nonequivalent cell populations. FlowMap‐FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F‐measure of 0.88 was obtained, indicating high precision and recall of the FR‐based population matching results. FlowMap‐FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © 2015 International Society for Advancement of Cytometry PMID:26274018
Hsiao, Chiaowen; Liu, Mengya; Stanton, Rick; McGee, Monnie; Qian, Yu; Scheuermann, Richard H
2016-01-01
Flow cytometry (FCM) is a fluorescence-based single-cell experimental technology that is routinely applied in biomedical research for identifying cellular biomarkers of normal physiological responses and abnormal disease states. While many computational methods have been developed that focus on identifying cell populations in individual FCM samples, very few have addressed how the identified cell populations can be matched across samples for comparative analysis. This article presents FlowMap-FR, a novel method for cell population mapping across FCM samples. FlowMap-FR is based on the Friedman-Rafsky nonparametric test statistic (FR statistic), which quantifies the equivalence of multivariate distributions. As applied to FCM data by FlowMap-FR, the FR statistic objectively quantifies the similarity between cell populations based on the shapes, sizes, and positions of fluorescence data distributions in the multidimensional feature space. To test and evaluate the performance of FlowMap-FR, we simulated the kinds of biological and technical sample variations that are commonly observed in FCM data. The results show that FlowMap-FR is able to effectively identify equivalent cell populations between samples under scenarios of proportion differences and modest position shifts. As a statistical test, FlowMap-FR can be used to determine whether the expression of a cellular marker is statistically different between two cell populations, suggesting candidates for new cellular phenotypes by providing an objective statistical measure. In addition, FlowMap-FR can indicate situations in which inappropriate splitting or merging of cell populations has occurred during gating procedures. We compared the FR statistic with the symmetric version of Kullback-Leibler divergence measure used in a previous population matching method with both simulated and real data. 
The FR statistic outperforms the symmetric version of KL-distance in distinguishing equivalent from nonequivalent cell populations. FlowMap-FR was also employed as a distance metric to match cell populations delineated by manual gating across 30 FCM samples from a benchmark FlowCAP data set. An F-measure of 0.88 was obtained, indicating high precision and recall of the FR-based population matching results. FlowMap-FR has been implemented as a standalone R/Bioconductor package so that it can be easily incorporated into current FCM data analytical workflows. © The Authors. Published by Wiley Periodicals, Inc. on behalf of ISAC.
NASA Technical Reports Server (NTRS)
Chen, Y.; Nguyen, D.; Guertin, S.; Berstein, J.; White, M.; Menke, R.; Kayali, S.
2003-01-01
This paper presents a reliability evaluation methodology to obtain the statistical reliability information of memory chips for space applications when the test sample size needs to be kept small because of the high cost of the radiation hardness memories.
Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E
2013-11-15
Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (available via CRAN) provides functionality and data to perform the methods in this article. Contact: reesese@vcu.edu
ERIC Educational Resources Information Center
Jandaghi, Gholamreza
2010-01-01
The purpose of the research is to determine high school teachers' skill in designing exam questions in physics. The statistical population was all physics exam sheets for two semesters in one school year, from which a sample of 364 exam sheets was drawn using multistage cluster sampling. Two experts assessed the sheets and by using…
Water flow in high-speed handpieces.
Cavalcanti, Bruno Neves; Serairdarian, Paulo Isaías; Rode, Sigmar Mello
2005-05-01
This study measured the water flow commonly used in high-speed handpieces to evaluate the water flow's influence on temperature generation. Different flow speeds were evaluated between turbines that had different numbers of cooling apertures. Two water samples were collected from each high-speed handpiece at private practices and at the School of Dentistry at São José dos Campos. The first sample was collected at the customary flow and the second was collected with the terminal opened for maximum flow. The two samples were collected into weighed glass receptacles after 15 seconds of turbine operation. The glass receptacles were reweighed and the difference between weights was recorded to calculate the water flow in mL/min and for further statistical analysis. The average water flow for 137 samples was 29.48 mL/min. The flow speeds obtained were 42.38 mL/min for turbines with one coolant aperture; 34.31 mL/min for turbines with two coolant apertures; and 30.44 mL/min for turbines with three coolant apertures. There were statistical differences between turbines with one and three coolant apertures (Tukey-Kramer multiple comparisons test with P < .05). Turbine handpieces with one cooling aperture distributed more water for the burs than high-speed handpieces with more than one aperture.
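The weighing procedure described above converts directly into a flow rate. The sketch below shows the arithmetic, assuming a water density of 1 g/mL (an assumption, not stated in the abstract) and using hypothetical receptacle masses chosen so that the result matches the reported average of 29.48 mL/min.

```python
def water_flow_ml_per_min(empty_g, filled_g, collection_s=15.0, density_g_per_ml=1.0):
    """Flow rate from the mass gained by a receptacle during a timed collection.

    empty_g, filled_g: receptacle mass before and after collection, in grams
    collection_s: collection time in seconds (15 s in the study)
    density_g_per_ml: assumed density of the coolant water
    """
    if collection_s <= 0 or density_g_per_ml <= 0:
        raise ValueError("collection time and density must be positive")
    volume_ml = (filled_g - empty_g) / density_g_per_ml
    return volume_ml * 60.0 / collection_s


# Hypothetical masses: 7.37 g of water collected in 15 s.
print(round(water_flow_ml_per_min(empty_g=50.0, filled_g=57.37), 2))
```

Because the collection window is a quarter of a minute, any weighing error is multiplied by four in the reported flow rate.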
The Asymmetry Parameter and Branching Ratio of Sigma Plus Radiative Decay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Foucher, Maurice Emile
1992-05-01
We have measured the asymmetry parameter and branching ratio of the $$\Sigma^+$$ radiative decay. This high statistics experiment (FNAL 761) was performed in the Proton Center charged hyperon beam at Fermi National Accelerator Laboratory in Batavia, Illinois. We find for the asymmetry parameter -0.720 $$\pm$$ 0.086 $$\pm$$ 0.045 where the first error is statistical and the second is systematic. This result is based on a sample of 34754 $$\pm$$ 212 events. We find a preliminary value for the branching ratio $$Br ( \Sigma^+ \to p\gamma )$$ $$/ Br ( \Sigma^+ \to p \pi^0 )$$ = (2.14 $$\pm$$ 0.07 $$\pm$$ 0.11) x $$10^{-3}$$ where the first error is statistical and the second is systematic. This result is based on a sample of 31040 $$\pm$$ 650 events. Both results are in agreement with previous low statistics measurements.
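When a single overall uncertainty is wanted, statistical and systematic errors quoted separately as above are often combined in quadrature. The sketch below does this for the two quoted results, under the assumption that the two error sources are independent; the abstract itself reports them separately and does not give a combined value, so the combined figures are illustrative only.

```python
import math


def combine_in_quadrature(stat, syst):
    """Total uncertainty assuming independent statistical and systematic errors."""
    return math.hypot(stat, syst)


# Asymmetry parameter: -0.720 +/- 0.086 (stat) +/- 0.045 (syst)
asym_total = combine_in_quadrature(0.086, 0.045)

# Branching ratio in units of 1e-3: 2.14 +/- 0.07 (stat) +/- 0.11 (syst)
br_total = combine_in_quadrature(0.07, 0.11)

print(f"asymmetry total uncertainty: {asym_total:.3f}")
print(f"branching-ratio total uncertainty (x 1e-3): {br_total:.2f}")
```

In this reading the asymmetry measurement is dominated by its statistical error, whereas the branching ratio is dominated by its systematic error.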
Osma, Jorge; García-Palacios, Azucena; Botella, Cristina; Barrada, Juan Ramón
2014-05-01
No studies have been found that compared the psychopathology features, including personality disorders, of Panic Disorder (PD) and Panic Disorder with Agoraphobia (PDA), and a nonclinical sample with anxiety vulnerability. The total sample included 152 participants, 52 in the PD/PDA, 45 in the high anxiety sensitivity (AS) sample, and 55 in the nonclinical sample. The participants in PD/PDA sample were evaluated with the structured interview ADIS-IV. The Brief Symptom Inventory and the MCMI-III were used in all three samples. Statistically significant differences were found between the PD/PDA and the nonclinical sample in all MCMI-III scales except for antisocial and compulsive. No significant differences were found between PD/PDA and the sample with high scores in AS. Phobic Anxiety and Paranoid Ideation were the only scales where there were significant differences between the PD/PDA sample and the high AS sample. Our findings showed that people who scored high on AS, despite not having a diagnosis of PD/PDA, were similar in regard to psychopathology features and personality to individuals with PD/PDA.
Wang, Ling-jia; Kissler, Hermann J; Wang, Xiaojun; Cochet, Olivia; Krzystyniak, Adam; Misawa, Ryosuke; Golab, Karolina; Tibudan, Martin; Grzanka, Jakub; Savari, Omid; Grose, Randall; Kaufman, Dixon B; Millis, Michael; Witkowski, Piotr
2015-01-01
Pancreatic islet mass, represented by islet equivalent (IEQ), is the most important parameter in decision making for clinical islet transplantation. To obtain IEQ, the sample of islets is routinely counted manually under a microscope and discarded thereafter. Islet purity, another parameter in islet processing, is routinely acquired by estimation only. In this study, we validated our digital image analysis (DIA) system, developed using the Image Pro Plus software, for islet mass and purity assessment. Application of the DIA allows better compliance with current good manufacturing practice (cGMP) standards. Human islet samples were captured as calibrated digital images for the permanent record. Five trained technicians participated in determination of IEQ and purity by the manual counting method and DIA. IEQ count showed statistically significant correlations between the manual method and DIA in all sample comparisons (r > 0.819 and p < 0.0001). A statistically significant difference in IEQ between the two methods was found only in the High purity 100μL sample group (p = 0.029). For purity determination, statistically significant differences between manual assessment and DIA measurement were found in the High and Low purity 100μL samples (p < 0.005). In addition, islet particle number (IPN) and the IEQ/IPN ratio did not differ statistically between the manual counting method and DIA. In conclusion, the DIA used in this study is a reliable technique for determination of IEQ and purity. Islet samples preserved as digital images and results produced by DIA can be permanently stored for verification, technical training and islet information exchange between different islet centers. Therefore, DIA complies better with cGMP requirements than the manual counting method. We propose DIA as a quality control tool to supplement the established standard manual method for islet counting and purity estimation. PMID:24806436
How Good Are Statistical Models at Approximating Complex Fitness Landscapes?
du Plessis, Louis; Leventhal, Gabriel E.; Bonhoeffer, Sebastian
2016-01-01
Fitness landscapes determine the course of adaptation by constraining and shaping evolutionary trajectories. Knowledge of the structure of a fitness landscape can thus predict evolutionary outcomes. Empirical fitness landscapes, however, have so far only offered limited insight into real-world questions, as the high dimensionality of sequence spaces makes it impossible to exhaustively measure the fitness of all variants of biologically meaningful sequences. We must therefore revert to statistical descriptions of fitness landscapes that are based on a sparse sample of fitness measurements. It remains unclear, however, how much data are required for such statistical descriptions to be useful. Here, we assess the ability of regression models accounting for single and pairwise mutations to correctly approximate a complex quasi-empirical fitness landscape. We compare approximations based on various sampling regimes of an RNA landscape and find that the sampling regime strongly influences the quality of the regression. On the one hand it is generally impossible to generate sufficient samples to achieve a good approximation of the complete fitness landscape, and on the other hand systematic sampling schemes can only provide a good description of the immediate neighborhood of a sequence of interest. Nevertheless, we obtain a remarkably good and unbiased fit to the local landscape when using sequences from a population that has evolved under strong selection. Thus, current statistical methods can provide a good approximation to the landscape of naturally evolving populations. PMID:27189564
Sinko, William; de Oliveira, César Augusto F; Pierce, Levi C T; McCammon, J Andrew
2012-01-10
Molecular dynamics (MD) is one of the most common tools in computational chemistry. Recently, our group has employed accelerated molecular dynamics (aMD) to improve the conformational sampling over conventional molecular dynamics techniques. In the original aMD implementation, sampling is greatly improved by raising energy wells below a predefined energy level. Recently, our group presented an alternative aMD implementation where simulations are accelerated by lowering energy barriers of the potential energy surface. When coupled with thermodynamic integration simulations, this implementation showed very promising results. However, when applied to large systems, such as proteins, the simulation tends to be biased to high energy regions of the potential landscape. The reason for this behavior lies in the boost equation used since the highest energy barriers are dramatically more affected than the lower ones. To address this issue, in this work, we present a new boost equation that prevents oversampling of unfavorable high energy conformational states. The new boost potential provides not only better recovery of statistics throughout the simulation but also enhanced sampling of statistically relevant regions in explicit solvent MD simulations.
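For orientation, the original aMD scheme referred to above (raising energy wells that lie below a threshold energy) is usually written as a boost ΔV = (E − V)² / (α + E − V) applied whenever the potential V is below the threshold E. The sketch below implements only that commonly cited original form; the new boost equation proposed in this work is not given in the abstract and is not reproduced here. Function and parameter names are assumptions.

```python
import math


def amd_boost(v, e_threshold, alpha):
    """Boost energy of the original 'well-raising' aMD form.

    v: instantaneous potential energy
    e_threshold: threshold energy E below which the surface is modified
    alpha: tuning parameter (> 0); smaller values flatten the wells more
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if v >= e_threshold:
        return 0.0  # surface is left untouched above the threshold
    gap = e_threshold - v
    return gap * gap / (alpha + gap)


def reweighting_factor(delta_v, kt):
    """Boltzmann factor exp(dV / kT) used to recover unbiased statistics."""
    return math.exp(delta_v / kt)


# Illustrative values in arbitrary energy units.
dv = amd_boost(v=0.0, e_threshold=10.0, alpha=10.0)
print(dv, reweighting_factor(dv, kt=0.6))
```

Because the reweighting factor grows exponentially with the boost, large boosts make the recovery of unbiased statistics noisy, which is part of the motivation for alternative boost forms.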
Erus, Guray; Zacharaki, Evangelia I; Davatzikos, Christos
2014-04-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a "target-specific" feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject's images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an "estimability" criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. Copyright © 2014 Elsevier B.V. All rights reserved.
Erus, Guray; Zacharaki, Evangelia I.; Davatzikos, Christos
2014-01-01
This paper presents a method for capturing statistical variation of normal imaging phenotypes, with emphasis on brain structure. The method aims to estimate the statistical variation of a normative set of images from healthy individuals, and identify abnormalities as deviations from normality. A direct estimation of the statistical variation of the entire volumetric image is challenged by the high-dimensionality of images relative to smaller sample sizes. To overcome this limitation, we iteratively sample a large number of lower dimensional subspaces that capture image characteristics ranging from fine and localized to coarser and more global. Within each subspace, a “target-specific” feature selection strategy is applied to further reduce the dimensionality, by considering only imaging characteristics present in a test subject’s images. Marginal probability density functions of selected features are estimated through PCA models, in conjunction with an “estimability” criterion that limits the dimensionality of estimated probability densities according to available sample size and underlying anatomy variation. A test sample is iteratively projected to the subspaces of these marginals as determined by PCA models, and its trajectory delineates potential abnormalities. The method is applied to segmentation of various brain lesion types, and to simulated data on which superiority of the iterative method over straight PCA is demonstrated. PMID:24607564
45 CFR 160.536 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 45 Public Welfare 1 2010-10-01 2010-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 5 2011-10-01 2011-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...
45 CFR 160.536 - Statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 45 Public Welfare 1 2011-10-01 2011-10-01 false Statistical sampling. 160.536 Section 160.536... REQUIREMENTS GENERAL ADMINISTRATIVE REQUIREMENTS Procedures for Hearings § 160.536 Statistical sampling. (a) In... statistical sampling study as evidence of the number of violations under § 160.406 of this part, or the...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 5 2010-10-01 2010-10-01 false Statistical sampling. 1003.133 Section 1003.133... AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting... statistical sampling study as evidence of the number and amount of claims and/or requests for payment as...
42 CFR 405.1064 - ALJ decisions involving statistical samples.
Code of Federal Regulations, 2011 CFR
2011-10-01
... 42 Public Health 2 2011-10-01 2011-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...
42 CFR 405.1064 - ALJ decisions involving statistical samples.
Code of Federal Regulations, 2010 CFR
2010-10-01
... 42 Public Health 2 2010-10-01 2010-10-01 false ALJ decisions involving statistical samples. 405... Medicare Coverage Policies § 405.1064 ALJ decisions involving statistical samples. When an appeal from the QIC involves an overpayment issue and the QIC used a statistical sample in reaching its...
Introducing Undergraduate Students to Metabolomics Using a NMR-Based Analysis of Coffee Beans
ERIC Educational Resources Information Center
Sandusky, Peter Olaf
2017-01-01
Metabolomics applies multivariate statistical analysis to sets of high-resolution spectra taken over a population of biologically derived samples. The objective is to distinguish subpopulations within the overall sample population, and possibly also to identify biomarkers. While metabolomics has become part of the standard analytical toolbox in…
NASA Astrophysics Data System (ADS)
McNamara, Mark W.
The study tested the hypothesis that self-efficacy, stress, and acculturation are useful predictors of academic achievement in first year university science, independent of high school GPA and SAT scores, in a sample of Latino students at a South Texas Hispanic serving institution of higher education. The correlational study employed a mixed methods explanatory sequential model. The non-probability sample consisted of 98 university science and engineering students. The study participants had high science self-efficacy, low number of stressors, and were slightly Anglo-oriented bicultural to strongly Anglo-oriented. As expected, the control variables of SAT score and high school GPA were statistically significant predictors of the outcome measures. Together, they accounted for 19.80% of the variation in first year GPA, 13.80% of the variation in earned credit hours, and 11.30% of the variation in intent to remain in the science major. After controlling for SAT scores and high school GPAs, self-efficacy was a statistically significant predictor of credit hours earned and accounted for 5.60% of the variation; its unique contribution in explaining the variation in first year GPA and intent to remain in the science major was not statistically significant. Stress and acculturation were not statistically significant predictors of any of the outcome measures. Analysis of the qualitative data resulted in six themes (a) high science self-efficacy, (b) stressors, (c) positive role of stress, (d) Anglo-oriented, (e) bicultural, and (f) family. The quantitative and qualitative results were synthesized and practical implications were discussed.
Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
This report provides comparative information derived from a national sample of 516 public two-year colleges, highlighting financial statistics for fiscal year 1992-93. This report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and statistics presented in a…
7 CFR 52.38a - Definitions of terms applicable to statistical sampling.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...
7 CFR 52.38a - Definitions of terms applicable to statistical sampling.
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Definitions of terms applicable to statistical... Sampling § 52.38a Definitions of terms applicable to statistical sampling. (a) Terms applicable to both on... acceptable as a process average. At the AQL's contained in the statistical sampling plans of this subpart...
Evidence for a Global Sampling Process in Extraction of Summary Statistics of Item Sizes in a Set.
Tokita, Midori; Ueda, Sachiyo; Ishiguchi, Akira
2016-01-01
Several studies have shown that our visual system may construct a "summary statistical representation" over groups of visual objects. Although there is a general understanding that human observers can accurately represent sets of a variety of features, many questions on how summary statistics, such as an average, are computed remain unanswered. This study investigated sampling properties of visual information used by human observers to extract two types of summary statistics of item sets, average and variance. We presented three models of ideal observers to extract the summary statistics: a global sampling model without sampling noise, global sampling model with sampling noise, and limited sampling model. We compared the performance of an ideal observer of each model with that of human observers using statistical efficiency analysis. Results suggest that summary statistics of items in a set may be computed without representing individual items, which makes it possible to discard the limited sampling account. Moreover, the extraction of summary statistics may not necessarily require the representation of individual objects with focused attention when the sets of items are larger than 4.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.
Hero, Alfred O; Rajaratnam, Bala
2016-01-01
When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
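A minimal sketch of why the sample-starved regime (n much smaller than p) is hazardous for correlation mining. It is not the framework developed in the paper; it only simulates p mutually independent Gaussian variables and reports the largest absolute sample correlation among all pairs. The function names, the seed, and the choices n = 5, n = 500 and p = 50 are assumptions made for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Plain Pearson sample correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def max_spurious_correlation(n, p, seed=0):
    """Largest |r| over all pairs of p independent N(0, 1) variables,
    each observed n times. The true correlation of every pair is zero."""
    rng = random.Random(seed)
    variables = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    return max(
        abs(pearson(variables[i], variables[j]))
        for i in range(p)
        for j in range(i + 1, p)
    )


if __name__ == "__main__":
    # Few samples, many variables: some pair looks strongly correlated by chance.
    print("n=5,   p=50:", round(max_spurious_correlation(5, 50), 3))
    # Many samples, same variables: the largest chance correlation shrinks.
    print("n=500, p=50:", round(max_spurious_correlation(500, 50), 3))
```

With only a handful of samples, at least one of the 1225 pairs will typically show a sample correlation close to 1 even though every true correlation is zero, which is the kind of false discovery that a sample-complexity analysis is meant to control.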
Power calculation for overall hypothesis testing with high-dimensional commensurate outcomes.
Chi, Yueh-Yun; Gribbin, Matthew J; Johnson, Jacqueline L; Muller, Keith E
2014-02-28
The complexity of system biology means that any metabolic, genetic, or proteomic pathway typically includes so many components (e.g., molecules) that statistical methods specialized for overall testing of high-dimensional and commensurate outcomes are required. While many overall tests have been proposed, very few have power and sample size methods. We develop accurate power and sample size methods and software to facilitate study planning for high-dimensional pathway analysis. With an account of any complex correlation structure between high-dimensional outcomes, the new methods allow power calculation even when the sample size is less than the number of variables. We derive the exact (finite-sample) and approximate non-null distributions of the 'univariate' approach to repeated measures test statistic, as well as power-equivalent scenarios useful to generalize our numerical evaluations. Extensive simulations of group comparisons support the accuracy of the approximations even when the ratio of number of variables to sample size is large. We derive a minimum set of constants and parameters sufficient and practical for power calculation. Using the new methods and specifying the minimum set to determine power for a study of metabolic consequences of vitamin B6 deficiency helps illustrate the practical value of the new results. Free software implementing the power and sample size methods applies to a wide range of designs, including one group pre-intervention and post-intervention comparisons, multiple parallel group comparisons with one-way or factorial designs, and the adjustment and evaluation of covariate effects. Copyright © 2013 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Kucharik, C.; Roth, J.
2002-12-01
The threat of global climate change has provoked policy-makers to consider plausible strategies to slow the accumulation of greenhouse gases, especially carbon dioxide, in the atmosphere. One such idea involves the sequestration of atmospheric carbon (C) in degraded agricultural soils as part of the Conservation Reserve Program (CRP). While the potential for significant C sequestration in CRP grassland ecosystems has been demonstrated, the paired-site sampling approach traditionally used to quantify soil C changes has not been evaluated with robust statistical analysis. In this study, 14 paired CRP (> 8 years old) and cropland sites in Dane County, Wisconsin (WI) were used to assess whether a paired-site sampling design could detect statistically significant differences (ANOVA) in mean soil organic C and total nitrogen (N) storage. We compared surface (0 to 10 cm) bulk density, and sampled soils (0 to 5, 5 to 10, and 10 to 25 cm) for textural differences and chemical analysis of organic matter (OM), soil organic C (SOC), total N, and pH. The CRP contributed to lowering soil bulk density by 13% (p < 0.0001) and increased SOC and OM storage (kg m-2) by 13 to 17% in the 0 to 5 cm layer (p = 0.1). We tested the statistical power associated with ANOVA for measured soil properties, and calculated minimum detectable differences (MDD). We concluded that 40 to 65 paired sites and soil sampling in 5 cm increments near the surface were needed to achieve an 80% confidence level (α = 0.05; β = 0.20) in soil C and N sequestration rates. Because soil C and total N storage was highly variable among these sites (CVs > 20%), only a 23 to 29% change in existing total organic C and N pools could be reliably detected. While C and N sequestration (247 kg C ha-1 yr-1 and 17 kg N ha-1 yr-1) may be occurring and confined to the surface 5 cm as part of the WI CRP, our sampling design did not statistically support the desired 80% confidence level.
We conclude that the use of statistical power analysis is essential to ensure a high level of confidence in soil C and N sequestration rates that are quantified using paired plots.
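The relationship between the number of paired sites and the minimum detectable difference (MDD) can be sketched as follows. This is a textbook normal-approximation formula for a two-sided paired comparison, not the ANOVA-based calculation used in the study; the function names and the example standard deviation are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def minimum_detectable_difference(sd_diff, n_pairs, alpha=0.05, power=0.80):
    """Smallest mean paired difference detectable with the given power.

    sd_diff is the standard deviation of the within-pair differences.
    Uses the normal approximation: MDD = (z_{1-alpha/2} + z_{power}) * sd / sqrt(n).
    """
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return k * sd_diff / sqrt(n_pairs)


def pairs_needed(sd_diff, target_diff, alpha=0.05, power=0.80):
    """Number of paired sites needed to detect target_diff (same approximation)."""
    z = NormalDist()
    k = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((k * sd_diff / target_diff) ** 2)


if __name__ == "__main__":
    sd = 1.0  # hypothetical SD of paired differences, in arbitrary units
    for n in (14, 40, 65):
        print(n, "pairs -> MDD =", round(minimum_detectable_difference(sd, n), 3))
    print("pairs needed to detect a difference of 0.5:", pairs_needed(sd, 0.5))
```

Because the MDD shrinks only with the square root of the number of pairs, halving the detectable difference requires roughly four times as many paired sites under this approximation.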
High risk human papillomavirus in the periodontium : A case control study.
Shipilova, Anna; Dayakar, Manjunath Mundoor; Gupta, Dinesh
2017-01-01
Human papilloma viruses (HPVs) are small DNA viruses that have been identified in the periodontal pocket as well as the gingival sulcus. High risk HPVs are also associated with a subset of head and neck carcinomas. It is thought that the periodontium could be a reservoir for HPV. 1. Detection of human papilloma virus (HPV) in the periodontal pocket as well as the gingival sulcus of patients having localized chronic periodontitis, and in the gingival sulcus of periodontally healthy subjects. 2. Quantitative estimation of E6 and E7 mRNA in subjects showing presence of HPV. 3. To assess whether the periodontal pocket is a reservoir for HPV. This case-control study included 30 subjects with localized chronic periodontitis (cases) and 30 periodontally healthy subjects (controls). Two samples were taken from cases, one from the periodontal pocket and one from the gingival sulcus, and one sample was taken from controls. Samples were collected in the form of pocket scrapings and gingival sulcus scrapings from cases and controls respectively. These samples were sent in storage media for identification and estimation of E6/E7 mRNA of HPV using in situ hybridization and flow cytometry. Statistical analysis was done using mean, percentage, and the chi-square test. The statistical package SPSS version 13.0 was used to analyze the data. A P value < 0.05 was considered statistically significant. Pocket samples as well as sulcus samples for both cases and controls were found to contain HPV E6/E7 mRNA. Presence of HPV E6/E7 mRNA in the periodontium supports the hypothesis that periodontal tissues serve as a reservoir for latent HPV and there may be a synergy between oral cancer, periodontitis and HPV. However, prospective studies are required to further explore this link.
2013-01-01
Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors and deliver poor statistical power for detecting genuine associations and also a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in populations under study and confers flexibility to test for genetic association in the presence of different confounding factors such as population structure, non-randomness of samples, etc. Results We implemented this novel method together with several popular methods in the literature of GWAS, to re-analyze recently published Parkinson’s disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting the associations but also robustness to the difficulties stemming from non-random sampling and genetic structures when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb but only 6 SNPs in two of these regions were previously detected by the trend test based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidates, FGF20 and PARK8, without invoking false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters by using case and control samples, eases the integration of these samples from multiple genetically divergent populations, and thus confers statistically robust and powerful analyses of GWAS.
On the basis of simulation studies and analysis of real datasets, we demonstrated significant improvement of the new method over the non-parametric trend test, which is the most widely implemented method in the literature of GWAS. PMID:23394771
A Comparison of Methods for Estimating the Determinant of High-Dimensional Covariance Matrix.
Hu, Zongliang; Dong, Kai; Dai, Wenlin; Tong, Tiejun
2017-09-21
The determinant of the covariance matrix for high-dimensional data plays an important role in statistical inference and decision. It has many real applications including statistical tests and information theory. Due to the statistical and computational challenges with high dimensionality, little work has been proposed in the literature for estimating the determinant of high-dimensional covariance matrix. In this paper, we estimate the determinant of the covariance matrix using some recent proposals for estimating high-dimensional covariance matrix. Specifically, we consider a total of eight covariance matrix estimation methods for comparison. Through extensive simulation studies, we explore and summarize some interesting comparison results among all compared methods. We also provide practical guidelines based on the sample size, the dimension, and the correlation of the data set for estimating the determinant of high-dimensional covariance matrix. Finally, from a perspective of the loss function, the comparison study in this paper may also serve as a proxy to assess the performance of the covariance matrix estimation.
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J.; Yanes, Oscar
2012-01-01
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples. PMID:24957762
A Guideline to Univariate Statistical Analysis for LC/MS-Based Untargeted Metabolomics-Derived Data.
Vinaixa, Maria; Samino, Sara; Saez, Isabel; Duran, Jordi; Guinovart, Joan J; Yanes, Oscar
2012-10-18
Several metabolomic software programs provide methods for peak picking, retention time alignment and quantification of metabolite features in LC/MS-based metabolomics. Statistical analysis, however, is needed in order to discover those features significantly altered between samples. By comparing the retention time and MS/MS data of a model compound to that from the altered feature of interest in the research sample, metabolites can then be unequivocally identified. This paper reports on a comprehensive overview of a workflow for statistical analysis to rank relevant metabolite features that will be selected for further MS/MS experiments. We focus on univariate data analysis applied in parallel on all detected features. Characteristics and challenges of this analysis are discussed and illustrated using four different real LC/MS untargeted metabolomic datasets. We demonstrate the influence of considering or violating mathematical assumptions on which univariate statistical tests rely, using high-dimensional LC/MS datasets. Issues in data analysis such as determination of sample size, analytical variation, assumption of normality and homoscedasticity, or correction for multiple testing are discussed and illustrated in the context of our four untargeted LC/MS working examples.
Output statistics of laser anemometers in sparsely seeded flows
NASA Technical Reports Server (NTRS)
Edwards, R. V.; Jensen, A. S.
1982-01-01
It is noted that until very recently, research on this topic concentrated on the particle arrival statistics and the influence of the optical parameters on them. Little attention has been paid to the influence of subsequent processing on the measurement statistics. There is also controversy over whether the effects of the particle statistics can be measured. It is shown here that some of the confusion derives from a lack of understanding of the experimental parameters that are to be controlled or known. A rigorous framework is presented for examining the measurement statistics of such systems. To provide examples, two problems are then addressed. The first has to do with a sample and hold processor, the second with what is called a saturable processor. The sample and hold processor converts the output to a continuous signal by holding the last reading until a new one is obtained. The saturable system is one where the maximum processable rate is arrived at by the dead time of some unit in the system. At high particle rates, the processed rate is determined through the dead time.
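The sample and hold processor described above can be illustrated with a short sketch: the output at any time is the most recent reading, held until the next particle arrives. The function names and the example data are hypothetical, and the time-weighted average is included only to show how the held signal weights each reading by how long it persists; it is not the paper's analysis.

```python
from bisect import bisect_right


def sample_and_hold(arrival_times, readings, t):
    """Held output at time t: the most recent reading at or before t.

    arrival_times must be sorted in increasing order.
    Returns None if no particle has arrived by time t.
    """
    i = bisect_right(arrival_times, t)
    if i == 0:
        return None
    return readings[i - 1]


def time_average(arrival_times, readings, t_end):
    """Time-weighted mean of the held signal from the first arrival to t_end."""
    total = 0.0
    for k, (t0, value) in enumerate(zip(arrival_times, readings)):
        # Each reading is held until the next arrival, or until t_end.
        t1 = arrival_times[k + 1] if k + 1 < len(arrival_times) else t_end
        t1 = min(t1, t_end)
        if t1 > t0:
            total += value * (t1 - t0)
    return total / (t_end - arrival_times[0])


if __name__ == "__main__":
    times = [0.0, 1.0, 4.0]      # particle arrival times (arbitrary units)
    velocity = [2.0, 4.0, 9.0]   # reading produced by each particle
    print(sample_and_hold(times, velocity, 3.5))   # reading held since t = 1.0
    print(time_average(times, velocity, 5.0))      # weights readings by hold time
    print(sum(velocity) / len(velocity))           # unweighted mean, for contrast
```

In sparsely seeded flows the gaps between arrivals are long and irregular, so the time-weighted mean of the held signal and the unweighted mean of the individual readings can differ noticeably, as they do in this small example.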
Understanding the Sampling Distribution and the Central Limit Theorem.
ERIC Educational Resources Information Center
Lewis, Charla P.
The sampling distribution is a common source of misuse and misunderstanding in the study of statistics. The sampling distribution, underlying distribution, and the Central Limit Theorem are all interconnected in defining and explaining the proper use of the sampling distribution of various statistics. The sampling distribution of a statistic is…
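The link between the underlying distribution, the sampling distribution of the mean, and the Central Limit Theorem can be made concrete with a small simulation. This sketch is an illustration only and does not come from the cited document; the Exponential(1) population, the sample size of 25, the number of replicates, and the function name are all assumptions.

```python
import random
from math import sqrt
from statistics import mean, stdev


def sample_means(n, replicates, seed=0):
    """Means of `replicates` independent samples of size n drawn from an
    Exponential(1) population (mean 1, standard deviation 1, strongly skewed)."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(replicates)
    ]


if __name__ == "__main__":
    n = 25
    means = sample_means(n, replicates=2000)
    # The sampling distribution of the mean is centred on the population mean...
    print("mean of sample means:", round(mean(means), 3))
    # ...and its spread (the standard error) is close to sigma / sqrt(n).
    print("SD of sample means:  ", round(stdev(means), 3))
    print("sigma / sqrt(n):     ", round(1.0 / sqrt(n), 3))
```

Even though individual observations are skewed, the simulated sample means cluster around 1 with a spread close to 1/sqrt(25) = 0.2, which is the behaviour the Central Limit Theorem describes.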
ERIC Educational Resources Information Center
Garfield, Joan; Le, Laura; Zieffler, Andrew; Ben-Zvi, Dani
2015-01-01
This paper describes the importance of developing students' reasoning about samples and sampling variability as a foundation for statistical thinking. Research on expert-novice thinking as well as statistical thinking is reviewed and compared. A case is made that statistical thinking is a type of expert thinking, and as such, research…
NASA Astrophysics Data System (ADS)
Kerr, Laura T.; Adams, Aine; O'Dea, Shirley; Domijan, Katarina; Cullen, Ivor; Hennelly, Bryan M.
2014-05-01
Raman microspectroscopy can be applied to the urinary bladder for highly accurate classification and diagnosis of bladder cancer. This technique can be applied in vitro to bladder epithelial cells obtained from urine cytology or in vivo as an "optical biopsy" to provide results in real-time with higher sensitivity and specificity than current clinical methods. However, there exists a high degree of variability across experimental parameters which need to be standardised before this technique can be utilized in an everyday clinical environment. In this study, we investigate different laser wavelengths (473 nm and 532 nm), sample substrates (glass, fused silica and calcium fluoride) and multivariate statistical methods in order to gain insight into how these various experimental parameters impact on the sensitivity and specificity of Raman cytology.
Wu, Robert; Glen, Peter; Ramsay, Tim; Martel, Guillaume
2014-06-28
Observational studies dominate the surgical literature. Statistical adjustment is an important strategy to account for confounders in observational studies. Research has shown that published articles are often poor in statistical quality, which may jeopardize their conclusions. The Statistical Analyses and Methods in the Published Literature (SAMPL) guidelines have been published to help establish standards for statistical reporting. This study will seek to determine whether the quality of statistical adjustment and the reporting of these methods are adequate in surgical observational studies. We hypothesize that incomplete reporting will be found in all surgical observational studies, and that the quality and reporting of these methods will be of lower quality in surgical journals when compared with medical journals. Finally, this work will seek to identify predictors of high-quality reporting. This work will examine the top five general surgical and medical journals, based on a 5-year impact factor (2007-2012). All observational studies investigating an intervention related to an essential component area of general surgery (defined by the American Board of Surgery), with an exposure, outcome, and comparator, will be included in this systematic review. Essential elements related to statistical reporting and quality were extracted from the SAMPL guidelines and include domains such as intent of analysis, primary analysis, multiple comparisons, numbers and descriptive statistics, association and correlation analyses, linear regression, logistic regression, Cox proportional hazard analysis, analysis of variance, survival analysis, propensity analysis, and independent and correlated analyses. Each article will be scored as a proportion based on fulfilling criteria in relevant analyses used in the study. A logistic regression model will be built to identify variables associated with high-quality reporting. 
A comparison will be made between the scores of surgical observational studies published in medical versus surgical journals. Secondary outcomes will pertain to individual domains of analysis. Sensitivity analyses will be conducted. This study will explore the reporting and quality of statistical analyses in surgical observational studies published in the most referenced surgical and medical journals in 2013 and examine whether variables (including the type of journal) can predict high-quality reporting.
An evaluation of grease type ball bearing lubricants operating in various environments
NASA Technical Reports Server (NTRS)
Mcmurtrey, E. L.
1979-01-01
Grease type lubricants in bearings were evaluated in five different adverse environments for a one year period. Four repetitions of each test were made to provide statistical samples. These tests were then used to select four lubricants for five year tests in selected environments with five repetitions of each test for statistical samples. Three five year tests were started in (1) continuous operation; (2) start-stop operation, with both in vacuum at ambient temperatures; and (3) continuous operation at 93.3 C. To date, in the one year tests, the best results in all environments were obtained with a high viscosity index perfluoroalkylpolyether grease.
The Relationship between the Catholic Teacher's Faith and Commitment in the Catholic High School
ERIC Educational Resources Information Center
Cho, Young Kwan
2012-01-01
This study investigates the relationship between Catholic teachers' faith and their school commitment in Catholic high schools. A national sample of 751 teachers from 39 Catholic high schools in 15 archdioceses in the United States participated in a self-administered website survey. Data were analyzed using descriptive statistics and the Pearson…
A Window into South Korean Culture: Stress and Coping in Female High School Students
ERIC Educational Resources Information Center
VanderGast, Tim S.; Foxx, Sejal Parikh; Flowers, Claudia; Rouse, Andrew Thomas; Decker, Karen M.
2015-01-01
In an effort to increase multicultural competence, professional counselors in the United States analyzed archival data from high school students from Seoul, South Korea. A sample of all female (N = 577) high school students responded to survey questions related to stress and coping. Results demonstrated statistical significance in levels of stress…
SERE: single-parameter quality control and sample comparison for RNA-Seq.
Schulze, Stefan K; Kanwar, Rahul; Gölzenleuchter, Meike; Therneau, Terry M; Beutler, Andreas S
2012-10-03
Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson's correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson's r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen's simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter.
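A minimal sketch of a ratio of observed to Poisson-expected variation that behaves as the abstract describes (0 for duplicated data, about 1 for faithful replicates, above 1 for global differences). The exact definition in the SERE paper may differ in detail; the formula below, the function name, and the toy counts are assumptions made for illustration.

```python
from math import sqrt


def sere(counts):
    """Error-ratio estimate for a set of count libraries.

    counts: list of libraries, each a list of read counts per bin
            (exon or gene), with bins in the same order in every library.
    For each bin, the expected count in a library is the bin total scaled
    by that library's share of all reads. The squared deviations from the
    expectation, divided by the expectation (the Poisson variance), are
    averaged over libraries (with n_libs - 1 degrees of freedom) and bins.
    Assumes every library contains at least one read.
    """
    n_libs = len(counts)
    n_bins = len(counts[0])
    lib_totals = [sum(lib) for lib in counts]
    grand_total = sum(lib_totals)
    sum_s2 = 0.0
    used_bins = 0
    for i in range(n_bins):
        bin_total = sum(lib[i] for lib in counts)
        if bin_total == 0:
            continue  # bins with no reads in any library carry no information
        s2 = 0.0
        for j in range(n_libs):
            expected = bin_total * lib_totals[j] / grand_total
            s2 += (counts[j][i] - expected) ** 2 / expected
        sum_s2 += s2 / (n_libs - 1)
        used_bins += 1
    return sqrt(sum_s2 / used_bins)


if __name__ == "__main__":
    print(sere([[10, 20, 30], [10, 20, 30]]))  # duplicated data
    print(sere([[10, 20], [20, 40]]))          # same profile, deeper sequencing
    print(sere([[10, 20], [20, 10]]))          # globally different profiles
```

Because the expected counts are scaled by each library's total read count, a library that is simply sequenced more deeply with the same expression profile still scores 0 in this sketch, which matches the abstract's claim that the interpretation does not depend on the total read count.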
SERE: Single-parameter quality control and sample comparison for RNA-Seq
2012-01-01
Background Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915
Friedman, David B
2012-01-01
All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable the distinction between a relevant biological signal from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.
Interpretation of correlations in clinical research.
Hung, Man; Bounsanga, Jerry; Voss, Maren Wright
2017-11-01
Critically analyzing research is a key skill in evidence-based practice and requires knowledge of research methods, results interpretation, and applications, all of which rely on a foundation based in statistics. Evidence-based practice makes high demands on trained medical professionals to interpret an ever-expanding array of research evidence. As clinical training emphasizes medical care rather than statistics, it is useful to review the basics of statistical methods and what they mean for interpreting clinical studies. We reviewed the basic concepts of correlational associations, violations of normality, unobserved variable bias, sample size, and alpha inflation. The foundations of causal inference were discussed and sound statistical analyses were examined. We discuss four ways in which correlational analysis is misused, including causal inference overreach, over-reliance on significance, alpha inflation, and sample size bias. Recent published studies in the medical field provide evidence of causal assertion overreach drawn from correlational findings. The findings present a primer on the assumptions and nature of correlational methods of analysis and urge clinicians to exercise appropriate caution as they critically analyze the evidence before them and evaluate evidence that supports practice. Critically analyzing new evidence requires statistical knowledge in addition to clinical knowledge. Studies can overstate relationships, expressing causal assertions when only correlational evidence is available. Failure to account for the effect of sample size in the analyses tends to overstate the importance of predictive variables. It is important not to overemphasize the statistical significance without consideration of effect size and whether differences could be considered clinically meaningful.
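The alpha inflation mentioned above can be illustrated with a short calculation. This is a sketch under the simplifying assumption that the tests are independent; the function names are illustrative and do not come from the reviewed article.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold that keeps the familywise rate at or below alpha."""
    return alpha / n_tests

for k in (1, 5, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

With 20 independent tests at a per-test alpha of 0.05, the chance of at least one spurious "significant" result is roughly 64%, which is why uncorrected multiple comparisons overstate findings.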
DOE Office of Scientific and Technical Information (OSTI.GOV)
G. Ostrouchov; W.E.Doll; D.A.Wolf
2003-07-01
Unexploded ordnance (UXO) surveys encompass large areas, and the cost of surveying these areas can be high. Enactment of earlier protocols for sampling UXO sites has shown the shortcomings of these procedures and led to a call for development of scientifically defensible statistical procedures for survey design and analysis. This project is one of three funded by SERDP to address this need.
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data
Chen, Yi-Hau
2017-01-01
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA. PMID:28622336
A knowledge-based T2-statistic to perform pathway analysis for quantitative proteomic data.
Lai, En-Yu; Chen, Yi-Hau; Wu, Kun-Pin
2017-06-01
Approaches to identify significant pathways from high-throughput quantitative data have been developed in recent years. Still, the analysis of proteomic data remains difficult because of limited sample size. This limitation also leads to the practice of using a competitive null as a common approach, which fundamentally treats genes or proteins as independent units. The independence assumption ignores the associations among biomolecules with similar functions or cellular localization, as well as the interactions among them manifested as changes in expression ratios. Consequently, these methods often underestimate the associations among biomolecules and cause false positives in practice. Some studies incorporate the sample covariance matrix into the calculation to address this issue. However, the sample covariance may not be a precise estimate if the sample size is very limited, which is usually the case for data produced by mass spectrometry. In this study, we introduce a multivariate test under a self-contained null to perform pathway analysis for quantitative proteomic data. The covariance matrix used in the test statistic is constructed from the confidence scores retrieved from the STRING database or the HitPredict database. We also design an integrating procedure to retain pathways of sufficient evidence as a pathway group. The performance of the proposed T2-statistic is demonstrated using five published experimental datasets: the T-cell activation, the cAMP/PKA signaling, the myoblast differentiation, and the effect of dasatinib on the BCR-ABL pathway are proteomic datasets produced by mass spectrometry; and the protective effect of myocilin via the MAPK signaling pathway is a gene expression dataset of limited sample size. Compared with other popular statistics, the proposed T2-statistic yields more accurate descriptions in agreement with the discussion of the original publication.
We implemented the T2-statistic into an R package T2GA, which is available at https://github.com/roqe/T2GA.
Methodological quality of behavioural weight loss studies: a systematic review
Lemon, S. C.; Wang, M. L.; Haughton, C. F.; Estabrook, D. P.; Frisard, C. F.; Pagoto, S. L.
2018-01-01
Summary This systematic review assessed the methodological quality of behavioural weight loss intervention studies conducted among adults and associations between quality and statistically significant weight loss outcome, strength of intervention effectiveness and sample size. Searches for trials published between January 2009 and December 2014 were conducted using PUBMED, MEDLINE and PSYCINFO and identified ninety studies. Methodological quality indicators included study design, anthropometric measurement approach, sample size calculations, intent-to-treat (ITT) analysis, loss to follow-up rate, missing data strategy, sampling strategy, report of treatment receipt and report of intervention fidelity (mean = 6.3). Indicators most commonly utilized included randomized design (100%), objectively measured anthropometrics (96.7%), ITT analysis (86.7%) and reporting treatment adherence (76.7%). Most studies (62.2%) had a follow-up rate >75% and reported a loss to follow-up analytic strategy or minimal missing data (69.9%). Describing intervention fidelity (34.4%) and sampling from a known population (41.1%) were least common. Methodological quality was not associated with reporting a statistically significant result, effect size or sample size. This review found the published literature of behavioural weight loss trials to be of high quality for specific indicators, including study design and measurement. Areas identified for improvement include the use of more rigorous statistical approaches to loss to follow-up and better fidelity reporting. PMID:27071775
Challenges of Big Data Analysis.
Fan, Jianqing; Han, Fang; Liu, Han
2014-06-01
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
Challenges of Big Data Analysis
Fan, Jianqing; Han, Fang; Liu, Han
2014-01-01
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions. PMID:25419469
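The spurious-correlation challenge named above can be sketched with a small simulation. This is an illustrative toy setup, not taken from the article: a response and many predictors are drawn independently from a standard normal distribution, so every true correlation is zero, and the running maximum of the absolute sample correlation is tracked as more predictors are examined. The function names and the default seed are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def max_spurious_correlation(n_samples, n_variables, seed=0):
    """Running maximum |r| between a response and independent noise predictors."""
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    running = []
    for _ in range(n_variables):
        predictor = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(predictor, response)))
        running.append(best)
    return running

running = max_spurious_correlation(n_samples=20, n_variables=1000)
for p in (1, 10, 100, 1000):
    print(p, round(running[p - 1], 3))
```

Because the maximum is taken over a growing set of unrelated predictors, it can only increase with dimensionality, which is the mechanism by which high-dimensional data produce apparently strong but meaningless associations.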
Power of tests for comparing trend curves with application to national immunization survey (NIS).
Zhao, Zhen
2011-02-28
To develop statistical tests for comparing trend curves of study outcomes between two socio-demographic strata across consecutive time points, and compare statistical power of the proposed tests under different trend curves data, three statistical tests were proposed. For large sample size with independent normal assumption among strata and across consecutive time points, the Z and Chi-square test statistics were developed, which are functions of outcome estimates and the standard errors at each of the study time points for the two strata. For small sample size with independent normal assumption, the F-test statistic was generated, which is a function of sample size of the two strata and estimated parameters across study period. If two trend curves are approximately parallel, the power of Z-test is consistently higher than that of both Chi-square and F-test. If two trend curves cross at low interaction, the power of Z-test is higher than or equal to the power of both Chi-square and F-test; however, at high interaction, the powers of Chi-square and F-test are higher than that of Z-test. The measurement of interaction of two trend curves was defined. These tests were applied to the comparison of trend curves of vaccination coverage estimates of standard vaccine series with National Immunization Survey (NIS) 2000-2007 data. Copyright © 2011 John Wiley & Sons, Ltd.
A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...
The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, High Resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when either non-detects are treated as 0 value or assigned a value of 1/2 the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs at both non-detect equal 0 and 1/2 detection limit. journal article
NASA Astrophysics Data System (ADS)
Magazù, Salvatore; Mezei, Ferenc; Migliardo, Federica
2018-05-01
In a variety of applications of inelastic neutron scattering spectroscopy the goal is to single out the elastic scattering contribution from the total scattered spectrum as a function of momentum transfer and sample environment parameters. The elastic part of the spectrum is defined in such a case by the energy resolution of the spectrometer. Variable elastic energy resolution offers a way to distinguish between elastic and quasi-elastic intensities. Correlation spectroscopy lends itself as an efficient, high intensity approach for accomplishing this both at continuous and pulsed neutron sources. On the one hand, in beam modulation methods the Liouville theorem coupling between intensity and resolution is relaxed and time-of-flight velocity analysis of the neutron velocity distribution can be performed with 50 % duty factor exposure for all available resolutions. On the other hand, the (quasi)elastic part of the spectrum generally contains the major part of the integrated intensity at a given detector, and thus correlation spectroscopy can be applied with most favorable signal to statistical noise ratio. The novel spectrometer CORELLI at SNS is an example for this type of application of the correlation technique at a pulsed source. On a continuous neutron source a statistical chopper can be used for quasi-random time dependent beam modulation and the total time-of-flight of the neutron from the statistical chopper to detection is determined by the analysis of the correlation between the temporal fluctuation of the neutron detection rate and the statistical chopper beam modulation pattern. The correlation analysis can either be used for the determination of the incoming neutron velocity or for the scattered neutron velocity, depending of the position of the statistical chopper along the neutron trajectory. 
These two options are considered together with an evaluation of spectrometer performance compared to conventional spectroscopy, in particular for variable resolution elastic neutron scattering (RENS) studies of relaxation processes and the evolution of mean square displacements. A particular focus of our analysis is the unique feature of correlation spectroscopy of delivering high and resolution independent beam intensity, thus the same statistical chopper scan contains both high intensity and high resolution information at the same time, and can be evaluated both ways. This flexibility for variable resolution data handling represents an additional asset for correlation spectroscopy in variable resolution work. Changing the beam width for the same statistical chopper allows us to additionally trade resolution for intensity in two different experimental runs, similarly for conventional single slit chopper spectroscopy. The combination of these two approaches is a capability of particular value in neutron spectroscopy studies requiring variable energy resolution, such as the systematic study of quasi-elastic scattering and mean square displacement. Furthermore the statistical chopper approach is particularly advantageous for studying samples with low scattering intensity in the presence of a high, sample independent background.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piepel, Gregory F.; Matzke, Brett D.; Sego, Landon H.
2013-04-27
This report discusses the methodology, formulas, and inputs needed to make characterization and clearance decisions for Bacillus anthracis-contaminated and uncontaminated (or decontaminated) areas using a statistical sampling approach. Specifically, the report includes the methods and formulas for calculating (a) the number of samples required to achieve a specified confidence in characterization and clearance decisions and (b) the confidence in making characterization and clearance decisions for a specified number of samples, for two common statistically based environmental sampling approaches. In particular, the report addresses an issue raised by the Government Accountability Office by providing methods and formulas to calculate the confidence that a decision area is uncontaminated (or successfully decontaminated) if all samples collected according to a statistical sampling approach have negative results. Key to addressing this topic is the probability that an individual sample result is a false negative, which is commonly referred to as the false negative rate (FNR). The two statistical sampling approaches currently discussed in this report are 1) hotspot sampling to detect small isolated contaminated locations during the characterization phase, and 2) combined judgment and random (CJR) sampling during the clearance phase. Typically if contamination is widely distributed in a decision area, it will be detectable via judgment sampling during the characterization phase. Hotspot sampling is appropriate for characterization situations where contamination is not widely distributed and may not be detected by judgment sampling. CJR sampling is appropriate during the clearance phase when it is desired to augment judgment samples with statistical (random) samples. The hotspot and CJR statistical sampling approaches are discussed in the report for four situations: 1. qualitative data (detect and non-detect) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 2. qualitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0; 3. quantitative data (e.g., contaminant concentrations expressed as CFU/cm²) when the FNR = 0 or when using statistical sampling methods that account for FNR > 0; 4. quantitative data when the FNR > 0 but statistical sampling methods are used that assume the FNR = 0. For Situation 2, the hotspot sampling approach provides for stating with Z% confidence that a hotspot of specified shape and size with detectable contamination will be found. Also for Situation 2, the CJR approach provides for stating with X% confidence that at least Y% of the decision area does not contain detectable contamination. Forms of these statements for the other three situations are discussed in Section 2.2. Statistical methods that account for FNR > 0 currently only exist for the hotspot sampling approach with qualitative data (or quantitative data converted to qualitative data). This report documents the current status of methods and formulas for the hotspot and CJR sampling approaches. Limitations of these methods are identified. Extensions of the methods that are applicable when FNR = 0 to account for FNR > 0, or to address other limitations, will be documented in future revisions of this report if future funding supports the development of such extensions. For quantitative data, this report also presents statistical methods and formulas for 1. quantifying the uncertainty in measured sample results; 2. estimating the true surface concentration corresponding to a surface sample; 3. quantifying the uncertainty of the estimate of the true surface concentration. All of the methods and formulas discussed in the report were applied to example situations to illustrate application of the methods and interpretation of the results.
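The "X% confidence that at least Y% of the decision area is uncontaminated" statement can be illustrated with a textbook acceptance-sampling calculation. This sketch is not the report's CJR or hotspot method: it assumes purely random sampling from a large area, independent samples, and no judgment samples, and the function names and parameters are hypothetical. If a fraction 1 - Y of locations were contaminated, each random sample would be negative with probability 1 - (1 - Y)(1 - FNR), so n all-negative samples give confidence 1 minus that probability raised to the power n.

```python
import math

def confidence_all_negative(n_samples, clean_fraction, fnr=0.0):
    """Confidence that at least clean_fraction of the area is uncontaminated,
    given n_samples random samples that all tested negative."""
    p_negative = 1.0 - (1.0 - clean_fraction) * (1.0 - fnr)
    return 1.0 - p_negative ** n_samples

def samples_needed(confidence, clean_fraction, fnr=0.0):
    """Smallest number of all-negative random samples reaching the confidence."""
    p_negative = 1.0 - (1.0 - clean_fraction) * (1.0 - fnr)
    return math.ceil(math.log(1.0 - confidence) / math.log(p_negative))

print(samples_needed(0.95, 0.99))           # FNR assumed to be 0
print(samples_needed(0.95, 0.99, fnr=0.1))  # a nonzero FNR requires more samples
```

The comparison between the two calls shows why assuming FNR = 0 when the true FNR is positive overstates the confidence attached to a set of negative results.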
NASA Astrophysics Data System (ADS)
Dennison, Andrew G.
Classification of the seafloor substrate can be done with a variety of methods. These methods include visual (dives, drop cameras), mechanical (cores, grab samples), and acoustic (statistical analysis of echosounder returns) approaches. Acoustic methods offer a more powerful and efficient means of collecting useful information about the bottom type. Due to the nature of an acoustic survey, larger areas can be sampled, and combining the collected data with visual and mechanical survey methods provides greater confidence in the classification of a mapped region. During a multibeam sonar survey, both bathymetric and backscatter data are collected. It is well documented that the statistical characteristics of a sonar backscatter mosaic depend on bottom type. While classifying the bottom type on the basis of backscatter alone can accurately predict and map bottom type, e.g., distinguishing a muddy area from a rocky area, it lacks the ability to resolve and capture fine textural details, an important factor in many habitat mapping studies. Statistical processing of high-resolution multibeam data can capture the pertinent details about the bottom type that are rich in textural information. Further multivariate statistical processing can then isolate characteristic features and provide the basis for an accurate classification scheme. The development of a new classification method is described here. It is based upon the analysis of textural features in conjunction with ground truth sampling. The processing and classification results for two geologically distinct areas in nearshore regions of Lake Superior, off the Lester River, MN and the Amnicon River, WI, are presented here, using the Minnesota Supercomputer Institute's Mesabi computing cluster for initial processing. Processed data are then calibrated using ground truth samples to conduct an accuracy assessment of the surveyed areas.
From analysis of high-resolution bathymetry data collected at both survey sites, it was possible to successfully calculate a series of measures that describe textural information about the lake floor. Further processing suggests that the features calculated capture a significant amount of statistical information about the lake floor terrain as well. Two sources of error, an anomalous heave and a refraction error, significantly deteriorated the quality of the processed data and the resulting validation results. Ground truth samples used to validate the classification methods utilized for both survey sites, however, resulted in accuracy values ranging from 5-30 percent at the Amnicon River and 60-70 percent for the Lester River. The final results suggest that this new processing methodology does adequately capture textural information about the lake floor and does provide an acceptable classification in the absence of significant data quality issues.
Framework for making better predictions by directly estimating variables' predictivity.
Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa
2016-12-13
We propose approaching prediction from a framework grounded in the theoretical correct prediction rate of a variable set as a parameter of interest. This framework allows us to define a measure of predictivity that enables assessing variable sets for, preferably high, predictivity. We first define the prediction rate for a variable set and consider, and ultimately reject, the naive estimator, a statistic based on the observed sample data, due to its inflated bias for moderate sample size and its sensitivity to noisy useless variables. We demonstrate that the [Formula: see text]-score of the PR method of VS yields a relatively unbiased estimate of a parameter that is not sensitive to noisy variables and is a lower bound to the parameter of interest. Thus, the PR method using the [Formula: see text]-score provides an effective approach to selecting highly predictive variables. We offer simulations and an application of the [Formula: see text]-score on real data to demonstrate the statistic's predictive performance on sample data. We conjecture that using the partition retention and [Formula: see text]-score can aid in finding variable sets with promising prediction rates; however, further research in the avenue of sample-based measures of predictivity is much desired.
An audit of the statistics and the comparison with the parameter in the population
NASA Astrophysics Data System (ADS)
Bujang, Mohamad Adam; Sa'at, Nadiah; Joys, A. Reena; Ali, Mariana Mohamad
2015-10-01
Determining the sample size needed for statistics to closely estimate particular parameters has long been an issue. Although a sample size may have been calculated with reference to the objective of the study, it is difficult to confirm whether the statistics are close to the parameters for a particular population. Meanwhile, the guideline of a p-value less than 0.05 is widely used as inferential evidence. Therefore, this study audited results analyzed from various subsamples and statistical analyses and compared the results with the parameters in three different populations. Eight types of statistical analysis and eight subsamples for each statistical analysis were analyzed. The results showed that the statistics were consistent and close to the parameters when the study sample covered at least 15% to 35% of the population. A larger sample size is needed to estimate parameters that involve categorical variables than parameters that involve numerical variables. Sample sizes of 300 to 500 are sufficient to estimate the parameters for a medium-sized population.
Counihan, T.D.; Miller, Allen I.; Parsley, M.J.
1999-01-01
The development of recruitment monitoring programs for age-0 white sturgeons Acipenser transmontanus is complicated by the statistical properties of catch-per-unit-effort (CPUE) data. We found that age-0 CPUE distributions from bottom trawl surveys violated assumptions of statistical procedures based on normal probability theory. Further, no single data transformation uniformly satisfied these assumptions because CPUE distribution properties varied with the sample mean CPUE (hereafter, mean CPUE). Given these analytic problems, we propose that an additional index of age-0 white sturgeon relative abundance, the proportion of positive tows (Ep), be used to estimate sample sizes before conducting age-0 recruitment surveys and to evaluate statistical hypothesis tests comparing the relative abundance of age-0 white sturgeons among years. Monte Carlo simulations indicated that Ep was consistently more precise than mean CPUE, and because Ep is binomially rather than normally distributed, surveys can be planned and analyzed without violating the assumptions of procedures based on normal probability theory. However, we show that Ep may underestimate changes in relative abundance at high levels and confound our ability to quantify responses to management actions if relative abundance is consistently high. If data suggest that most samples will contain age-0 white sturgeons, estimators of relative abundance other than Ep should be considered. Because Ep may also obscure correlations to climatic and hydrologic variables if high abundance levels are present in time series data, we recommend mean CPUE be used to describe relations to environmental variables. The use of both Ep and mean CPUE will facilitate the evaluation of hypothesis tests comparing relative abundance levels and correlations to variables affecting age-0 recruitment.
Estimated sample sizes for surveys should therefore be based on detecting predetermined differences in Ep, but data necessary to calculate mean CPUE should also be collected.
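As a rough illustration of basing survey sample sizes on a predetermined difference in Ep, the sketch below uses the standard normal-approximation formula for comparing two independent binomial proportions. The abstract does not state which sample-size method the authors used, so the formula choice, the function name `tows_per_year`, and the default alpha and power are all assumptions.

```python
import math
from statistics import NormalDist

def tows_per_year(ep_year1, ep_year2, alpha=0.05, power=0.80):
    """Approximate number of tows per year needed to detect a change in the
    proportion of positive tows (two-sided test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    pooled = (ep_year1 + ep_year2) / 2.0
    null_sd = math.sqrt(2.0 * pooled * (1.0 - pooled))
    alt_sd = math.sqrt(ep_year1 * (1.0 - ep_year1) + ep_year2 * (1.0 - ep_year2))
    numerator = (z_alpha * null_sd + z_beta * alt_sd) ** 2
    return math.ceil(numerator / (ep_year1 - ep_year2) ** 2)

print(tows_per_year(0.2, 0.4))  # large change in Ep
print(tows_per_year(0.2, 0.3))  # smaller change needs more tows
```

The required number of tows rises quickly as the target difference shrinks, so the "predetermined difference" has to be fixed before the survey is planned.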
The Statistics and Mathematics of High Dimension Low Sample Size Asymptotics.
Shen, Dan; Shen, Haipeng; Zhu, Hongtu; Marron, J S
2016-10-01
The aim of this paper is to establish several deep theoretical properties of principal component analysis for multiple-component spike covariance models. Our new results reveal an asymptotic conical structure in critical sample eigendirections under the spike models with distinguishable (or indistinguishable) eigenvalues, when the sample size and/or the number of variables (or dimension) tend to infinity. The consistency of the sample eigenvectors relative to their population counterparts is determined by the ratio between the dimension and the product of the sample size with the spike size. When this ratio converges to a nonzero constant, the sample eigenvector converges to a cone, with a certain angle to its corresponding population eigenvector. In the High Dimension, Low Sample Size case, the angle between the sample eigenvector and its population counterpart converges to a limiting distribution. Several generalizations of the multi-spike covariance models are also explored, and additional theoretical results are presented.
Teachers and Student Achievement in the Chicago Public High Schools. WP 2002-28. Revised
ERIC Educational Resources Information Center
Aaronson, Daniel; Barrow, Lisa; Sander, William
2003-01-01
Using unique administrative data on Chicago public high school students and their teachers, we are able to estimate the importance of teachers on student mathematical achievement. We find that teachers are educationally and statistically important. To be sure, sampling variation and other measurement issues can strongly influence estimates of…
Statistical variability and confidence intervals for planar dose QA pass rates
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, Daniel W.; Nelms, Benjamin E.; Attwood, Kristopher
Purpose: The most common metric for comparing measured to calculated dose, such as for pretreatment quality assurance of intensity-modulated photon fields, is a pass rate (%) generated using percent difference (%Diff), distance-to-agreement (DTA), or some combination of the two (e.g., gamma evaluation). For many dosimeters, the grid of analyzed points corresponds to an array with a low areal density of point detectors. In these cases, the pass rates for any given comparison criteria are not absolute but exhibit statistical variability that is a function, in part, on the detector sampling geometry. In this work, the authors analyze the statistics ofmore » various methods commonly used to calculate pass rates and propose methods for establishing confidence intervals for pass rates obtained with low-density arrays. Methods: Dose planes were acquired for 25 prostate and 79 head and neck intensity-modulated fields via diode array and electronic portal imaging device (EPID), and matching calculated dose planes were created via a commercial treatment planning system. Pass rates for each dose plane pair (both centered to the beam central axis) were calculated with several common comparison methods: %Diff/DTA composite analysis and gamma evaluation, using absolute dose comparison with both local and global normalization. Specialized software was designed to selectively sample the measured EPID response (very high data density) down to discrete points to simulate low-density measurements. The software was used to realign the simulated detector grid at many simulated positions with respect to the beam central axis, thereby altering the low-density sampled grid. Simulations were repeated with 100 positional iterations using a 1 detector/cm{sup 2} uniform grid, a 2 detector/cm{sup 2} uniform grid, and similar random detector grids. 
For each simulation, %/DTA composite pass rates were calculated with various %Diff/DTA criteria and for both local and global %Diff normalization techniques. Results: For the prostate and head/neck cases studied, the pass rates obtained with gamma analysis of high density dose planes were 2%-5% higher than respective %/DTA composite analysis on average (ranging as high as 11%), depending on tolerances and normalization. Meanwhile, the pass rates obtained via local normalization were 2%-12% lower than with global maximum normalization on average (ranging as high as 27%), depending on tolerances and calculation method. Repositioning of simulated low-density sampled grids leads to a distribution of possible pass rates for each measured/calculated dose plane pair. These distributions can be predicted using a binomial distribution in order to establish confidence intervals that depend largely on the sampling density and the observed pass rate (i.e., the degree of difference between measured and calculated dose). These results can be extended to apply to 3D arrays of detectors, as well. Conclusions: Dose plane QA analysis can be greatly affected by choice of calculation metric and user-defined parameters, and so all pass rates should be reported with a complete description of calculation method. Pass rates for low-density arrays are subject to statistical uncertainty (vs. the high-density pass rate), but these sampling errors can be modeled using statistical confidence intervals derived from the sampled pass rate and detector density. Thus, pass rates for low-density array measurements should be accompanied by a confidence interval indicating the uncertainty of each pass rate.
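The abstract above states that low-density pass rates can be modeled with a binomial distribution, but it does not say which interval formula the authors used. As a minimal sketch, assuming each sampled detector point passes or fails independently, a Wilson score interval (our choice for illustration, not necessarily the authors' method) can be attached to an observed pass rate. The function name and the example counts are hypothetical.

```python
import math


def wilson_interval(passed, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    passed: number of detector points meeting the criterion
    total:  number of detector points analyzed
    z:      normal quantile (1.96 gives an approximate 95% interval)
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = passed / total
    denom = 1.0 + z * z / total
    centre = (p_hat + z * z / (2 * total)) / denom
    half = z * math.sqrt(
        p_hat * (1 - p_hat) / total + z * z / (4 * total * total)
    ) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical example: 95 of 100 sampled points pass.
low, high = wilson_interval(95, 100)
print(f"pass rate 95.0%, approx. 95% CI: {100 * low:.1f}% to {100 * high:.1f}%")
```

With the same observed pass rate, fewer sampled points give a wider interval, which is consistent with the abstract's remark that the interval depends largely on sampling density and the observed pass rate.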
Sampling and Data Analysis for Environmental Microbiology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Murray, Christopher J.
2001-06-01
A brief review of the literature indicates the importance of statistical analysis in applied and environmental microbiology. Sampling designs are particularly important for successful studies, and it is highly recommended that researchers review their sampling design before heading to the laboratory or the field. Most statisticians have numerous stories of scientists who approached them after their study was complete only to have to tell them that the data they gathered could not be used to test the hypothesis they wanted to address. Once the data are gathered, a large and complex body of statistical techniques is available for analysis of the data. Those methods include both numerical and graphical techniques for exploratory characterization of the data. Hypothesis testing and analysis of variance (ANOVA) are techniques that can be used to compare the mean and variance of two or more groups of samples. Regression can be used to examine the relationships between sets of variables and is often used to examine the dependence of microbiological populations on microbiological parameters. Multivariate statistics provides several methods that can be used for interpretation of datasets with a large number of variables and to partition samples into similar groups, a task that is very common in taxonomy, but also has applications in other fields of microbiology. Geostatistics and other techniques have been used to examine the spatial distribution of microorganisms. The objectives of this chapter are to provide a brief survey of some of the statistical techniques that can be used for sample design and data analysis of microbiological data in environmental studies, and to provide some examples of their use from the literature.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ackermann, M.; Ajello, M.; Allafort, A.; ...
2011-10-12
We present a detailed statistical analysis of the correlation between radio and gamma-ray emission of the active galactic nuclei (AGNs) detected by Fermi during its first year of operation, with the largest data sets ever used for this purpose. We use both archival interferometric 8.4 GHz data (from the Very Large Array and ATCA, for the full sample of 599 sources) and concurrent single-dish 15 GHz measurements from the Owens Valley Radio Observatory (OVRO, for a sub sample of 199 objects). Our unprecedentedly large sample permits us to assess with high accuracy the statistical significance of the correlation, using a surrogate data method designed to simultaneously account for common-distance bias and the effect of a limited dynamical range in the observed quantities. We find that the statistical significance of a positive correlation between the centimeter radio and the broadband (E > 100 MeV) gamma-ray energy flux is very high for the whole AGN sample, with a probability of <10⁻⁷ for the correlation appearing by chance. Using the OVRO data, we find that concurrent data improve the significance of the correlation from 1.6 × 10⁻⁶ to 9.0 × 10⁻⁸. Our large sample size allows us to study the dependence of correlation strength and significance on specific source types and gamma-ray energy band. As a result, we find that the correlation is very significant (chance probability < 10⁻⁷) for both flat spectrum radio quasars and BL Lac objects separately; a dependence of the correlation strength on the considered gamma-ray energy band is also present, but additional data will be necessary to constrain its significance.
NASA Technical Reports Server (NTRS)
Ackermann, M.; Ajello, M.; Allafort, A.; Angelakis, E.; Axelsson, M.; Baldini, L.; Ballet, J.; Barbiellini, G.; Bastieri, D.; Bellazzini, R.;
2011-01-01
We present a detailed statistical analysis of the correlation between radio and gamma-ray emission of the active galactic nuclei (AGNs) detected by Fermi during its first year of operation, with the largest data sets ever used for this purpose. We use both archival interferometric 8.4 GHz data (from the Very Large Array and ATCA, for the full sample of 599 sources) and concurrent single-dish 15 GHz measurements from the Owens Valley Radio Observatory (OVRO, for a sub sample of 199 objects). Our unprecedentedly large sample permits us to assess with high accuracy the statistical significance of the correlation, using a surrogate data method designed to simultaneously account for common-distance bias and the effect of a limited dynamical range in the observed quantities. We find that the statistical significance of a positive correlation between the centimeter radio and the broadband (E > 100 MeV) gamma-ray energy flux is very high for the whole AGN sample, with a probability of <10⁻⁷ for the correlation appearing by chance. Using the OVRO data, we find that concurrent data improve the significance of the correlation from 1.6 × 10⁻⁶ to 9.0 × 10⁻⁸. Our large sample size allows us to study the dependence of correlation strength and significance on specific source types and gamma-ray energy band. We find that the correlation is very significant (chance probability < 10⁻⁷) for both flat spectrum radio quasars and BL Lac objects separately; a dependence of the correlation strength on the considered gamma-ray energy band is also present, but additional data will be necessary to constrain its significance.
Guo, Jing; Yuan, Yahong; Dou, Pei; Yue, Tianli
2017-10-01
Fifty-one kiwifruit juice samples of seven kiwifruit varieties from five regions in China were analyzed to determine their polyphenol contents and to trace fruit varieties and geographical origins by multivariate statistical analysis. Twenty-one polyphenols belonging to four compound classes were determined by ultra-high-performance liquid chromatography coupled with ultra-high-resolution TOF mass spectrometry. (-)-Epicatechin, (+)-catechin, procyanidin B1 and caffeic acid derivatives were the predominant phenolic compounds in the juices. Principal component analysis (PCA) allowed a clear separation of the juices according to kiwifruit varieties. Stepwise linear discriminant analysis (SLDA) yielded satisfactory categorization of samples, with a 100% success rate according to kiwifruit variety and a 92.2% success rate according to geographical origin. The results showed that polyphenolic profiles of kiwifruit juices contain enough information to trace fruit varieties and geographical origins. Copyright © 2017 Elsevier Ltd. All rights reserved.
Statistical power comparisons at 3T and 7T with a GO / NOGO task.
Torrisi, Salvatore; Chen, Gang; Glen, Daniel; Bandettini, Peter A; Baker, Chris I; Reynolds, Richard; Yen-Ting Liu, Jeffrey; Leshin, Joseph; Balderston, Nicholas; Grillon, Christian; Ernst, Monique
2018-07-15
The field of cognitive neuroscience is weighing evidence about whether to move from standard field strength to ultra-high field (UHF). The present study contributes to the evidence by comparing a cognitive neuroscience paradigm at 3 Tesla (3T) and 7 Tesla (7T). The goal was to test and demonstrate the practical effects of field strength on a standard GO/NOGO task using accessible preprocessing and analysis tools. Two independent matched healthy samples (N = 31 each) were analyzed at 3T and 7T. Results show gains at 7T in statistical strength, the detection of smaller effects and group-level power. With an increased availability of UHF scanners, these gains may be exploited by cognitive neuroscientists and other neuroimaging researchers to develop more efficient or comprehensive experimental designs and, given the same sample size, achieve greater statistical power at 7T. Published by Elsevier Inc.
NASA Technical Reports Server (NTRS)
Melbourne, William G.
1986-01-01
In double differencing a regression system obtained from concurrent Global Positioning System (GPS) observation sequences, one either undersamples the system to avoid introducing colored measurement statistics, or one fully samples the system incurring the resulting non-diagonal covariance matrix for the differenced measurement errors. A suboptimal estimation result will be obtained in the undersampling case and will also be obtained in the fully sampled case unless the color noise statistics are taken into account. The latter approach requires a least squares weighting matrix derived from inversion of a non-diagonal covariance matrix for the differenced measurement errors instead of inversion of the customary diagonal one associated with white noise processes. Presented is the so-called fully redundant double differencing algorithm for generating a weighted double differenced regression system that yields equivalent estimation results, but features for certain cases a diagonal weighting matrix even though the differenced measurement error statistics are highly colored.
Comparison of statistical sampling methods with ScannerBit, the GAMBIT scanning module
NASA Astrophysics Data System (ADS)
Martinez, Gregory D.; McKay, James; Farmer, Ben; Scott, Pat; Roebber, Elinore; Putze, Antje; Conrad, Jan
2017-11-01
We introduce ScannerBit, the statistics and sampling module of the public, open-source global fitting framework GAMBIT. ScannerBit provides a standardised interface to different sampling algorithms, enabling the use and comparison of multiple computational methods for inferring profile likelihoods, Bayesian posteriors, and other statistical quantities. The current version offers random, grid, raster, nested sampling, differential evolution, Markov Chain Monte Carlo (MCMC) and ensemble Monte Carlo samplers. We also announce the release of a new standalone differential evolution sampler, Diver, and describe its design, usage and interface to ScannerBit. We subject Diver and three other samplers (the nested sampler MultiNest, the MCMC GreAT, and the native ScannerBit implementation of the ensemble Monte Carlo algorithm T-Walk) to a battery of statistical tests. For this we use a realistic physical likelihood function, based on the scalar singlet model of dark matter. We examine the performance of each sampler as a function of its adjustable settings, and the dimensionality of the sampling problem. We evaluate performance on four metrics: optimality of the best fit found, completeness in exploring the best-fit region, number of likelihood evaluations, and total runtime. For Bayesian posterior estimation at high resolution, T-Walk provides the most accurate and timely mapping of the full parameter space. For profile likelihood analysis in less than about ten dimensions, we find that Diver and MultiNest score similarly in terms of best fit and speed, outperforming GreAT and T-Walk; in ten or more dimensions, Diver substantially outperforms the other three samplers on all metrics.
NASA Astrophysics Data System (ADS)
ten Veldhuis, Marie-Claire; Schleiss, Marc
2017-04-01
Urban catchments are typically characterised by a flashier hydrological response than natural catchments. Predicting flow changes associated with urbanisation is not straightforward, as they are influenced by interactions between impervious cover, basin size, drainage connectivity and stormwater management infrastructure. In this study, we present an alternative approach to statistical analysis of hydrological response variability and basin flashiness, based on the distribution of inter-amount times. We analyse inter-amount time distributions of high-resolution streamflow time series for 17 (semi-)urbanised basins in North Carolina, USA, ranging from 13 to 238 km² in size. We show that in the inter-amount-time framework, sampling frequency is tuned to the local variability of the flow pattern, resulting in a different representation and weighting of high and low flow periods in the statistical distribution. This leads to important differences in the way the distribution quantiles, mean, coefficient of variation and skewness vary across scales and results in lower mean intermittency and improved scaling. Moreover, we show that inter-amount-time distributions can be used to detect regulation effects on flow patterns, identify critical sampling scales and characterise flashiness of hydrological response. The possibility to use both the classical approach and the inter-amount-time framework to identify minimum observable scales and analyse flow data opens up interesting areas for future research.
Design-based Sample and Probability Law-Assumed Sample: Their Role in Scientific Investigation.
ERIC Educational Resources Information Center
Ojeda, Mario Miguel; Sahai, Hardeo
2002-01-01
Discusses some key statistical concepts in probabilistic and non-probabilistic sampling to provide an overview for understanding the inference process. Suggests a statistical model constituting the basis of statistical inference and provides a brief review of the finite population descriptive inference and a quota sampling inferential theory.…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
This report provides comparative financial information derived from a national sample of 503 public two-year colleges. The report includes space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the national sample; and statistics presented in various formats, including tables,…
Gene coexpression measures in large heterogeneous samples using count statistics.
Wang, Y X Rachel; Waterman, Michael S; Huang, Haiyan
2014-11-18
With the advent of high-throughput technologies making large-scale gene expression data readily available, developing appropriate computational tools to process these data and distill insights into systems biology has been an important part of the "big data" challenge. Gene coexpression is one of the earliest techniques developed that is still widely in use for functional annotation, pathway analysis, and, most importantly, the reconstruction of gene regulatory networks, based on gene expression data. However, most coexpression measures do not specifically account for local features in expression profiles. For example, it is very likely that the patterns of gene association may change or only exist in a subset of the samples, especially when the samples are pooled from a range of experiments. We propose two new gene coexpression statistics based on counting local patterns of gene expression ranks to take into account the potentially diverse nature of gene interactions. In particular, one of our statistics is designed for time-course data with local dependence structures, such as time series coupled over a subregion of the time domain. We provide asymptotic analysis of their distributions and power, and evaluate their performance against a wide range of existing coexpression measures on simulated and real data. Our new statistics are fast to compute, robust against outliers, and show comparable and often better general performance.
Schoolchildren with Learning Difficulties Have Low Iron Status and High Anemia Prevalence
Arcanjo, F. P. N.; Arcanjo, C. P. C.; Santos, P. R.
2016-01-01
Background. In developing countries there is a high prevalence of iron deficiency anemia, which reduces cognitive performance, work performance, and endurance; it also causes learning difficulties and has a negative impact on development in the infant population. Methods. This was a case-control study; data were collected from an appropriate sample of schoolchildren aged 8 years. The sample was divided into two subgroups: those with deficient initial reading skills (DIRS) (case) and those without (control). Blood samples were taken to analyze hemoglobin and serum ferritin levels. These results were then used to compare the two groups with Student's t-test. Association between DIRS and anemia was analyzed using odds ratio (OR). Results. Hemoglobin and serum ferritin levels of schoolchildren with DIRS were significantly lower than those of schoolchildren without DIRS (hemoglobin p = 0.02, serum ferritin p = 0.04). DIRS was statistically associated with a risk of anemia with a weighted OR of 1.62. Conclusions. In this study, schoolchildren with DIRS had lower hemoglobin and serum ferritin levels when compared to those without. PMID:27703806
Acidity of fine sulfate particles at Great Smokey Mountains National Park
DOE Office of Scientific and Technical Information (OSTI.GOV)
Day, D.; Malm, W.C.; Kreidenweis, S.
1995-12-31
The acidity of ambient particles is of interest from the perspectives of human health, visibility, and ecology. This paper reports on the acidity of fine (< 2.5 μm) particles measured during August 1994 at Look Rock observation tower in Great Smokey Mountains National Park. This site is located at latitude 35° 37′ 56″, longitude 83° 56′ 32″, and at an elevation of 808 m above sea level. All samples were collected using the IMPROVE (Interagency Monitoring of Protected Visual Environments) sampler. The sampling periods included: (1) 4-hour samples collected three times daily with starting times of 8:00 AM, 12:00 noon, and 4:00 PM; (2) 12-hour samples collected twice daily with starting times of 8:00 AM and 8:00 PM (all times reported are eastern daylight savings time). The IMPROVE sampler, collecting 4-hour samples, employed a citric acid/glycerol coated annular denuder to remove ammonia gas while the 12-hour sampler did not use a citric acid denuder. The intensive monitoring effort, conducted during August 1994, showed that: (1) the fine aerosol mass is generally dominated by sulfate and its associated water; (2) there was no statistically significant difference in average sulfate concentration between the 12-hour samples nor was there a statistically significant difference in average sulfate concentration between the 4-hour samples; (3) the aerosol is highly acidic, ranging from almost pure sulfuric acid to pure ammonium bisulfate, with an average molar ammonium ion to sulfate ratio of about 0.75, which suggests the ambient sulfate aerosol was a mixture of ammonium bisulfate and sulfuric acid; and (4) there was no statistically significant diurnal variation in particle acidity nor was there a statistically significant difference in particle acidity between the 4-hour samples.
2013-01-01
Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
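The procedure described above (an RV computed as a ratio of ANOVA F-statistics, with a 95% bootstrap confidence interval) can be sketched as follows. This is a minimal illustration under stated assumptions: patients are resampled with replacement, both F-statistics are recomputed on each resample, and a percentile interval is taken. The abstract does not specify the interval type, so the percentile method, the function names, and the synthetic data are all hypothetical.

```python
import random


def f_statistic(values, groups):
    """One-way ANOVA F-statistic for values labelled by group."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    n, k = len(values), len(by_group)
    grand = sum(values) / n
    ss_between = sum(
        len(x) * (sum(x) / len(x) - grand) ** 2 for x in by_group.values()
    )
    ss_within = sum(
        (v - sum(x) / len(x)) ** 2 for x in by_group.values() for v in x
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_rv_ci(comparator, reference, groups,
                    replicates=500, alpha=0.05, seed=0):
    """Percentile bootstrap CI for RV = F(comparator) / F(reference)."""
    rng = random.Random(seed)
    n = len(groups)
    n_groups = len(set(groups))
    ratios = []
    while len(ratios) < replicates:
        idx = [rng.randrange(n) for _ in range(n)]
        g = [groups[i] for i in idx]
        if len(set(g)) < n_groups:
            continue  # skip resamples that lost an entire group
        f_cmp = f_statistic([comparator[i] for i in idx], g)
        f_ref = f_statistic([reference[i] for i in idx], g)
        ratios.append(f_cmp / f_ref)
    ratios.sort()
    lower = ratios[int((alpha / 2) * replicates)]
    upper = ratios[int((1 - alpha / 2) * replicates) - 1]
    return lower, upper


# Synthetic data (hypothetical): three clinical groups of 50 patients each.
rng = random.Random(1)
groups = [g for g in range(3) for _ in range(50)]
reference = [1.0 * g + rng.gauss(0, 1) for g in groups]
comparator = [0.5 * g + rng.gauss(0, 1) for g in groups]

rv = f_statistic(comparator, groups) / f_statistic(reference, groups)
lower, upper = bootstrap_rv_ci(comparator, reference, groups)
print(f"RV = {rv:.2f}, 95% bootstrap CI: {lower:.2f} to {upper:.2f}")
```

The default of 500 replicates follows the abstract's observation that additional replicates beyond 500 had minimal impact; an interval that excludes 1 would suggest the comparator discriminates differently from the reference measure.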
Compressive sampling of polynomial chaos expansions: Convergence analysis and sampling strategies
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hampton, Jerrad; Doostan, Alireza, E-mail: alireza.doostan@colorado.edu
2015-01-01
Sampling orthogonal polynomial bases via Monte Carlo is of interest for uncertainty quantification of models with random inputs, using Polynomial Chaos (PC) expansions. It is known that bounding a probabilistic parameter, referred to as coherence, yields a bound on the number of samples necessary to identify coefficients in a sparse PC expansion via solution to an ℓ₁-minimization problem. Utilizing results for orthogonal polynomials, we bound the coherence parameter for polynomials of Hermite and Legendre type under their respective natural sampling distribution. In both polynomial bases we identify an importance sampling distribution which yields a bound with weaker dependence on the order of the approximation. For more general orthonormal bases, we propose the coherence-optimal sampling: a Markov Chain Monte Carlo sampling, which directly uses the basis functions under consideration to achieve a statistical optimality among all sampling schemes with identical support. We demonstrate these different sampling strategies numerically in both high-order and high-dimensional, manufactured PC expansions. In addition, the quality of each sampling method is compared in the identification of solutions to two differential equations, one with a high-dimensional random input and the other with a high-order PC expansion. In both cases, the coherence-optimal sampling scheme leads to similar or considerably improved accuracy.
Nazir, Nausheen; Jan, Muhammad Rasul; Ali, Amjad; Asif, Muhammad; Idrees, Muhammad; Nisar, Mohammad; Zahoor, Muhammad; Abd El-Salam, Naser M
2017-08-22
Hepatitis C virus (HCV) is a leading cause of chronic liver disease and frequently progresses to liver cirrhosis and hepatocellular carcinoma (HCC). This study aimed to determine the prevalence of HCV genotypes and their association with possible transmission risks in the general population of Malakand Division. A total of 570 serum samples were collected from March 2011 to January 2012 from suspected patients who visited different hospitals in Malakand. The suspected sera were tested using qualitative PCR and were then subjected to a molecular genotype-specific assay. Quantitative PCR was also performed to determine the pre-treatment viral load in confirmed positive patients. Of the 570 serum samples, 316 were positive and 254 were negative by qualitative PCR. The positive samples were then subjected to the genotyping assay: of the 316, type-specific PCR fragments were seen in 271 sera, while 45 samples had untypable genotypes. Genotype 3a was the predominant genotype (63.3%), with a standard error of ±2.7%. Cramér's V statistic and the likelihood-ratio statistic were used, respectively, to measure the strength of and to test the association between the dependent variable, genotype, and the explanatory variables (e.g. gender, risk, age and area/district). Genotype showed a statistically significant association with the risk factors, which implies that the genotype depends strongly on how the patient was infected. In contrast, no statistically significant association was observed for the other covariates (gender, age, and district/area). The association between gender and age indicates that female patients were older on average by 10.5 ± 2.3 years (95% confidence level, t-statistic). It was concluded from the present study that the predominant genotype in the infected population of Malakand was 3a. This study also highlights the high prevalence of untypable genotypes, which is an important issue for the health care setup in Malakand and complicates the therapy of infected patients. The major mode of HCV transmission is the multiple use and reuse of needles/injections. ISRCTN ISRCTN73824458. Registered: 28 September 2014.
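The two association measures named in the abstract above can be computed from a contingency table as in the sketch below. The counts are hypothetical (the abstract reports no cross-tabulation), the function names are our own, and the code assumes every row and column total is nonzero.

```python
import math


def _expected(table):
    """Expected counts under independence, plus the grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals], n


def cramers_v(table):
    """Cramér's V: strength of association, from 0 (none) to 1 (perfect)."""
    expected, n = _expected(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))


def likelihood_ratio_g(table):
    """Likelihood-ratio (G) statistic; empty cells contribute zero."""
    expected, _ = _expected(table)
    return 2.0 * sum(
        o * math.log(o / e)
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
        if o > 0
    )


# Hypothetical 2x2 table: rows = genotype (3a, other),
# columns = suspected transmission route (needle reuse, other).
table = [[30, 10], [12, 28]]
print(f"Cramér's V = {cramers_v(table):.3f}")
print(f"G statistic = {likelihood_ratio_g(table):.3f}")
```

To test significance, the G statistic would be compared against a chi-squared distribution with (rows - 1) × (columns - 1) degrees of freedom; that step is omitted here to keep the sketch free of third-party dependencies.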
Smooth quantile normalization.
Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada
2018-04-01
Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
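For orientation, the classic quantile normalization that qsmooth generalizes can be sketched in a few lines. This is not the qsmooth algorithm itself (see the linked repository for the authors' implementation); it is a minimal illustration of the baseline method, with ties broken by position and a function name chosen for this example.

```python
def quantile_normalize(samples):
    """Classic quantile normalization.

    samples: list of equal-length lists, one list of feature values per sample.
    Returns new lists in which every sample shares the same distribution:
    the mean of the sorted values across samples at each rank.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of features")
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: average value at each rank across samples.
    reference = [
        sum(s[rank] for s in sorted_samples) / len(samples) for rank in range(n)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized


# Two hypothetical samples with three features each.
print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```

Because every sample is forced onto one reference distribution, this baseline removes global differences between biological groups as well as technical ones, which is the assumption the abstract says qsmooth relaxes by requiring the same distribution only within groups.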
ERIC Educational Resources Information Center
Seo, Dong Gi; Hao, Shiqi
2016-01-01
Differential item/test functioning (DIF/DTF) are routine procedures to detect item/test unfairness as an explanation for group performance difference. However, unequal sample sizes and small sample sizes have an impact on the statistical power of the DIF/DTF detection procedures. Furthermore, DIF/DTF cannot be used for two test forms without…
Ryberg, Karen R.; Hiemenz, Gregory
2009-01-01
The Bureau of Reclamation collected water-quality samples at 16 sites on the James River and the Arrowwood National Wildlife Refuge, N. Dak., as part of its refuge-monitoring program from 1987-93 and as part of an environmental impact statement commitment from 1999-2004. Climatic and hydrologic conditions varied greatly during both sampling periods. The first period was dominated by drought conditions, which abruptly changed to cooler and wetter conditions in 1992-93. During the second period, conditions were near normal to very wet and included higher inflow from the James River into the refuge. The two periods also differed in the sites sampled, seasons sampled, and properties and constituent concentrations measured. Summary statistics were reported separately for the two sampling periods for all physical properties and constituents. Nonparametric statistical tests were used to further analyze some of the water-quality data. During the first sampling period, 1987-93, specific conductance, turbidity, hardness, alkalinity, total dissolved solids, total suspended solids, nonvolatile suspended solids, calcium, magnesium, sodium, potassium, sulfate, chloride, phosphate, total phosphorus, total organic carbon, chlorophyll a, and arsenic were determined to have significantly different medians among the sites tested. During the second sampling period, 1999-2004, the medians of pH, sodium, chloride, barium, and boron varied significantly among sites. Sites sampled and period of record varied between the two sampling periods and the period of record varied among the sites. Also, some constituents analyzed during the first period (1987-93) were not analyzed during the second period (1999-2004), and winter sampling was done during the second sampling period only. This variability reduces the number of direct comparisons that can be made between the two periods. Three sites had complete periods of record for both sampling periods and were compared. 
Differences in variability and median concentration were identified between the two time periods. Sites representing inflow to the refuge and outflow were compared statistically for the period when data were available for both sites, 1999-2004. Of the nutrients tested - ammonia plus organic nitrogen, phosphate, and total phosphorus - no significant statistical differences were found between the inflow samples and the outflow samples. Statistically significant differences were found for pH, sulfate, chloride, barium, and manganese. Nutrients are of particular interest in the refuge because of the aquatic plant and animal life and the use of the wetland resources by waterfowl. However, the nutrient data were highly censored and there were differences in the seasonal timing of sample collection between the two sampling periods. Therefore, the nutrient data were examined graphically with stripplots that highlighted differences in the seasonal timing of sample collection and concentration differences likely related to the differences in climatic and hydrologic conditions between the two periods.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hero, Alfred O.; Rajaratnam, Bala
When can reliable inference be drawn in the “Big Data” context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data.” Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
Hero, Alfred O.; Rajaratnam, Bala
2015-01-01
When can reliable inference be drawn in the “Big Data” context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data”. Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks. PMID:27087700
Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining
Hero, Alfred O.; Rajaratnam, Bala
2015-12-09
When can reliable inference be drawn in the “Big Data” context? This article presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large-scale inference. In large-scale data applications like genomics, connectomics, and eco-informatics, the data set is often variable rich but sample starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for “Big Data.” Sample complexity, however, has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; and 3) the purely high-dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exa-scale data dimension. We illustrate this high-dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high-dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...
Code of Federal Regulations, 2011 CFR
2011-01-01
... 7 Agriculture 2 2011-01-01 2011-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for on-line inspection by attributes of processed fruits and vegetables. 52.38b Section 52.38b Agriculture Regulations of... Regulations Governing Inspection and Certification Sampling § 52.38b Statistical sampling procedures for on...
Code of Federal Regulations, 2010 CFR
2010-01-01
... 7 Agriculture 2 2010-01-01 2010-01-01 false Statistical sampling procedures for lot inspection of processed fruits and vegetables by attributes. 52.38c Section 52.38c Agriculture Regulations of the... Regulations Governing Inspection and Certification Sampling § 52.38c Statistical sampling procedures for lot...
75 FR 38871 - Proposed Collection; Comment Request for Revenue Procedure 2004-29
Federal Register 2010, 2011, 2012, 2013, 2014
2010-07-06
... comments concerning Revenue Procedure 2004-29, Statistical Sampling in Sec. 274 Context. DATES: Written... Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling in Sec...: Revenue Procedure 2004-29 prescribes the statistical sampling methodology by which taxpayers under...
ERIC Educational Resources Information Center
Tyler, John; White, Susan
2014-01-01
During the 2012-13 academic year, the Statistical Research Center (SRC) collected data from a representative national sample of over 3,500 public and private high schools across the U.S. to inquire about physics availabilities and offerings. This report describes their findings. SRC takes two different approaches to describe the characteristics of…
Preparation of Protein Samples for NMR Structure, Function, and Small Molecule Screening Studies
Acton, Thomas B.; Xiao, Rong; Anderson, Stephen; Aramini, James; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Kornhaber, Gregory; Lau, Jessica; Lee, Dong Yup; Liu, Gaohua; Maglaqui, Melissa; Ma, Lichung; Mao, Lei; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Shastry, Ritu; Swapna, G.V.T.; Tang, Yeufeng; Tong, Saichiu; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.
2014-01-01
In this chapter, we concentrate on the production of high quality protein samples for NMR studies. In particular, we provide an in-depth description of recent advances in the production of NMR samples and their synergistic use with recent advancements in NMR hardware. We describe the protein production platform of the Northeast Structural Genomics Consortium, and outline our high-throughput strategies for producing high quality protein samples for nuclear magnetic resonance (NMR) studies. Our strategy is based on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems and isotope enrichment in minimal media. We describe 96-well ligation-independent cloning and analytical expression systems, parallel preparative scale fermentation, and high-throughput purification protocols. The 6X-His affinity tag allows for a similar two-step purification procedure implemented in a parallel high-throughput fashion that routinely results in purity levels sufficient for NMR studies (> 97% homogeneity). Using this platform, the protein open reading frames of over 17,500 different targeted proteins (or domains) have been cloned as over 28,000 constructs. Nearly 5,000 of these proteins have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html), resulting in more than 950 new protein structures, including more than 400 NMR structures, deposited in the Protein Data Bank. The Northeast Structural Genomics Consortium pipeline has been effective in producing protein samples of both prokaryotic and eukaryotic origin. Although this paper describes our entire pipeline for producing isotope-enriched protein samples, it focuses on the major updates introduced during the last 5 years (Phase 2 of the National Institute of General Medical Sciences Protein Structure Initiative). 
Our advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are suitable for implementation in a large individual laboratory or by a small group of collaborating investigators for structural biology, functional proteomics, ligand screening and structural genomics research. PMID:21371586
Time Series Analysis Based on Running Mann Whitney Z Statistics
USDA-ARS's Scientific Manuscript database
A sensitive and objective time series analysis method based on the calculation of Mann Whitney U statistics is described. This method samples data rankings over moving time windows, converts those samples to Mann-Whitney U statistics, and then normalizes the U statistics to Z statistics using Monte-...
Hansen, John P
2003-01-01
Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
Detecting cell death with optical coherence tomography and envelope statistics
NASA Astrophysics Data System (ADS)
Farhat, Golnaz; Yang, Victor X. D.; Czarnota, Gregory J.; Kolios, Michael C.
2011-02-01
Currently no standard clinical or preclinical noninvasive method exists to monitor cell death based on morphological changes at the cellular level. In our past work we have demonstrated that quantitative high frequency ultrasound imaging can detect cell death in vitro and in vivo. In this study we apply quantitative methods previously used with high frequency ultrasound to optical coherence tomography (OCT) to detect cell death. The ultimate goal of this work is to use these methods for optically-based clinical and preclinical cancer treatment monitoring. Optical coherence tomography data were acquired from acute myeloid leukemia cells undergoing three modes of cell death. Significant increases in integrated backscatter were observed for cells undergoing apoptosis and mitotic arrest, while necrotic cells induced a decrease. These changes appear to be linked to structural changes observed in histology obtained from the cell samples. Signal envelope statistics were analyzed from fittings of the generalized gamma distribution to histograms of envelope intensities. The parameters from this distribution demonstrated sensitivities to morphological changes in the cell samples. These results indicate that OCT integrated backscatter and first order envelope statistics can be used to detect and potentially differentiate between modes of cell death in vitro.
Liem, Franziskus; Mérillat, Susan; Bezzola, Ladina; Hirsiger, Sarah; Philipp, Michel; Madhyastha, Tara; Jäncke, Lutz
2015-03-01
FreeSurfer is a tool to quantify cortical and subcortical brain anatomy automatically and noninvasively. Previous studies have reported reliability and statistical power analyses in relatively small samples or only selected one aspect of brain anatomy. Here, we investigated reliability and statistical power of cortical thickness, surface area, volume, and the volume of subcortical structures in a large sample (N=189) of healthy elderly subjects (64+ years). Reliability (intraclass correlation coefficient) of cortical and subcortical parameters is generally high (cortical: ICCs>0.87, subcortical: ICCs>0.95). Surface-based smoothing increases reliability of cortical thickness maps, while it decreases reliability of cortical surface area and volume. Nevertheless, statistical power of all measures benefits from smoothing. When aiming to detect a 10% difference between groups, the number of subjects required to test effects with sufficient power over the entire cortex varies between cortical measures (cortical thickness: N=39, surface area: N=21, volume: N=81; 10mm smoothing, power=0.8, α=0.05). For subcortical regions this number is between 16 and 76 subjects, depending on the region. We also demonstrate the advantage of within-subject designs over between-subject designs. Furthermore, we publicly provide a tool that allows researchers to perform a priori power analysis and sensitivity analysis to help evaluate previously published studies and to design future studies with sufficient statistical power. Copyright © 2014 Elsevier Inc. All rights reserved.
Felmlee, J.K.; Cadigan, R.A.
1982-01-01
Multivariate statistical analyses were performed on data from 156 mineral-spring sites in nine Western States to analyze relationships among the various parameters measured in the spring waters. Correlation analysis and R-mode factor analysis indicate that three major factors affect water composition in the spring systems studied: (1) duration of water circulation, (2) depth of water circulation, and (3) partial pressure of carbon dioxide. An examination of factor scores indicates that several types of hydrogeologic systems were sampled. Most of the samples are (1) older water from deeper circulating systems having relatively high salinity, high temperature, and low Eh or (2) younger water from shallower circulating systems having relatively low salinity, low temperature, and high Eh. The rest of the samples are from more complex systems. Any of the systems can have a relatively high or low content of dissolved carbonate species, resulting in a low or high pH, respectively. Uranium concentrations are commonly higher in waters of relatively low temperature and high Eh, and radium concentrations are commonly higher in waters having a relatively high carbonate content (low pH) and, secondarily, relatively high salinity. Water samples were collected and (or) measurements were taken at 156 of the 171 mineral-spring sites visited. Various samples were analyzed for radium, uranium, radon, helium, and radium-228 as well as major ions and numerous trace elements. On-site measurements for physical properties including temperature, specific conductance, pH, Eh, and dissolved oxygen were made. All constituents and properties show a wide range of values. Radium concentrations range from less than 0.01 to 300 picocuries per liter; they average 1.48 picocuries per liter and have an anomaly threshold value of 171 picocuries per liter for the samples studied. 
Uranium concentrations range from less than 0.01 to 120 micrograms per liter and average 0.26 micrograms per liter; they have an anomaly threshold value of 48.1 micrograms per liter. Radon content ranges from less than 10 to 110,000 picocuries per liter, averages 549 picocuries per liter and has an anomaly threshold of 20,400 picocuries per liter. Helium content ranges from -1,300 to +13,000 parts per billion relative to atmospheric helium; it averages +725 parts per billion and has an anomaly threshold of 10,000 parts per billion. Radium-228 concentrations range from less than 2.0 to 33 picocuries per liter; no anomaly threshold was determined owing to the small number of samples. All of the anomaly thresholds may be somewhat high because the sampling was biased toward springs likely to be radioactive. The statistical variance in radium and uranium concentrations unaccounted for by the identified factors testifies to the complexity of some hydrogeologic systems. Unidentified factors related to geologic setting and the presence of uranium-rich rocks in the systems also affect the observed concentrations of the radioactive elements in the water. The association of anomalous radioactivity in several springs with nearby known uranium occurrences indicates that other springs having anomalous radioactivity may also be associated with uranium occurrences as yet undiscovered.
Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai
2014-11-10
Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least square estimate and logarithmic transformation with Mantel-Haenszel estimate are recommended as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.
Needs of the Learning Effect on Instructional Website for Vocational High School Students
ERIC Educational Resources Information Center
Lo, Hung-Jen; Fu, Gwo-Liang; Chuang, Kuei-Chih
2013-01-01
The purpose of this study was to understand the correlation between the needs of the learning effect on instructional websites for vocational high school students. Our research applied the statistical methods of product-moment correlation, stepwise regression, and structural equation modeling to analyze the questionnaire with a sample size of 377…
75 FR 53738 - Proposed Collection; Comment Request for Rev. Proc. 2007-35
Federal Register 2010, 2011, 2012, 2013, 2014
2010-09-01
... Revenue Procedure Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES... through the Internet, at [email protected] . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling...: This revenue procedure provides for determining when statistical sampling may be used in purposes of...
Robustness of S1 statistic with Hodges-Lehmann for skewed distributions
NASA Astrophysics Data System (ADS)
Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping
2016-10-01
Analysis of variance (ANOVA) is a commonly used parametric method to test the differences in means for more than two groups when the populations are normally distributed. ANOVA is highly inefficient under the influence of non-normal and heteroscedastic settings. When the assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator and the default scale estimator with the variance of Hodges-Lehmann and MADn to produce two different test statistics for comparing groups. A bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The proposed procedures show improvement compared to the original statistic, especially under extremely skewed distributions.
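As a rough illustration of the location estimator named in this abstract, the one-sample Hodges-Lehmann estimate is commonly defined as the median of all pairwise (Walsh) averages. The sketch below assumes that one-sample form; the abstract does not say which variant the authors used, and the function name `hodges_lehmann` is hypothetical.

```python
import numpy as np

def hodges_lehmann(x):
    """One-sample Hodges-Lehmann estimate: median of Walsh averages.

    Assumption: pairs (i, j) with i <= j are used, so each observation
    is also averaged with itself. Memory grows as O(n^2).
    """
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size)        # index pairs with i <= j
    walsh = (x[i] + x[j]) / 2.0           # all pairwise averages
    return float(np.median(walsh))

# A single outlier moves the mean far more than this estimate.
data = [1, 2, 3, 100]
print(np.mean(data), hodges_lehmann(data))
```

The estimate sits between the median and the mean in robustness terms, which is one plausible reason it is considered for skewed data.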
Sample size and power considerations in network meta-analysis
2012-01-01
Background Network meta-analysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network meta-analysis inherits all methodological challenges of standard pairwise meta-analysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise meta-analysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network meta-analysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network. Findings In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison meta-analysis and network meta-analysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise meta-analysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network meta-analysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network meta-analysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network meta-analysis on interventions for smoking cessation including over 100 trials. Conclusion The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates. PMID:22992327
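To make the idea of an effective sample size concrete, here is a minimal sketch for a single indirect comparison of A versus B through a common comparator C. It rests on two assumptions that are not stated in the abstract: the variance of each direct estimate is inversely proportional to its number of patients with the same constant, and the variance of the indirect estimate is the sum of the two direct variances. Under those assumptions 1/n_eff = 1/n_AC + 1/n_BC. The function name is hypothetical, and this is not necessarily the authors' exact method.

```python
def effective_sample_size(n_ac, n_bc):
    """Effective number of patients behind an indirect A-vs-B comparison.

    Assumes var(AC) ~ c / n_ac, var(BC) ~ c / n_bc, and
    var(indirect AB) = var(AC) + var(BC), hence
    1 / n_eff = 1 / n_ac + 1 / n_bc.
    """
    if n_ac <= 0 or n_bc <= 0:
        raise ValueError("patient counts must be positive")
    return (n_ac * n_bc) / (n_ac + n_bc)

# Two equally sized direct comparisons carry half the information
# of one direct comparison of the same size.
print(effective_sample_size(1000, 1000))
```

The result is always smaller than the smaller of the two inputs, which reflects why indirect evidence tends to be weaker than direct evidence of the same size.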
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cucchiara, A.; Prochaska, J. X.; Zhu, G.
2013-08-20
In 2006, Prochter et al. reported a statistically significant enhancement of very strong Mg II absorption systems intervening the sight lines to gamma-ray bursts (GRBs) relative to the incidence of such absorption along quasar sight lines. This counterintuitive result has inspired a diverse set of astrophysical explanations (e.g., dust, gravitational lensing) but none of these has obviously resolved the puzzle. Using the largest set of GRB afterglow spectra available, we reexamine the purported enhancement. In an independent sample of GRB spectra with a survey path three times larger than Prochter et al., we measure the incidence per unit redshift of ≥1 Å rest-frame equivalent width Mg II absorbers at z ≈ 1 to be l(z) = 0.18 ± 0.06. This is fully consistent with current estimates for the incidence of such absorbers along quasar sight lines. Therefore, we do not confirm the original enhancement and suggest those results suffered from a statistical fluke. Signatures of the original result do remain in our full sample (l(z) shows an ≈1.5 enhancement over l(z)_QSO), but the statistical significance now lies at ≈90% c.l. Restricting our analysis to the subset of high-resolution spectra of GRB afterglows (which overlaps substantially with Prochter et al.), we still reproduce a statistically significant enhancement of Mg II absorption. The reason for this excess, if real, is still unclear since there is no connection between the rapid afterglow follow-up process with echelle (or echellette) spectrographs and the detectability of strong Mg II doublets. Only a larger sample of such high-resolution data will shed some light on this matter.
Perugini, Monia; Visciano, Pierina; Manera, Maurizio; Abete, Maria Cesarina; Gavinelli, Stefania; Amorena, Michele
2013-11-01
The aim of this study was to evaluate mercury and selenium distribution in different portions (exoskeleton, white meat and brown meat) of Norway lobster (Nephrops norvegicus). Some samples were also analysed as whole specimens. The same portions were also examined after boiling, in order to observe if this cooking practice could affect mercury and selenium concentrations. The highest mercury concentrations were detected in white meat, exceeding in all cases the maximum levels established by European legislation. The brown meat reported the highest selenium concentrations. In all boiled samples, mercury levels showed a statistically significant increase compared to raw portions. On the contrary, selenium concentrations detected in boiled samples of white meat, brown meat and whole specimen showed a statistically significant decrease compared to the corresponding raw samples. These results indicate that boiling modifies mercury and selenium concentrations. The high mercury levels detected represent a possible risk for consumers, and the publication and diffusion of specific advisories concerning seafood consumption is recommended.
Efficient bootstrap estimates for tail statistics
NASA Astrophysics Data System (ADS)
Breivik, Øyvind; Aarnes, Ole Johan
2017-03-01
Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
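The subset idea can be sketched for the simplest tail statistic, the sample maximum. In a full bootstrap of n draws, each draw lands among the m highest entries with probability m/n, so the number of such draws is Binomial(n, m/n), and only those draws can determine the resampled maximum. The code below is an illustration under that reasoning, not the authors' implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def tail_bootstrap_max(sample, m, n_boot, rng):
    """Bootstrap the sample maximum using only the m largest entries.

    Returns n_boot resampled maxima. An entry is NaN in the very rare
    case that no draw lands in the top-m subset (probability about
    exp(-m)), where the subset cannot determine the maximum.
    """
    x = np.asarray(sample, dtype=float)
    n = x.size
    top = np.sort(x)[-m:]                       # the m highest entries
    # How many of the n draws of a full bootstrap hit the top-m subset
    k = rng.binomial(n, m / n, size=n_boot)
    out = np.full(n_boot, np.nan)
    for b in range(n_boot):
        if k[b] > 0:
            out[b] = rng.choice(top, size=k[b], replace=True).max()
    return out

rng = np.random.default_rng(0)
data = rng.gumbel(size=100_000)                 # synthetic stand-in data
maxima = tail_bootstrap_max(data, m=50, n_boot=1000, rng=rng)
print(np.nanpercentile(maxima, [2.5, 97.5]))    # interval for the maximum
```

Each replicate touches roughly m values instead of n, which is where the saving would come from for very large samples.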
Ten-Doménech, Isabel; Beltrán-Iturat, Eduardo; Herrero-Martínez, José Manuel; Sancho-Llopis, Juan Vicente; Simó-Alfonso, Ernesto Francisco
2015-06-24
In this work, a method for the separation of triacylglycerols (TAGs) present in human milk and from other mammalian species by reversed-phase high-performance liquid chromatography using a core-shell particle packed column with UV and evaporative light-scattering detectors is described. Under optimal conditions, a mobile phase containing acetonitrile/n-pentanol at 10 °C gave an excellent resolution among more than 50 TAG peaks. A small-scale method for fat extraction in these milks (particularly of interest for human milk samples) using minimal amounts of sample and reagents was also developed. The proposed extraction protocol and the traditional method were compared, giving similar results, with respect to the total fat and relative TAG contents. Finally, a statistical study based on linear discriminant analysis on the TAG composition of different types of milks (human, cow, sheep, and goat) was carried out to differentiate the samples according to their mammalian origin.
Hagell, Peter; Westergren, Albert
Sample size is a major factor in statistical null hypothesis testing, which is the basis for many approaches to testing Rasch model fit. Few sample size recommendations for testing fit to the Rasch model concern the Rasch Unidimensional Measurement Models (RUMM) software, which features chi-square and ANOVA/F-ratio based fit statistics, including Bonferroni and algebraic sample size adjustments. This paper explores the occurrence of Type I errors with RUMM fit statistics, and the effects of algebraic sample size adjustments. Data with simulated Rasch model fitting 25-item dichotomous scales and sample sizes ranging from N = 50 to N = 2500 were analysed with and without algebraically adjusted sample sizes. Results suggest the occurrence of Type I errors with N less than or equal to 500, and that Bonferroni correction as well as downward algebraic sample size adjustment are useful to avoid such errors, whereas upward adjustment of smaller samples falsely signals misfit. Our observations suggest that sample sizes around N = 250 to N = 500 may provide a good balance for the statistical interpretation of the RUMM fit statistics studied here with respect to Type I errors and under the assumption of Rasch model fit within the examined frame of reference (i.e., about 25 item parameters well targeted to the sample).
78 FR 43002 - Proposed Collection; Comment Request for Revenue Procedure 2004-29
Federal Register 2010, 2011, 2012, 2013, 2014
2013-07-18
... comments concerning statistical sampling in Sec. 274 Context. DATES: Written comments should be received on... INFORMATION: Title: Statistical Sampling in Sec. 274 Context. OMB Number: 1545-1847. Revenue Procedure Number: Revenue Procedure 2004-29. Abstract: Revenue Procedure 2004-29 prescribes the statistical sampling...
42 CFR 1003.133 - Statistical sampling.
Code of Federal Regulations, 2014 CFR
2014-10-01
... 42 Public Health 5 2014-10-01 2014-10-01 false Statistical sampling. 1003.133 Section 1003.133 Public Health OFFICE OF INSPECTOR GENERAL-HEALTH CARE, DEPARTMENT OF HEALTH AND HUMAN SERVICES OIG AUTHORITIES CIVIL MONEY PENALTIES, ASSESSMENTS AND EXCLUSIONS § 1003.133 Statistical sampling. (a) In meeting...
EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.
Tong, Xiaoxiao; Bentler, Peter M
2013-01-01
Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses
Liu, Ruijie; Holik, Aliaksei Z.; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E.; Asselin-Labat, Marie-Liesse; Smyth, Gordon K.; Ritchie, Matthew E.
2015-01-01
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package. PMID:25925576
Caballero Morales, Santiago Omar
2013-01-01
The application of Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices to achieve high product quality, small frequency of failures, and cost reduction in a production process. However there are some points that have not been explored in depth about its joint application. First, most SPC is performed with the X-bar control chart which does not fully consider the variability of the production process. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC. PMID:23527082
Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics
NASA Technical Reports Server (NTRS)
Pohorille, Andrew
2006-01-01
The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to an apparent non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs.
The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.
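The exchange step of parallel tempering described in this abstract can be sketched as follows. The sketch assumes each random walk samples a Boltzmann-type density proportional to exp(-beta * E), where beta is an inverse temperature and E an energy (or negative log-density). The function names are hypothetical and the report itself gives no code.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, energy_i, energy_j):
    """Metropolis probability of exchanging the states of two replicas.

    Assumes replica i samples exp(-beta_i * E) and replica j samples
    exp(-beta_j * E); the ratio of joint densities after/before the swap
    is exp((beta_i - beta_j) * (energy_i - energy_j)).
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def maybe_swap(states, energies, betas, i, j, rng=random):
    """Exchange replicas i and j in place with the Metropolis probability."""
    if rng.random() < swap_acceptance(betas[i], betas[j], energies[i], energies[j]):
        states[i], states[j] = states[j], states[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

A swap that moves the lower-energy state to the colder replica (larger beta) is always accepted. The reverse move is accepted only with an exponentially small probability, which is what keeps each walk sampling its own distribution.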
NASA Astrophysics Data System (ADS)
Deperas-Standylo, Joanna; Lee, Ryonfa; Nasonova, Elena; Ritter, Sylvia; Gudowska-Nowak, Ewa; Kac, M.; Smoluchowski, M.
Differences in the track structure of high LET (Linear Energy Transfer) particles are clearly visible on chromosomal level, in particular in the number of lesions produced by an ion traversal through a cell nucleus and in the distribution of aberrations among the cells. In the present study we focus on the effects of low energy C- and Cr-ions (<10 MeV/u) in comparison with high energy C-ions (90 MeV/u). For the experiments human lymphocytes were exposed to 9.5 MeV/u C-ions, 4.1 MeV/u Cr-ions or 90 MeV/u C-ions with LET values of 175 keV/µm, 3160 keV/µm and 29 keV/µm, respectively. Chromosome aberrations were measured at several post-irradiation sampling times (48, 60, 72 and 84 h) in first cycle metaphases following Giemsa-staining. For 90 MeV/u C-ions, where the track radius is larger than the cell nucleus, the distribution of aberrations did not change significantly with sampling time and was well described by Poisson statistics. In contrast, for low energy C-ions, where the track radius is smaller than the cell nucleus, the distribution of aberrations deviates strongly from uni-modal and displays two peaks representing subpopulations of non-hit and hit cells, respectively. Following this pattern, a damage-dependent cell cycle delay was also observed. At 48 h after irradiation a high number of undamaged and probably unhit cells was found to reach mitosis. This number of undamaged cells decreased further with sampling time, while the frequencies of cells carrying aberrations (1-11 per cell) increased. All distributions were found to conform to compound Poisson (Neyman type A) statistics, which allows estimating the average number of particle traversals through a cell nucleus and the average number of aberrations induced by one particle traversal. A similar response was also observed at 48 h after Cr-ion exposure.
In this case, however, non-aberrant cells were found to dominate the population even at later sampling times, and a low number of heavily damaged cells with up to 24 aberrations was detected. Accordingly, the distribution of aberrations in cells collected at >48 h could not then be described by standard Neyman statistics. The results obtained suggest that most cells hit by more than one Cr-ion do not reach mitosis. This observation was confirmed by parallel measurements showing that Cr-ion exposure produces a high fraction of apoptotic cells.
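The compound Poisson (Neyman type A) model named in this abstract can be illustrated with a short sketch. It assumes the textbook form: the number of particle traversals per nucleus is Poisson with mean `lam`, and each traversal independently induces a Poisson number of aberrations with mean `mu`. The parameter names and the truncation limit `n_max` are assumptions made for illustration, not values from the study.

```python
import math


def neyman_type_a_pmf(k, lam, mu, n_max=200):
    """P(X = k) for a Poisson(lam) number of Poisson(mu) clusters.

    The infinite sum over the number of traversals n is truncated at
    n_max, which is ample when lam is small. Requires lam > 0 and mu > 0.
    """
    total = 0.0
    for n in range(n_max + 1):
        # log of the Poisson(lam) probability of exactly n traversals
        log_weight = -lam + n * math.log(lam) - math.lgamma(n + 1)
        if n == 0:
            # no traversal means no aberrations
            inner = 1.0 if k == 0 else 0.0
        else:
            rate = n * mu  # n traversals give Poisson(n * mu) aberrations
            inner = math.exp(-rate + k * math.log(rate) - math.lgamma(k + 1))
        total += math.exp(log_weight) * inner
    return total


# Fraction of cells without aberrations for hypothetical lam and mu.
print(neyman_type_a_pmf(0, lam=1.0, mu=2.0))
```

Under this model the mean number of aberrations per cell is `lam * mu` and the zero class has probability exp(-lam * (1 - exp(-mu))). Fitting the observed distribution therefore gives separate estimates of the traversals per nucleus and the aberrations per traversal.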
Orton, Dennis J.; Doucette, Alan A.
2013-01-01
Identification of biomarkers capable of differentiating between pathophysiological states of an individual is a laudable goal in the field of proteomics. Protein biomarker discovery generally employs high throughput sample characterization by mass spectrometry (MS), being capable of identifying and quantifying thousands of proteins per sample. While MS-based technologies have rapidly matured, the identification of truly informative biomarkers remains elusive, with only a handful of clinically applicable tests stemming from proteomic workflows. This underlying lack of progress is attributed in large part to erroneous experimental design, biased sample handling, as well as improper statistical analysis of the resulting data. This review will discuss in detail the importance of experimental design and provide some insight into the overall workflow required for biomarker identification experiments. Proper balance between the degree of biological vs. technical replication is required for confident biomarker identification. PMID:28250400
Nogueira, F; Couto, E G; Bernardi, C J
2002-11-01
The Pantanal of Mato Grosso presents distinct landscape units: permanently, occasionally and periodically flooded areas. In the last of these, sampling is especially difficult because of the high heterogeneity between and within strata. This paper compares different methodological approaches and shows that the choice of approach can decisively influence what is learned about the distribution dynamics of organic matter. To understand the role of the flood pulse in the distribution dynamics of organic matter in a wetland of the Pantanal, we assumed that there is spatial dependence between sampling points. This assumption contradicts the classical statistical principle of randomness, and it yielded a larger volume of information from a smaller sampling effort, which means better performance with savings in time and money.
Texture Classification by Texton: Statistical versus Binary
Guo, Zhenhua; Zhang, Zhongcheng; Li, Xiu; Li, Qin; You, Jane
2014-01-01
Using statistical textons for texture classification has shown great success recently. The maximal response 8 (Statistical_MR8), image patch (Statistical_Joint) and locally invariant fractal (Statistical_Fractal) are typical statistical texton algorithms and state-of-the-art texture classification methods. However, these methods have two limitations. First, they need a training stage to build a texton library, so the recognition accuracy depends heavily on the training samples; second, during feature extraction, each local feature is assigned to a texton by searching for the nearest texton in the whole library, which is time consuming when the library is large and the feature dimension is high. To address these two issues, this paper proposes three binary texton counterparts: Binary_MR8, Binary_Joint, and Binary_Fractal. These methods do not require any training step but encode local features into a binary representation directly. The experimental results on the CUReT, UIUC and KTH-TIPS databases show that binary textons can give sound results with fast feature extraction, especially when the image size is not big and the quality of the image is not poor. PMID:24520346
Design of portable ultraminiature flow cytometers for medical diagnostics
NASA Astrophysics Data System (ADS)
Leary, James F.
2018-02-01
Design of portable microfluidic flow/image cytometry devices for measurements in the field (e.g. initial medical diagnostics) requires careful design in terms of power requirements and weight to allow for realistic portability. True portability with high-throughput microfluidic systems also requires sampling systems without the need for sheath hydrodynamic focusing both to avoid the need for sheath fluid and to enable higher volumes of actual sample, rather than sheath/sample combinations. Weight/power requirements dictate use of super-bright LEDs with top-hat excitation beam architectures and very small silicon photodiodes or nanophotonic sensors that can both be powered by small batteries. Signal-to-noise characteristics can be greatly improved by appropriately pulsing the LED excitation sources and sampling and subtracting noise in between excitation pulses. Microfluidic cytometry also requires judicious use of small sample volumes and appropriate statistical sampling by microfluidic cytometry or imaging for adequate statistical significance to permit real-time (typically in less than 15 minutes) initial medical decisions for patients in the field. This is not something conventional cytometry traditionally worries about, but is very important for development of small, portable microfluidic devices with small-volume throughputs. It also provides a more reasonable alternative to conventional tubes of blood when sampling geriatric and newborn patients for whom a conventional peripheral blood draw can be problematical. Instead one or two drops of blood obtained by pin-prick should be able to provide statistically meaningful results for use in making real-time medical decisions without the need for blood fractionation, which is not realistic in the doctor's office or field.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Callister, Stephen J.; Barry, Richard C.; Adkins, Joshua N.
2006-02-01
Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample set were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias, assigned ranks among the techniques revealed significant trends. For most LC-FTICR MS analyses, linear regression normalization ranked either first or second among the four techniques, suggesting that this technique was more generally suitable for reducing systematic biases.
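As a rough illustration of the simplest of the four approaches, the sketch below performs a central tendency normalization by shifting each run so that all runs share a common median. It assumes log-transformed abundances and one list per replicate run. The function name and the choice of the median as the measure of central tendency are assumptions, and the paper's exact procedure may differ.

```python
import statistics


def median_center(runs):
    """Shift each run so that every run has the same median.

    `runs` is a list of lists of log-scale peptide abundances. The common
    target is the mean of the per-run medians, so the overall abundance
    scale is roughly preserved.
    """
    medians = [statistics.median(run) for run in runs]
    target = statistics.mean(medians)
    return [[x - m + target for x in run] for run, m in zip(runs, medians)]


# Two hypothetical replicate runs that differ by a constant offset.
print(median_center([[1.0, 2.0, 3.0], [11.0, 12.0, 13.0]]))
```

A constant shift of this kind removes a run-wide systematic bias but cannot correct intensity-dependent bias, which is one reason regression-based techniques are considered alongside it.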
Statistical issues in reporting quality data: small samples and casemix variation.
Zaslavsky, A M
2001-12-01
To present two key statistical issues that arise in analysis and reporting of quality data. Casemix variation is relevant to quality reporting when the units being measured have differing distributions of patient characteristics that also affect the quality outcome. When this is the case, adjustment using stratification or regression may be appropriate. Such adjustments may be controversial when the patient characteristic does not have an obvious relationship to the outcome. Stratified reporting poses problems for sample size and reporting format, but may be useful when casemix effects vary across units. Although there are no absolute standards of reliability, high reliabilities (interunit F ≥ 10 or reliability ≥ 0.9) are desirable for distinguishing above- and below-average units. When small or unequal sample sizes complicate reporting, precision may be improved using indirect estimation techniques that incorporate auxiliary information, and 'shrinkage' estimation can help to summarize the strength of evidence about units with small samples. With broader understanding of casemix adjustment and methods for analyzing small samples, quality data can be analysed and reported more accurately.
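The idea of 'shrinkage' estimation for units with small samples can be sketched with the standard normal-normal form, in which each unit mean is pulled toward the grand mean in proportion to its unreliability. This is an illustrative assumption rather than the specific estimator discussed in the article, and all names and numbers are hypothetical.

```python
def shrink(unit_mean, n, grand_mean, between_var, within_var):
    """Return (shrunken_estimate, reliability) for one reporting unit.

    reliability = between-unit variance / (between-unit variance +
    sampling variance of the unit mean), where the sampling variance is
    within_var / n. Low reliability pulls the estimate to the grand mean.
    """
    reliability = between_var / (between_var + within_var / n)
    estimate = grand_mean + reliability * (unit_mean - grand_mean)
    return estimate, reliability


# Same observed score, very different sample sizes (hypothetical values).
print(shrink(0.9, 10, 0.7, 0.01, 0.2))
print(shrink(0.9, 1000, 0.7, 0.01, 0.2))
```

In this sketch the unit with 10 respondents is pulled most of the way back to the grand mean, while the unit with 1000 respondents keeps nearly all of its observed advantage.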
78 FR 63568 - Proposed Collection; Comment Request for Rev. Proc. 2007-35
Federal Register 2010, 2011, 2012, 2013, 2014
2013-10-24
... Revenue Procedure 2007-35, Statistical Sampling for purposes of Section 199. DATES: Written comments... . SUPPLEMENTARY INFORMATION: Title: Statistical Sampling for purposes of Section 199. OMB Number: 1545-2072... statistical sampling may be used in purposes of section 199, which provides a deduction for income...
Kotani, Akira; Tsutsumi, Risa; Shoji, Asaki; Hayashi, Yuzuru; Kusu, Fumiyo; Yamamoto, Kazuhiro; Hakamata, Hideki
2016-07-08
This paper puts forward a time and material-saving method for evaluating the repeatability of area measurements in gradient HPLC with UV detection (HPLC-UV), based on the function of mutual information (FUMI) theory which can theoretically provide the measurement standard deviation (SD) and detection limits through the stochastic properties of baseline noise with no recourse to repetitive measurements of real samples. The chromatographic determination of terbinafine hydrochloride and enalapril maleate is taken as an example. The best choice of the number of noise data points, inevitable for the theoretical evaluation, is shown to be 512 data points (10.24s at 50 point/s sampling rate of an A/D converter). Coupled with the relative SD (RSD) of sample injection variability in the instrument used, the theoretical evaluation is proved to give identical values of area measurement RSDs to those estimated by the usual repetitive method (n=6) over a wide concentration range of the analytes within the 95% confidence intervals of the latter RSD. The FUMI theory is not a statistical one, but the "statistical" reliability of its SD estimates (n=1) is observed to be as high as that attained by thirty-one measurements of the same samples (n=31). Copyright © 2016 Elsevier B.V. All rights reserved.
Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr
2012-01-01
Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap based algorithm was used to simulate the probability distribution of the efficiency gain estimates and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
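A bootstrap estimate of the shortest 95% interval for an efficiency gain can be sketched as below. The sketch assumes the common figure-of-merit definition, gain = (variance x time) of the conventional run divided by (variance x time) of the correlated run, and resamples hypothetical per-history scores. The function names and data are illustrative and are not the authors' algorithm.

```python
import math
import random
import statistics


def bootstrap_gains(conv_scores, corr_scores, t_conv, t_corr, n_boot=2000, seed=0):
    """Bootstrap distribution of the efficiency gain from per-history scores."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_boot):
        conv = rng.choices(conv_scores, k=len(conv_scores))
        corr = rng.choices(corr_scores, k=len(corr_scores))
        var_corr = statistics.variance(corr)
        if var_corr == 0:
            continue  # degenerate resample, gain undefined
        gains.append(statistics.variance(conv) * t_conv / (var_corr * t_corr))
    return gains


def shortest_interval(values, coverage=0.95):
    """Narrowest interval containing `coverage` of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no values")
    window = max(1, math.ceil(coverage * n))
    start = min(range(n - window + 1),
                key=lambda i: ordered[i + window - 1] - ordered[i])
    return ordered[start], ordered[start + window - 1]
```

For a skewed bootstrap distribution, as might be expected when a few histories carry very large weights, the shortest interval can differ noticeably from the equal-tailed percentile interval. That is a plausible reason to prefer it in this setting.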
[Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].
Suzukawa, Yumi; Toyoda, Hideki
2012-04-01
This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in the fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, meaningless effects could be detected. This implies that researchers who could not get large enough effect sizes would use larger samples to obtain significant results.
NASA Technical Reports Server (NTRS)
Bavassano, B.; Dobrowolny, H.; Fanfoni, G.; Mariani, F.; Ness, N. F.
1981-01-01
Helios 2 magnetic data were used to obtain several statistical properties of MHD fluctuations associated with the trailing edge of a given stream observed in different solar rotations. Eigenvalues and eigenvectors of the variance matrix, total power and degree of compressibility of the fluctuations were derived and discussed both as a function of distance from the Sun and as a function of the frequency range included in the sample. The results obtained add new information to the picture of MHD turbulence in the solar wind. In particular, a dependence of the radial gradients of various statistical quantities on frequency range is obtained.
Non-parametric early seizure detection in an animal model of temporal lobe epilepsy
NASA Astrophysics Data System (ADS)
Talathi, Sachin S.; Hwang, Dong-Uk; Spano, Mark L.; Simonotto, Jennifer; Furman, Michael D.; Myers, Stephen M.; Winters, Jason T.; Ditto, William L.; Carney, Paul R.
2008-03-01
The performance of five non-parametric, univariate seizure detection schemes (embedding delay, Hurst scale, wavelet scale, nonlinear autocorrelation and variance energy) was evaluated as a function of the sampling rate of EEG recordings, the electrode types used for EEG acquisition, and the spatial location of the EEG electrodes in order to determine the applicability of the measures in real-time closed-loop seizure intervention. The criteria chosen for evaluating the performance were high statistical robustness (as determined through the sensitivity and the specificity of a given measure in detecting a seizure) and the lag in seizure detection with respect to the seizure onset time (as determined by visual inspection of the EEG signal by a trained epileptologist). An optimality index was designed to evaluate the overall performance of each measure. For the EEG data recorded with a microwire electrode array at a sampling rate of 12 kHz, the wavelet scale measure exhibited the best overall performance, detecting seizures with a high optimality index value and with high sensitivity and specificity.
Mayer, B; Muche, R
2013-01-01
Animal studies are highly relevant for basic medical research, although their use is controversial in public debate. From a biometrical point of view, such projects should therefore aim for an optimal sample size. Statistical sample size calculation is usually the appropriate methodology in planning medical research projects. However, the required information is often not valid or only becomes available during the course of an animal experiment. This article critically discusses the validity of formal sample size calculation for animal studies. Within the discussion, some requirements are formulated to fundamentally regulate the process of sample size determination for animal experiments.
Statistical Inference for Data Adaptive Target Parameters.
Hubbard, Alan E; Kherad-Pajouh, Sara; van der Laan, Mark J
2016-05-01
Consider one observes n i.i.d. copies of a random variable with a probability distribution that is known to be an element of a particular statistical model. In order to define our statistical target we partition the sample in V equal size sub-samples, and use this partitioning to define V splits in an estimation sample (one of the V subsamples) and corresponding complementary parameter-generating sample. For each of the V parameter-generating samples, we apply an algorithm that maps the sample to a statistical target parameter. We define our sample-split data adaptive statistical target parameter as the average of these V-sample specific target parameters. We present an estimator (and corresponding central limit theorem) of this type of data adaptive target parameter. This general methodology for generating data adaptive target parameters is demonstrated with a number of practical examples that highlight new opportunities for statistical learning from data. This new framework provides a rigorous statistical methodology for both exploratory and confirmatory analysis within the same data. Given that more research is becoming "data-driven", the theory developed within this paper provides a new impetus for a greater involvement of statistical inference into problems that are being increasingly addressed by clever, yet ad hoc pattern finding methods. To suggest such potential, and to verify the predictions of the theory, extensive simulation studies, along with a data analysis based on adaptively determined intervention rules are shown and give insight into how to structure such an approach. The results show that the data adaptive target parameter approach provides a general framework and resulting methodology for data-driven science.
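The sample-splitting construction described in this abstract can be sketched in a few lines. The sketch assumes a toy setting in which the "algorithm" picks, from the parameter-generating sample, the column with the largest mean, and the estimation sample then estimates the mean of that column. The function names and the toy target are hypothetical, and no inference (standard errors or the central limit theorem) is attempted.

```python
def sample_split_estimate(rows, n_splits, choose_target, estimate):
    """Average of V split-specific estimates of a data-adaptive target.

    For each split v, the target is chosen on the parameter-generating
    sample (all folds except v) and estimated on the estimation sample
    (fold v), so selection and estimation never share observations.
    """
    folds = [rows[v::n_splits] for v in range(n_splits)]
    estimates = []
    for v in range(n_splits):
        estimation = folds[v]
        generating = [r for u, fold in enumerate(folds) if u != v for r in fold]
        target = choose_target(generating)
        estimates.append(estimate(target, estimation))
    return sum(estimates) / n_splits


def column_with_largest_mean(rows):
    """Toy data-adaptive rule: index of the column with the largest mean."""
    n_cols = len(rows[0])
    means = [sum(r[c] for r in rows) / len(rows) for c in range(n_cols)]
    return max(range(n_cols), key=lambda c: means[c])


def column_mean(col, rows):
    """Estimate the chosen target: the mean of column `col`."""
    return sum(r[col] for r in rows) / len(rows)
```

A call such as `sample_split_estimate(rows, 2, column_with_largest_mean, column_mean)` returns the average of the two fold-specific estimates. Because the target is never chosen on the data used to estimate it, each fold-specific estimate avoids the selection bias of choosing and estimating on the same observations.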
Variation of microorganism concentrations in urban stormwater runoff with land use and seasons.
Selvakumar, Ariamalar; Borst, Michael
2006-03-01
Stormwater runoff samples were collected from outfalls draining small municipal separate storm sewer systems. The samples were collected from three different land use areas based on local designation (high-density residential, low-density residential and landscaped commercial). The concentrations of microorganisms in the stormwater runoff were found to be similar in magnitude to, but less variable than, those reported in the stormwater National Pollutant Discharge Elimination System (NPDES) database. Microorganism concentrations from high-density residential areas were higher than those associated with low-density residential and landscaped commercial areas. Since the outfalls were free of sanitary wastewater cross-connections, the major sources of microorganisms to the stormwater runoff were most likely from the feces of domestic animals and wildlife. Concentrations of microorganisms were significantly affected by the season during which the samples were collected. The lowest concentrations were observed during winter except for Staphylococcus aureus. The Pearson correlation coefficients among different indicators showed weak linear relationships and the relationships were statistically significant. However, the relationships between indicators and pathogens were poorly correlated and were not statistically significant, suggesting the use of indicators as evidence of the presence of pathogens is not appropriate. Further, the correlation between the concentration of the traditionally monitored indicators (total coliforms and fecal coliforms) and the suggested substitutes (enterococci and E. coli) is weak, but statistically significant, suggesting that historical time series will be only a qualitative indicator of impaired waters under the revised criteria for recreational water quality by the US EPA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vidal-Codina, F., E-mail: fvidal@mit.edu; Nguyen, N.C., E-mail: cuongng@mit.edu; Giles, M.B., E-mail: mike.giles@maths.ox.ac.uk
We present a model and variance reduction method for the fast and reliable computation of statistical outputs of stochastic elliptic partial differential equations. Our method consists of three main ingredients: (1) the hybridizable discontinuous Galerkin (HDG) discretization of elliptic partial differential equations (PDEs), which allows us to obtain high-order accurate solutions of the governing PDE; (2) the reduced basis method for a new HDG discretization of the underlying PDE to enable real-time solution of the parameterized PDE in the presence of stochastic parameters; and (3) a multilevel variance reduction method that exploits the statistical correlation among the different reduced basis approximations and the high-fidelity HDG discretization to accelerate the convergence of the Monte Carlo simulations. The multilevel variance reduction method provides efficient computation of the statistical outputs by shifting most of the computational burden from the high-fidelity HDG approximation to the reduced basis approximations. Furthermore, we develop a posteriori error estimates for our approximations of the statistical outputs. Based on these error estimates, we propose an algorithm for optimally choosing both the dimensions of the reduced basis approximations and the sizes of Monte Carlo samples to achieve a given error tolerance. We provide numerical examples to demonstrate the performance of the proposed method.
High levels of cynical distrust partly predict premature mortality in middle-aged to ageing men.
Šmigelskas, Kastytis; Joffė, Roza; Jonynienė, Jolita; Julkunen, Juhani; Kauhanen, Jussi
2017-08-01
The aim of this study was to evaluate the effect of cynical distrust on mortality in middle-aged and aging men. The analysis is based on the Kuopio Ischemic Heart Disease study, with follow-up from 1984 to 2011. The sample consisted of 2682 men, aged 42-61 years at baseline. Data on mortality were provided by the National Death Registry; causes of death were classified by the National Center of Statistics of Finland. Cynical distrust was measured at baseline using the Cynical Distrust Scale. Survival analyses were conducted using Cox regression models. In crude estimates after 28 years of follow-up, high cynical distrust was associated with 1.5-1.7 times higher hazards of earlier death compared to low cynical distrust. After adjustment for conventional risk factors, high cynical distrust was significantly associated with CVD mortality in CVD-free men, while its association with non-CVD mortality in the study sample was consistent but not significant. The risk effects were more pronounced after 12-20 years than in earlier or later follow-up. To conclude, high cynical distrust is associated with increased risk of CVD mortality in CVD-free men. The associations with non-CVD mortality are weaker and do not reach statistical significance.
A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Bhaduri, Budhendra L
2011-01-01
Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. The ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of a large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, it is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise, newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical models by utilizing a large number of easily available unlabeled training samples. Unfortunately, there is no convenient multivariate statistical model that can be employed for multisource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.
NASA Astrophysics Data System (ADS)
Kunert, Anna Theresa; Scheel, Jan Frederik; Helleis, Frank; Klimach, Thomas; Pöschl, Ulrich; Fröhlich-Nowoisky, Janine
2016-04-01
Freezing of water above homogeneous freezing is catalyzed by ice nucleation active (INA) particles called ice nuclei (IN), which can be of various inorganic or biological origin. The freezing temperatures reach up to -1 °C for some biological samples and are dependent on the chemical composition of the IN. The standard method to analyze IN in solution is the droplet freezing assay (DFA) established by Gabor Vali in 1970. Several modifications and improvements were already made within the last decades, but they are still limited by either small droplet numbers, large droplet volumes or inadequate separation of the single droplets resulting in mutual interferences and therefore improper measurements. The probability that miscellaneous IN are concentrated together in one droplet increases with the volume of the droplet, which can be described by the Poisson distribution. At a given concentration, the partition of a droplet into several smaller droplets leads to finely dispersed IN resulting in better statistics and therefore in a better resolution of the nucleation spectrum. We designed a new customized high-performance droplet freezing assay (HP-DFA), which represents an upgrade of the previously existing DFAs in terms of temperature range and statistics. The necessity of observing freezing events at temperatures lower than homogeneous freezing due to freezing point depression, requires high-performance thermostats combined with an optimal insulation. Furthermore, we developed a cooling setup, which allows both huge and tiny temperature changes within a very short period of time. Besides that, the new DFA provides the analysis of more than 750 droplets per run with a small droplet volume of 5 μL. This enables a fast and more precise analysis of biological samples with complex IN composition as well as better statistics for every sample at the same time.
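The Poisson argument in the abstract above can be illustrated with a minimal sketch. It assumes ice nuclei are distributed independently at a uniform concentration, so the count per droplet is Poisson with mean equal to concentration times volume. The function names and the example concentration are hypothetical and are not taken from the HP-DFA work.

```python
import math


def prob_at_least_one(concentration_per_ul: float, volume_ul: float) -> float:
    """P(droplet contains >= 1 ice nucleus) under a Poisson model."""
    mean_count = concentration_per_ul * volume_ul  # expected nuclei per droplet
    return 1.0 - math.exp(-mean_count)


def prob_two_or_more(concentration_per_ul: float, volume_ul: float) -> float:
    """P(droplet contains >= 2 nuclei), i.e. several IN share one droplet."""
    mean_count = concentration_per_ul * volume_ul
    # 1 - P(0) - P(1) for a Poisson distribution with this mean
    return 1.0 - math.exp(-mean_count) * (1.0 + mean_count)


if __name__ == "__main__":
    concentration = 0.1  # hypothetical: nuclei per microlitre
    for volume in (50.0, 5.0, 0.5):
        print(volume, prob_at_least_one(concentration, volume),
              prob_two_or_more(concentration, volume))
```

At a fixed concentration, shrinking the droplet volume lowers the chance that two or more nuclei end up in the same droplet, which is the reason smaller droplets resolve the nucleation spectrum better.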
Chen, Nan; Majda, Andrew J
2017-12-05
Solving the Fokker-Planck equation for high-dimensional complex dynamical systems is an important issue. Recently, the authors developed efficient statistically accurate algorithms for solving the Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures, which contain many strong non-Gaussian features such as intermittency and fat-tailed probability density functions (PDFs). The algorithms involve a hybrid strategy with a small number of samples [Formula: see text], where a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious Gaussian kernel density estimation in the remaining low-dimensional subspace. In this article, two effective strategies are developed and incorporated into these algorithms. The first strategy involves a judicious block decomposition of the conditional covariance matrix such that the evolutions of different blocks have no interactions, which allows an extremely efficient parallel computation due to the small size of each individual block. The second strategy exploits statistical symmetry for a further reduction of [Formula: see text] The resulting algorithms can efficiently solve the Fokker-Planck equation with strongly non-Gaussian PDFs in much higher dimensions even with orders in the millions and thus beat the curse of dimension. The algorithms are applied to a [Formula: see text]-dimensional stochastic coupled FitzHugh-Nagumo model for excitable media. An accurate recovery of both the transient and equilibrium non-Gaussian PDFs requires only [Formula: see text] samples! In addition, the block decomposition facilitates the algorithms to efficiently capture the distinct non-Gaussian features at different locations in a [Formula: see text]-dimensional two-layer inhomogeneous Lorenz 96 model, using only [Formula: see text] samples. Copyright © 2017 the Author(s). Published by PNAS.
Shoari, Niloofar; Dubé, Jean-Sébastien; Chenouri, Shoja'eddin
2015-11-01
In environmental studies, concentration measurements frequently fall below detection limits of measuring instruments, resulting in left-censored data. Some studies employ parametric methods such as the maximum likelihood estimator (MLE), robust regression on order statistic (rROS), and gamma regression on order statistic (GROS), while others suggest a non-parametric approach, the Kaplan-Meier method (KM). Using examples of real data from a soil characterization study in Montreal, we highlight the need for additional investigations that aim at unifying the existing literature. A number of studies have examined this issue; however, those considering data skewness and model misspecification are rare. These aspects are investigated in this paper through simulations. Among other findings, results show that for low skewed data, the performance of different statistical methods is comparable, regardless of the censoring percentage and sample size. For highly skewed data, the performance of the MLE method under lognormal and Weibull distributions is questionable; particularly, when the sample size is small or censoring percentage is high. In such conditions, MLE under gamma distribution, rROS, GROS, and KM are less sensitive to skewness. Related to model misspecification, MLE based on lognormal and Weibull distributions provides poor estimates when the true distribution of data is misspecified. However, the methods of rROS, GROS, and MLE under gamma distribution are generally robust to model misspecifications regardless of skewness, sample size, and censoring percentage. Since the characteristics of environmental data (e.g., type of distribution and skewness) are unknown a priori, we suggest using MLE based on gamma distribution, rROS and GROS. Copyright © 2015 Elsevier Ltd. All rights reserved.
Hammerton, Gemma; Harold, Gordon; Thapar, Anita; Thapar, Ajay
2013-01-01
Objective To examine the relationship between blood pressure and depressive disorder in children and adolescents at high risk for depression. Design Multisample longitudinal design including a prospective longitudinal three-wave high-risk study of offspring of parents with recurrent depression and an on-going birth cohort for replication. Setting Community-based studies. Participants High-risk sample includes 281 families where children were aged 9–17 years at baseline and 10–19 years at the final data point. Replication cohort includes 4830 families where children were aged 11–14 years at baseline and 14–17 years at follow-up and a high-risk subsample of 612 offspring with mothers that had reported recurrent depression. Main outcome measures The new-onset of Diagnostic and Statistical Manual of Mental Disorder, fourth edition defined depressive disorder in the offspring using established research diagnostic assessments—the Child and Adolescent Psychiatric Assessment in the high-risk sample and the Development and Wellbeing Assessment in the replication sample. Results Blood pressure was standardised for age and gender to create SD scores and child's weight was statistically controlled in all analyses. In the high-risk sample, lower systolic blood pressure at wave 1 significantly predicted new-onset depressive disorder in children (OR=0.65, 95% CI 0.44 to 0.96; p=0.029) but diastolic blood pressure did not. Depressive disorder at wave 1 did not predict systolic blood pressure at wave 3. A significant association between lower systolic blood pressure and future depression was also found in the replication cohort in the second subset of high-risk children whose mothers had experienced recurrent depression in the past. Conclusions Lower systolic blood pressure predicts new-onset depressive disorder in the offspring of parents with depression. Further studies are needed to investigate how this association arises. PMID:24071459
Bartsch, L.A.; Richardson, W.B.; Naimo, T.J.
1998-01-01
Estimation of benthic macroinvertebrate populations over large spatial scales is difficult due to the high variability in abundance and the cost of sample processing and taxonomic analysis. To determine a cost-effective, statistically powerful sample design, we conducted an exploratory study of the spatial variation of benthic macroinvertebrates in a 37 km reach of the Upper Mississippi River. We sampled benthos at 36 sites within each of two strata, contiguous backwater and channel border. Three standard ponar (525 cm²) grab samples were obtained at each site ('Original Design'). Analysis of variance and sampling cost of strata-wide estimates for abundance of Oligochaeta, Chironomidae, and total invertebrates showed that only one ponar sample per site ('Reduced Design') yielded essentially the same abundance estimates as the Original Design, while reducing the overall cost by 63%. A posteriori statistical power analysis (alpha = 0.05, beta = 0.20) on the Reduced Design estimated that at least 18 sites per stratum were needed to detect differences in mean abundance between contiguous backwater and channel border areas for Oligochaeta, Chironomidae, and total invertebrates. Statistical power was nearly identical for the three taxonomic groups. The abundances of several taxa of concern (e.g., Hexagenia mayflies and Musculium fingernail clams) were too spatially variable to estimate power with our method. Resampling simulations indicated that to achieve adequate sampling precision for Oligochaeta, at least 36 sample sites per stratum would be required, whereas a sampling precision of 0.2 would not be attained with any sample size for Hexagenia in channel border areas, or Chironomidae and Musculium in both strata given the variance structure of the original samples. Community-wide diversity indices (Brillouin and 1-Simpsons) increased as sample area per site increased. The backwater area had higher diversity than the channel border area.
The number of sampling sites required to sample benthic macroinvertebrates during our sampling period depended on the study objective and ranged from 18 to more than 40 sites per stratum. No single sampling regime would efficiently and adequately sample all components of the macroinvertebrate community.
Development of uncertainty-based work injury model using Bayesian structural equation modelling.
Chatterjee, Snehamoy
2014-01-01
This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distributions for the factor loading parameters and structural parameters of the SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayes' rule. Markov Chain Monte Carlo sampling in the form of Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant with experts' opinion-based priors, whereas two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveal that the Bayesian structural model provides a reasonably good fit of work injury, with a high coefficient of determination (0.91) and a lower mean squared error than traditional SEM.
Water quality analysis of the Rapur area, Andhra Pradesh, South India using multivariate techniques
NASA Astrophysics Data System (ADS)
Nagaraju, A.; Sreedhar, Y.; Thejaswi, A.; Sayadi, Mohammad Hossein
2017-10-01
Groundwater samples from the Rapur area were collected from different sites to evaluate the major ion chemistry. The large volume of data can lead to difficulties in the integration, interpretation, and representation of the results. Two multivariate statistical methods, hierarchical cluster analysis (HCA) and factor analysis (FA), were applied to evaluate their usefulness to classify and identify geochemical processes controlling groundwater geochemistry. Four statistically significant clusters were obtained from 30 sampling stations. This has resulted in two important clusters, viz., cluster 1 (pH, Si, CO3, Mg, SO4, Ca, K, HCO3, alkalinity, Na, Na + K, Cl, and hardness) and cluster 2 (EC and TDS), which are released to the study area from different sources. The application of different multivariate statistical techniques, such as principal component analysis (PCA), assists in the interpretation of complex data matrices for a better understanding of water quality of a study area. From PCA, it is clear that the first factor (factor 1), which accounted for 36.2% of the total variance, had high positive loadings on EC, Mg, Cl, TDS, and hardness. Based on the PCA scores, four significant cluster groups of sampling locations were detected on the basis of similarity of their water quality.
Blynn, Emily; Ahmed, Saifuddin; Gibson, Dustin; Pariyo, George; Hyder, Adnan A
2017-01-01
In low- and middle-income countries (LMICs), historically, household surveys have been carried out by face-to-face interviews to collect survey data related to risk factors for noncommunicable diseases. The proliferation of mobile phone ownership and the access it provides in these countries offers a new opportunity to remotely conduct surveys with increased efficiency and reduced cost. However, the near-ubiquitous ownership of phones, high population mobility, and low cost require a re-examination of statistical recommendations for mobile phone surveys (MPS), especially when surveys are automated. As with landline surveys, random digit dialing remains the most appropriate approach to develop an ideal survey-sampling frame. Once the survey is complete, poststratification weights are generally applied to reduce estimate bias and to adjust for selectivity due to mobile ownership. Since weights increase design effects and reduce sampling efficiency, we introduce the concept of automated active strata monitoring to improve representativeness of the sample distribution to that of the source population. Although some statistical challenges remain, MPS represent a promising emerging means for population-level data collection in LMICs. PMID:28476726
ERIC Educational Resources Information Center
White, Susan; Tesfaye, Casey Langer
2014-01-01
Since 1987, the Statistical Research Center at the American Institute of Physics has regularly conducted a nationwide survey of high school physics teachers to take a closer look at physics in U.S. high schools. We contact all of the teachers who teach at least one physics course at a nationally representative sample of all U.S. high schools-both…
Heidel, R Eric
2016-01-01
Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
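As an illustration of how the components listed above interact, here is a minimal sketch of an a priori sample size calculation. It assumes the simplest case, which the abstract does not specify: a continuous outcome, two independent groups of equal size, a standardized effect size (Cohen's d), and the normal approximation to the two-sided test. The function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for a two-sided, two-group comparison.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, the normal approximation;
    an exact t-based calculation gives slightly larger values.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1.0 - alpha / 2.0)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)              # quantile for the target power
    return math.ceil(2.0 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    for d in (0.2, 0.5, 0.8):
        print(d, n_per_group(d))
```

Because the effect size enters as an inverse square, halving the expected effect roughly quadruples the required sample, which is why the magnitude and variance of the effect dominate the planning decision.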
Moorman, J. Randall; Delos, John B.; Flower, Abigail A.; Cao, Hanqing; Kovatchev, Boris P.; Richman, Joshua S.; Lake, Douglas E.
2014-01-01
We have applied principles of statistical signal processing and non-linear dynamics to analyze heart rate time series from premature newborn infants in order to assist in the early diagnosis of sepsis, a common and potentially deadly bacterial infection of the bloodstream. We began with the observation of reduced variability and transient decelerations in heart rate interval time series for hours up to days prior to clinical signs of illness. We find that measurements of standard deviation, sample asymmetry and sample entropy are highly related to imminent clinical illness. We developed multivariable statistical predictive models, and an interface to display the real-time results to clinicians. Using this approach, we have observed numerous cases in which incipient neonatal sepsis was diagnosed and treated without any clinical illness at all. This review focuses on the mathematical and statistical time series approaches used to detect these abnormal heart rate characteristics and present predictive monitoring information to the clinician. PMID:22026974
Derivation and Applicability of Asymptotic Results for Multiple Subtests Person-Fit Statistics
Albers, Casper J.; Meijer, Rob R.; Tendeiro, Jorge N.
2016-01-01
In high-stakes testing, it is important to check the validity of individual test scores. Although a test may, in general, result in valid test scores for most test takers, for some test takers, test scores may not provide a good description of a test taker’s proficiency level. Person-fit statistics have been proposed to check the validity of individual test scores. In this study, the theoretical asymptotic sampling distribution of two person-fit statistics that can be used for tests that consist of multiple subtests is first discussed. Second, a simulation study was conducted to investigate the applicability of this asymptotic theory for tests of finite length, in which the correlation between subtests and number of items in the subtests was varied. The authors showed that these distributions provide reasonable approximations, even for tests consisting of subtests of only 10 items each. These results have practical value because researchers do not have to rely on extensive simulation studies to simulate sampling distributions. PMID:29881053
Sampling methods to the statistical control of the production of blood components.
Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo
2017-12-01
The control of blood components specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The control reliability is dependent on the sampling. However, a correct sampling methodology seems not to be systematically applied. Commonly, the sampling is intended to comply uniquely with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model could be argued not to be related to a consistent sampling technique. This could be a severe limitation to detecting abnormal patterns and to assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and related statistical process control decisions for the purpose they are suggested for. Copyright © 2017 Elsevier Ltd. All rights reserved.
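One common way to size a sample for a proportion in a finite population is Cochran's formula with a finite population correction. The sketch below uses that textbook formula as an assumption; the abstract does not state which formula the authors apply, and the function name, defaults, and example lot sizes are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population_size: int,
                               expected_proportion: float = 0.5,
                               margin_of_error: float = 0.05,
                               confidence: float = 0.95) -> int:
    """Sample size to estimate a proportion in a finite population.

    Assumed formula (Cochran with finite population correction):
      n0 = z^2 * p * (1 - p) / e^2,   n = n0 / (1 + (n0 - 1) / N)
    """
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    p = expected_proportion
    n_infinite = z ** 2 * p * (1.0 - p) / margin_of_error ** 2
    n_finite = n_infinite / (1.0 + (n_infinite - 1.0) / population_size)
    # Never ask for more units than the population contains.
    return min(population_size, math.ceil(n_finite))


if __name__ == "__main__":
    for produced_units in (100, 1000, 100000):  # hypothetical production volumes
        print(produced_units, sample_size_for_proportion(produced_units))
```

Under this formula the required sample grows much more slowly than the population, so a fixed percentage rule oversamples large productions and undersamples small ones.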
A lower bound on the number of cosmic ray events required to measure source catalogue correlations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dolci, Marco; Romero-Wolf, Andrew; Wissel, Stephanie, E-mail: marco.dolci@polito.it, E-mail: Andrew.Romero-Wolf@jpl.nasa.gov, E-mail: swissel@calpoly.edu
2016-10-01
Recent analyses of cosmic ray arrival directions have resulted in evidence for a positive correlation with active galactic nuclei positions that has weak significance against an isotropic source distribution. In this paper, we explore the sample size needed to measure a highly statistically significant correlation to a parent source catalogue. We compare several scenarios for the directional scattering of ultra-high energy cosmic rays given our current knowledge of the galactic and intergalactic magnetic fields. We find significant correlations are possible for a sample of >1000 cosmic ray protons with energies above 60 EeV.
Towards a routine application of Top-Down approaches for label-free discovery workflows.
Schmit, Pierre-Olivier; Vialaret, Jerome; Wessels, Hans J C T; van Gool, Alain J; Lehmann, Sylvain; Gabelle, Audrey; Wood, Jason; Bern, Marshall; Paape, Rainer; Suckau, Detlev; Kruppa, Gary; Hirtz, Christophe
2018-03-20
Thanks to proteomics investigations, our vision of the role of different protein isoforms in the pathophysiology of diseases has largely evolved. The idea that protein biomarkers like tau, amyloid peptides, ApoE, cystatin, or neurogranin are represented in body fluids as single species is obviously over-simplified, as most proteins are present in different isoforms and subjected to numerous processing and post-translational modifications. Measuring the intact mass of proteins by MS has the advantage to provide information on the presence and relative amount of the different proteoforms. Such Top-Down approaches typically require a high degree of sample pre-fractionation to allow the MS system to deliver optimal performance in terms of dynamic range, mass accuracy and resolution. In clinical studies, however, the requirements for pre-analytical robustness and sample size large enough for statistical power restrict the routine use of a high degree of sample pre-fractionation. In this study, we have investigated the capacities of current-generation Ultra-High Resolution Q-Tof systems to deal with high complexity intact protein samples and have evaluated the approach on a cohort of patients suffering from neurodegenerative disease. Statistical analysis has shown that several proteoforms can be used to distinguish Alzheimer disease patients from patients suffering from other neurodegenerative disease. Top-down approaches have an extremely high biological relevance, especially when it comes to biomarker discovery, but the necessary pre-fractionation constraints are not easily compatible with the robustness requirements and the size of clinical sample cohorts. We have demonstrated that intact protein profiling studies could be run on UHR-Q-ToF with limited pre-fractionation. The proteoforms that have been identified as candidate biomarkers in the-proof-of concept study are derived from proteins known to play a role in the pathophysiology process of Alzheimer disease. 
Copyright © 2017 Elsevier B.V. All rights reserved.
Pocket guide to transportation, 1999
DOT National Transportation Integrated Search
1998-12-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Pocket guide to transportation, 2009
DOT National Transportation Integrated Search
2009-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Pocket guide to transportation, 2013.
DOT National Transportation Integrated Search
2013-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, ...
Pocket guide to transportation, 2010
DOT National Transportation Integrated Search
2010-01-01
Statistics published in this Pocket Guide to Transportation come from many different sources. Some statistics are based on samples and are subject to sampling variability. Statistics may also be subject to omissions and errors in reporting, recording...
Comparative Financial Statistics for Public Two-Year Colleges: FY 1992 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
This report, the 15th in an annual series, provides comparative information derived from a national sample of 544 public two-year colleges, highlighting financial statistics for fiscal year 1991-92. The report offers space for colleges to compare their institutional statistics with data provided on national sample medians; quartile data for the…
Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.
ERIC Educational Resources Information Center
Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas
2002-01-01
Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
Velez-Montoya, Raul; Clapp, Carmen; Rivera, Jose Carlos; Garcia-Aguirre, Gerardo; Morales-Cantón, Virgilio; Fromow-Guerra, Jans; Guerrero-Naranjo, Jose Luis; Quiroz-Mercado, Hugo
2010-01-01
Purpose: To measure vitreous, aqueous, subretinal fluid and plasma levels of vascular endothelial growth factor in late stages of retinopathy of prematurity. Methods: Interventional study. We enrolled patients with clinical diagnoses of bilateral stage 5 retinopathy of prematurity, confirmed by B-scan ultrasound and scheduled for vitrectomy. During surgery we took samples of blood, aqueous, vitreous, and subretinal fluids. The vascular endothelial growth factor concentration in each sample was measured by ELISA. Control samples of aqueous, vitreous and blood were taken from patients with congenital cataract scheduled for phacoemulsification. For statistical analysis, Mann–Whitney and Wilcoxon W tests were performed, with a significance level of P = 0.05. Results: We took samples from 16 consecutive patients who met the inclusion criteria. The vascular endothelial growth factor levels in the study group were: aqueous, 76.81 ± 61.89 pg/mL; vitreous, 118.53 ± 65.87 pg/mL; subretinal fluid, 1636.58 ± 356.47 pg/mL; and plasma, 74.64 ± 43.94 pg/mL. There was a statistically significant difference between the study and the control group (P < 0.001) in the aqueous and vitreous samples. Conclusion: Stage 5 retinopathy of prematurity has elevated intraocular levels of vascular endothelial growth factor, which remain high despite severe retinal lesion. There was no statistically significant difference in plasma levels of the molecule between the control and study group. PMID:20856587
Thompson, Steven K
2006-12-01
A flexible class of adaptive sampling designs is introduced for sampling in network and spatial settings. In the designs, selections are made sequentially with a mixture distribution based on an active set that changes as the sampling progresses, using network or spatial relationships as well as sample values. The new designs have certain advantages compared with previously existing adaptive and link-tracing designs, including control over sample sizes and of the proportion of effort allocated to adaptive selections. Efficient inference involves averaging over sample paths consistent with the minimal sufficient statistic. A Markov chain resampling method makes the inference computationally feasible. The designs are evaluated in network and spatial settings using two empirical populations: a hidden human population at high risk for HIV/AIDS and an unevenly distributed bird population.
Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos
2005-01-01
Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
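The permutation comparison described in this abstract can be sketched as follows. This is a minimal illustration and not the authors' PACE implementation: the one-dimensional nearest-centroid classifier, the function names, and the toy data are all assumptions made for the example.

```python
import random

def nearest_centroid_error(xs, labels):
    """Resubstitution error of a 1-D nearest-centroid classifier (illustrative stand-in)."""
    groups = {}
    for x, y in zip(xs, labels):
        groups.setdefault(y, []).append(x)
    centroids = {y: sum(v) / len(v) for y, v in groups.items()}
    wrong = 0
    for x, y in zip(xs, labels):
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        wrong += predicted != y
    return wrong / len(xs)

def permutation_p_value(xs, labels, n_permutations=1000, seed=0):
    """Share of label permutations whose error is at least as low as the observed error."""
    rng = random.Random(seed)
    observed = nearest_centroid_error(xs, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # randomly reassign class labels
        if nearest_centroid_error(xs, shuffled) <= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Hypothetical toy data: one feature, two well-separated classes.
xs = [0.1, 0.3, 0.2, 0.4, 0.0, 2.1, 2.3, 1.9, 2.2, 2.0]
labels = [0] * 5 + [1] * 5
observed_error, p_value = permutation_p_value(xs, labels)
```

A small p-value here only says that the observed error is unusual under random labelling; as the abstract notes, it does not guard against confounding in the experimental design.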
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms–Supervised Principal Components, Regularization, and Boosting—can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach—or perhaps because of them–SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
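The use of cross-validation to estimate expected prediction error can be illustrated with a small sketch. It is not the authors' analysis: the single-predictor ridge estimator, the fold assignment, the penalty grid, and the synthetic data are assumptions chosen to keep the example short.

```python
def ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate for a single predictor with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)

def cv_prediction_error(xs, ys, penalty, n_folds=5):
    """Mean squared error on held-out folds, an estimate of expected prediction error."""
    squared_errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(xs)) if i % n_folds != fold]
        test = [i for i in range(len(xs)) if i % n_folds == fold]
        slope = ridge_slope([xs[i] for i in train], [ys[i] for i in train], penalty)
        squared_errors += [(ys[i] - slope * xs[i]) ** 2 for i in test]
    return sum(squared_errors) / len(squared_errors)

# Synthetic data (hypothetical): a strong linear signal with small alternating noise.
xs = [i / 10 for i in range(-10, 11)]
noise = [0.05 if i % 2 == 0 else -0.05 for i in range(21)]
ys = [2.0 * x + e for x, e in zip(xs, noise)]

# Pick the penalty with the lowest cross-validated error.
errors = {penalty: cv_prediction_error(xs, ys, penalty) for penalty in (0.0, 1.0, 100.0)}
best_penalty = min(errors, key=errors.get)
```

With a strong signal and little noise the unpenalized fit wins; with many weak predictors the same procedure would typically favour a nonzero penalty.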
Estimating TCP Packet Loss Ratio from Sampled ACK Packets
NASA Astrophysics Data System (ADS)
Yamasaki, Yasuhiro; Shimonishi, Hideyuki; Murase, Tutomu
The advent of various quality-sensitive applications has greatly changed the requirements for IP network management and made the monitoring of individual traffic flows more important. Since the processing costs of per-flow quality monitoring are high, especially in high-speed backbone links, packet sampling techniques have been attracting considerable attention. Existing sampling techniques, such as those used in Sampled NetFlow and sFlow, however, focus on the monitoring of traffic volume, and there has been little discussion of the monitoring of such quality indexes as packet loss ratio. In this paper we propose a method for estimating, from sampled packets, packet loss ratios in individual TCP sessions. It detects packet loss events by monitoring duplicate ACK events raised by each TCP receiver. Because sampling reveals only a portion of the actual packet loss, the actual packet loss ratio is estimated statistically. Simulation results show that the proposed method can estimate the TCP packet loss ratio accurately from a 10% sampling of packets.
Connecting optical and X-ray tracers of galaxy cluster relaxation
NASA Astrophysics Data System (ADS)
Roberts, Ian D.; Parker, Laura C.; Hlavacek-Larrondo, Julie
2018-04-01
Substantial effort has been devoted to determining the ideal proxy for quantifying the morphology of the hot intracluster medium in clusters of galaxies. These proxies, based on X-ray emission, typically require expensive, high-quality X-ray observations making them difficult to apply to large surveys of groups and clusters. Here, we compare optical relaxation proxies with X-ray asymmetries and centroid shifts for a sample of Sloan Digital Sky Survey clusters with high-quality, archival X-ray data from Chandra and XMM-Newton. The three optical relaxation measures considered are the shape of the member-galaxy projected velocity distribution - measured by the Anderson-Darling (AD) statistic, the stellar mass gap between the most-massive and second-most-massive cluster galaxy, and the offset between the most-massive galaxy (MMG) position and the luminosity-weighted cluster centre. The AD statistic and stellar mass gap correlate significantly with X-ray relaxation proxies, with the AD statistic being the stronger correlator. Conversely, we find no evidence for a correlation between X-ray asymmetry or centroid shift and the MMG offset. High-mass clusters (M_halo > 10^14.5 M⊙) in this sample have X-ray asymmetries, centroid shifts, and Anderson-Darling statistics which are systematically larger than for low-mass systems. Finally, considering the dichotomy of Gaussian and non-Gaussian clusters (measured by the AD test), we show that the probability of being a non-Gaussian cluster correlates significantly with X-ray asymmetry but only shows a marginal correlation with centroid shift. These results confirm the shape of the radial velocity distribution as a useful proxy for cluster relaxation, which can then be applied to large redshift surveys lacking extensive X-ray coverage.
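For readers unfamiliar with the Anderson-Darling statistic, the sketch below computes the standard A² statistic against a normal distribution whose mean and spread are estimated from the data. It is a generic illustration, not the authors' pipeline; the function name and both toy samples are assumptions.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(values):
    """A^2 statistic against a normal distribution with estimated mean and spread."""
    n = len(values)
    fitted = NormalDist(mean(values), stdev(values))
    cdf = [fitted.cdf(v) for v in sorted(values)]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

# Hypothetical samples: evenly spaced normal quantiles versus two separated clumps.
gaussian_like = [NormalDist().inv_cdf((i + 0.5) / 20) for i in range(20)]
bimodal = [-1.0 + 0.01 * i for i in range(10)] + [1.0 + 0.01 * i for i in range(10)]

a2_gaussian = anderson_darling_normal(gaussian_like)
a2_bimodal = anderson_darling_normal(bimodal)
```

Larger values indicate a stronger departure from Gaussianity, which is the sense in which the abstract uses the statistic as a relaxation proxy.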
The Importance of Introductory Statistics Students Understanding Appropriate Sampling Techniques
ERIC Educational Resources Information Center
Menil, Violeta C.
2005-01-01
In this paper the author discusses the meaning of sampling, the reasons for sampling, the Central Limit Theorem, and the different techniques of sampling. Practical and relevant examples are given to make the appropriate sampling techniques understandable to students of Introductory Statistics courses. With a thorough knowledge of sampling…
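One way to make the Central Limit Theorem concrete for students is a short simulation. The sketch below is not taken from the paper; the exponential population, the sample size, and the number of repetitions are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def sample_means(sample_size, n_samples, seed=1):
    """Means of repeated random samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        mean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

means = sample_means(sample_size=30, n_samples=2000)
centre = mean(means)   # should sit near the population mean of 1.0
spread = stdev(means)  # should sit near sigma / sqrt(n)
predicted_spread = 1.0 / 30 ** 0.5
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread close to the theoretical value.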
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high throughput tool for proteomics based biomarker discovery. Multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high dimensional peak differential analysis with the concurrent, statistical-test-based false discovery rate (FDR). "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets to identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
A survey of work engagement and psychological capital levels.
Bonner, Lynda
2016-08-11
To evaluate the relationship between work engagement and psychological capital (PsyCap) levels reported by registered nurses. PsyCap is a developable human resource. Research on PsyCap as an antecedent to work engagement in nurses is needed. A convenience sample of 137 registered nurses participated in this quantitative cross-sectional survey. Questionnaires measured self-reported levels of work engagement and psychological capital. Descriptive and inferential statistics were used for data analysis. There was a statistically significant correlation between work engagement and PsyCap scores (r=0.633, p<0.01). Nurses working at band 5 level reported statistically significantly lower PsyCap scores compared with nurses working at band 6 and 7 levels. Nurses reporting high levels of work engagement also reported high levels of PsyCap. Band 5 nurses might benefit most from interventions to increase their PsyCap. This study supports PsyCap as an antecedent to work engagement.
Survey of rural, private wells. Statistical design
Mehnert, Edward; Schock, Susan C.
1991-01-01
Half of Illinois' 38 million acres were planted in corn and soybeans in 1988. On the 19 million acres planted in corn and soybeans, approximately 1 million tons of nitrogen fertilizer and 50 million pounds of pesticides were applied. Because groundwater is the water supply for over 90 percent of rural Illinois, the occurrence of agricultural chemicals in groundwater in Illinois is of interest to the agricultural community, the public, and regulatory agencies. The occurrence of agricultural chemicals in groundwater is well documented. However, the extent of this contamination still needs to be defined. This can be done by randomly sampling wells across a geographic area. Key elements of a random, water-well sampling program for regional groundwater quality include the overall statistical design of the program, definition of the sample population, selection of wells to be sampled, and analysis of survey results. These elements must be consistent with the purpose for conducting the program; otherwise, the program will not provide the desired information. The need to carefully design and conduct a sampling program becomes readily apparent when one considers the high cost of collecting and analyzing a sample. For a random sampling program conducted in Illinois, the key elements, as well as the limitations imposed by available information, are described.
Nursing Care as Perceived by Nurses Working in Disability Community Settings in Greece
Fotiadou, Elpida; Malliarou, Maria; Zetta, Stella; Gouva, Mary; Kotrotsiou, Evaggelia
2016-01-01
Introduction-Aim: The concept of nursing care in learning disability community settings has not been investigated in Greece. The aim of this paper is to investigate how nurses working in learning disability community settings perceive the meaning of nursing care. Material and Methods: The sample consisted of 100 nurses and nursing assistants working in a social care hospice. Participants were asked to answer questions about socio-demographic characteristics and to fill in a questionnaire of care (GR-NDI-24), the “Job-Communication-Satisfaction-Importance” (JCSI) questionnaire and the altruism scale of Ahmed and Jackson. The data were analysed with descriptive and inferential statistics using SPSS (version 19). Results: The majority of the sample were women (78%). The majority of participants were married (66%) and DE graduates (66%) without postgraduate studies (96.7%). The mean age of respondents was 36.98±6.70 years. On the scales of caring and altruism, the mean values were 40.89±15.87 and 28.12±4.16 respectively. Of the sample, 72% were very or fully satisfied with their work. The scope of work emerges as the most important factor influencing job satisfaction. Wages and working conditions (73% and 40% respectively) are the aspects of work that attract the most dissatisfaction, while salary emerges as the most important parameter, the improvement of which would provide the highest satisfaction. A marginally statistically significant difference was observed in the range between TE graduates (d=40) and those of the DE grade (d=37), p=0.053. No statistically significant differences were observed in relation to other working and demographic characteristics (p>0.05). Greater care importance was associated with greater job satisfaction (p<0.01), while the latter was associated with high levels of altruism (p<0.05). 
Conclusion: The scope of work provides high satisfaction to nurses working in social care hospices, while the salary is not satisfactory. Nurses’ aides appeared highly sensitive to care issues. A multidimensional approach to the materiality of care and job satisfaction in future research will make it possible to further highlight all the aspects affecting job satisfaction and performance of nurses. This will identify critical parameters of nursing care in healthcare centers for the chronically ill. PMID:26383223
Nursing Care as Perceived by Nurses Working in Disability Community Settings in Greece.
Fotiadou, Elpida; Malliarou, Maria; Zetta, Stella; Gouva, Mary; Kotrotsiou, Evaggelia
2015-06-25
The concept of nursing care in learning disability community settings has not been investigated in Greece. The aim of this paper is to investigate how nurses working in learning disability community settings perceive the meaning of nursing care. The sample consisted of 100 nurses and nursing assistants working in a social care hospice. Participants were asked to answer questions about socio-demographic characteristics and to fill in a questionnaire of care (GR-NDI-24), the "Job-Communication-Satisfaction-Importance" (JCSI) questionnaire and the altruism scale of Ahmed and Jackson. The data were analysed with descriptive and inferential statistics using SPSS (version 19). The majority of the sample were women (78%). The majority of participants were married (66%) and DE graduates (66%) without postgraduate studies (96.7%). The mean age of respondents was 36.98±6.70 years. On the scales of caring and altruism, the mean values were 40.89±15.87 and 28.12±4.16 respectively. Of the sample, 72% were very or fully satisfied with their work. The scope of work emerges as the most important factor influencing job satisfaction. Wages and working conditions (73% and 40% respectively) are the aspects of work that attract the most dissatisfaction, while salary emerges as the most important parameter, the improvement of which would provide the highest satisfaction. A marginally statistically significant difference was observed in the range between TE graduates (d=40) and those of the DE grade (d=37), p=0.053. No statistically significant differences were observed in relation to other working and demographic characteristics (p>0.05). Greater care importance was associated with greater job satisfaction (p<0.01), while the latter was associated with high levels of altruism (p<0.05). 
The scope of work provides high satisfaction to nurses working in social care hospices, while the salary is not satisfactory. Nurses' aides appeared highly sensitive to care issues. A multidimensional approach to the materiality of care and job satisfaction in future research will make it possible to further highlight all the aspects affecting job satisfaction and performance of nurses. This will identify critical parameters of nursing care in healthcare centers for the chronically ill.
Rogala, James T.; Gray, Brian R.
2006-01-01
The Long Term Resource Monitoring Program (LTRMP) uses a stratified random sampling design to obtain water quality statistics within selected study reaches of the Upper Mississippi River System (UMRS). LTRMP sampling strata are based on aquatic area types generally found in large rivers (e.g., main channel, side channel, backwater, and impounded areas). For hydrologically well-mixed strata (i.e., main channel), variance associated with spatial scales smaller than the strata scale is a relatively minor issue for many water quality parameters. However, analysis of LTRMP water quality data has shown that within-strata variability at the strata scale is high in off-channel areas (i.e., backwaters). A portion of that variability may be associated with differences among individual backwater lakes (i.e., small and large backwater regions separated by channels) that cumulatively make up the backwater stratum. The objective of the statistical modeling presented here is to determine if differences among backwater lakes account for a large portion of the variance observed in the backwater stratum for selected parameters. If variance associated with backwater lakes is high, then inclusion of backwater lake effects within statistical models is warranted. Further, lakes themselves may represent natural experimental units where associations of interest to management may be estimated.
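The question of how much backwater-stratum variance is attributable to differences among lakes can be illustrated with a balanced one-way variance-components calculation. This is a simplified sketch, not the models used in the report: the method-of-moments estimator, the function name, and the toy measurements are assumptions.

```python
from statistics import mean

def variance_components(groups):
    """Balanced one-way random-effects ANOVA estimates (method of moments)."""
    k = len(groups)     # number of lakes
    n = len(groups[0])  # measurements per lake (assumed equal across lakes)
    grand = mean(v for g in groups for v in g)
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (k * (n - 1))
    var_between = max(0.0, (ms_between - ms_within) / n)
    return var_between, ms_within

# Hypothetical water quality values for three backwater lakes, three samples each.
lakes = [[4.0, 4.2, 3.8], [6.1, 5.9, 6.0], [8.0, 8.3, 7.7]]
var_between, var_within = variance_components(lakes)
lake_share = var_between / (var_between + var_within)
```

A lake share close to 1, as in this toy data, would be the kind of result that supports including lake effects in the statistical model.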
Impact of different tongue cleaning methods on the bacterial load of the tongue dorsum.
Bordas, Alice; McNab, Rod; Staples, Angela M; Bowman, Jim; Kanapka, Joe; Bosma, Marylynn P
2008-04-01
To assess the extent and duration of the effect of tongue cleaning procedures on bacterial load on the dorsal surface of the tongue. 19 subjects participated in this blinded crossover study. Subjects abstained from oral hygiene, eating and drinking from 22:00 h the previous evening. Tongue samples were collected at baseline and within 15 minutes of one of three procedures: teeth brushing alone; teeth brushing plus tongue scraping; teeth brushing plus tongue cleaning using a high speed vacuum ejector and irrigation with 20 ml antibacterial mouthwash. Subjects then brushed twice daily for 3 days apart from the second group who additionally scraped their tongue twice daily. On day 4, baseline and post-treatment samples were collected as per day 1. Bacteria (total anaerobes, Gram-negative anaerobes, VSC-producing bacteria and Streptococcus salivarius) were enumerated using appropriate selective media. The tongue dorsum was colonized by all 4 bacterial categories (log(10) 6-8 cfu/sample). For subjects who brushed their teeth only, there was a significant reduction from baseline for S. salivarius only. In contrast, tooth brushing plus tongue scraping resulted in statistically significant reductions from baseline for all bacterial categories (range log(10) 0.11-0.40 cfu/sample). Highly statistically significant reductions (log(10) 1.11-1.96 cfu/sample) were observed for subjects who underwent thorough tongue cleaning with the saliva ejector/mouthwash. To determine longevity of treatment effects, baseline bacterial loads for days 1 and 4 were compared. Only daily tongue scraping resulted in a statistically significant reduction in baseline microbial loads on day 4. While mechanical tongue cleaning with or without chemical intervention can reduce bacterial load on the tongue, this effect is transient, and regular tongue cleaning is required to provide a long lasting (overnight) reduction in bacterial numbers. 
Nevertheless, tongue cleaning is an oral hygiene procedure that is little practiced due to discomfort and/or lack of awareness on the part of dental professionals and their patients.
Flow Chamber System for the Statistical Evaluation of Bacterial Colonization on Materials
Menzel, Friederike; Conradi, Bianca; Rodenacker, Karsten; Gorbushina, Anna A.; Schwibbert, Karin
2016-01-01
Biofilm formation on materials leads to high costs in industrial processes, as well as in medical applications. This fact has stimulated interest in the development of new materials with improved surfaces to reduce bacterial colonization. Standardized tests relying on statistical evidence are indispensable to evaluate the quality and safety of these new materials. We describe here a flow chamber system for biofilm cultivation under controlled conditions with a total capacity for testing up to 32 samples in parallel. In order to quantify the surface colonization, bacterial cells were DAPI (4′,6-diamidino-2-phenylindole)-stained and examined with epifluorescence microscopy. More than 100 images of each sample were automatically taken and the surface coverage was estimated using the free open source software G'MIC, followed by a precise statistical evaluation. Overview images of all gathered pictures were generated to dissect the colonization characteristics of the selected model organism Escherichia coli W3310 on different materials (glass and implant steel). With our approach, differences in bacterial colonization on different materials can be quantified in a statistically validated manner. This reliable test procedure will support the design of improved materials for medical, industrial, and environmental (subaquatic or subaerial) applications. PMID:28773891
Estimating maize production in Kenya using NDVI: Some statistical considerations
Lewis, J.E.; Rowland, James; Nadeau, A.
1998-01-01
A regression model approach using a normalized difference vegetation index (NDVI) has the potential for estimating crop production in East Africa. However, before production estimation can become a reality, the underlying model assumptions and statistical nature of the sample data (NDVI and crop production) must be examined rigorously. Annual maize production statistics from 1982-90 for 36 agricultural districts within Kenya were used as the dependent variable; median area NDVI (independent variable) values from each agricultural district and year were extracted from the annual maximum NDVI data set. The input data and the statistical association of NDVI with maize production for Kenya were tested systematically for the following items: (1) homogeneity of the data when pooling the sample, (2) gross data errors and influence points, (3) serial (time) correlation, (4) spatial autocorrelation and (5) stability of the regression coefficients. The results of using a simple regression model with NDVI as the only independent variable are encouraging (r 0.75, p 0.05) and illustrate that NDVI can be a responsive indicator of maize production, especially in areas of high NDVI spatial variability, which coincide with areas of production variability in Kenya.
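The simple regression described above, with NDVI as the only independent variable, can be sketched with ordinary least squares and the Pearson correlation coefficient. The numbers below are invented for illustration and are not the Kenyan district data; the function name is also an assumption.

```python
def simple_regression(xs, ys):
    """Least-squares slope and intercept plus the Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r

# Hypothetical district-level values: median NDVI and maize production (arbitrary units).
ndvi = [0.2, 0.3, 0.4, 0.5, 0.6]
production = [1.0, 1.6, 1.9, 2.6, 3.0]
slope, intercept, r = simple_regression(ndvi, production)
```

The diagnostic checks listed in the abstract (pooling homogeneity, influence points, serial and spatial autocorrelation, coefficient stability) would be applied before trusting such a fit.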
Biosignature Discovery for Substance Use Disorders Using Statistical Learning.
Baurley, James W; McMahan, Christopher S; Ervin, Carolyn M; Pardamean, Bens; Bergen, Andrew W
2018-02-01
There are limited biomarkers for substance use disorders (SUDs). Traditional statistical approaches are identifying simple biomarkers in large samples, but clinical use cases are still being established. High-throughput clinical, imaging, and 'omic' technologies are generating data from SUD studies and may lead to more sophisticated and clinically useful models. However, analytic strategies suited for high-dimensional data are not regularly used. We review strategies for identifying biomarkers and biosignatures from high-dimensional data types. Focusing on penalized regression and Bayesian approaches, we address how to leverage evidence from existing studies and knowledge bases, using nicotine metabolism as an example. We posit that big data and machine learning approaches will considerably advance SUD biomarker discovery. However, translation to clinical practice will require integrated scientific efforts. Copyright © 2017 Elsevier Ltd. All rights reserved.
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses.
Liu, Ruijie; Holik, Aliaksei Z; Su, Shian; Jansz, Natasha; Chen, Kelan; Leong, Huei San; Blewitt, Marnie E; Asselin-Labat, Marie-Liesse; Smyth, Gordon K; Ritchie, Matthew E
2015-09-03
Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean-variance relationship of the log-counts-per-million using 'voom'. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source 'limma' package. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
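The down-weighting idea can be illustrated with a toy calculation. This is not the limma implementation: taking weights as the reciprocal of the variance factor and combining the two weight sets by multiplication are assumptions made for the sketch, as are the function names and the values.

```python
def combined_weights(sample_variance_factors, observation_weights):
    """Turn sample variance factors into weights and combine with per-observation weights."""
    sample_weights = [1.0 / f for f in sample_variance_factors]  # assumed inverse-variance form
    return [sw * ow for sw, ow in zip(sample_weights, observation_weights)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical log-counts-per-million for one gene; the fourth sample is noisy.
log_cpm = [5.0, 5.2, 4.8, 9.0]
variance_factors = [1.0, 1.0, 1.0, 4.0]
observation_weights = [1.0, 1.0, 1.0, 1.0]

weights = combined_weights(variance_factors, observation_weights)
plain = sum(log_cpm) / len(log_cpm)
weighted = weighted_mean(log_cpm, weights)
```

The noisy sample still contributes to the estimate, but with a quarter of the influence of the others, which is the compromise between discarding and fully retaining it.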
The Performance Analysis Based on SAR Sample Covariance Matrix
Erten, Esra
2012-01-01
Multi-channel systems appear in several fields of application in science. In the Synthetic Aperture Radar (SAR) context, multi-channel systems may refer to different domains, such as multi-polarization, multi-interferometric or multi-temporal data, or even a combination of them. Due to the inherent speckle phenomenon present in SAR images, the statistical description of the data is almost mandatory for its utilization. The complex images acquired over natural media present in general zero-mean circular Gaussian characteristics. In this case, second order statistics such as the multi-channel covariance matrix fully describe the data. For practical situations however, the covariance matrix has to be estimated using a limited number of samples, and this sample covariance matrix follows the complex Wishart distribution. In this context, the eigendecomposition of the multi-channel covariance matrix has been shown in different areas to be of high relevance to the physical properties of the imaged scene. Specifically, the maximum eigenvalue of the covariance matrix has been frequently used in different applications such as target or change detection, estimation of the dominant scattering mechanism in polarimetric data, moving target indication, etc. In this paper, the statistical behavior of the maximum eigenvalue derived from the eigendecomposition of the sample multi-channel covariance matrix in terms of multi-channel SAR images is presented in simplified form for the SAR community. Validation is performed against simulated data, and examples of estimation and detection problems using the analytical expressions are also given. PMID:22736976
Vadalà, Rossella; Mottese, Antonio F.; Bua, Giuseppe D.; Salvo, Andrea; Mallamace, Domenico; Corsaro, Carmelo; Vasi, Sebastiano; Giofrè, Salvatore V.; Alfa, Maria; Cicero, Nicola; Dugo, Giacomo
2016-01-01
We performed a statistical analysis of the concentration of mineral elements, by means of inductively coupled plasma mass spectrometry (ICP-MS), in different varieties of garlic from Spain, Tunisia, and Italy. Nubia Red Garlic (Sicily) is one of the most known Italian varieties that belongs to traditional Italian food products (P.A.T.) of the Ministry of Agriculture, Food, and Forestry. The obtained results suggest that the concentrations of the considered elements may serve as geographical indicators for the discrimination of the origin of the different samples. In particular, we found a relatively high content of Selenium in the garlic variety known as Nubia red garlic, and, indeed, it could be used as an anticarcinogenic agent. PMID:28231115
STATISTICAL SAMPLING AND DATA ANALYSIS
Research is being conducted to develop approaches to improve soil and sediment sampling techniques, measurement design and geostatistics, and data analysis via chemometric, environmetric, and robust statistical methods. Improvements in sampling contaminated soil and other hetero...
Houts, Carrie R; Edwards, Michael C; Wirth, R J; Deal, Linda S
2016-11-01
There has been a notable increase in the advocacy of using small-sample designs as an initial quantitative assessment of item and scale performance during the scale development process. This is particularly true in the development of clinical outcome assessments (COAs), where Rasch analysis has been advanced as an appropriate statistical tool for evaluating the developing COAs using a small sample. We review the benefits such methods are purported to offer from both a practical and statistical standpoint and detail several problematic areas, including both practical and statistical theory concerns, with respect to the use of quantitative methods, including Rasch-consistent methods, with small samples. The feasibility of obtaining accurate information and the potential negative impacts of misusing large-sample statistical methods with small samples during COA development are discussed.
A coronagraphic search for brown dwarfs around nearby stars
NASA Technical Reports Server (NTRS)
Nakajima, T.; Durrance, S. T.; Golimowski, D. A.; Kulkarni, S. R.
1994-01-01
Brown dwarf companions have been searched for around stars within 10 pc of the Sun using the Johns-Hopkins University Adaptive Optics Coronagraph (AOC), a stellar coronagraph with an image stabilizer. The AOC covers the field around the target star with a minimum search radius of 1.5 arcsec and a field of view of 1 arcmin sq. We have reached an unprecedented dynamic range of Delta m = 13 in our search for faint companions at I band. Comparison of our survey with other brown dwarf searches shows that the AOC technique is unique in its dynamic range while at the same time just as sensitive to brown dwarfs as the recent brown dwarf surveys. The present survey covered 24 target stars selected from the Gliese catalog. A total of 94 stars were detected in 16 fields. The low-latitude fields are completely dominated by background star contamination. Kolmogorov-Smirnov tests were carried out for a sample restricted to high latitudes and a sample with small angular separations. The high-latitude sample (b greater than or equal to 44 deg) appears to show spatial concentration toward target stars. The small separation sample (Delta Theta less than 20 sec) shows weaker dependence on Galactic coordinates than field stars. These statistical tests suggest that both the high-latitude sample and the small separation sample can include a substantial fraction of true companions. However, the nature of these putative companions is mysterious. They are too faint to be white dwarfs and too blue for brown dwarfs. Ignoring the significance of the statistical tests, we can reconcile most of the detections with distant main-sequence stars or white dwarfs except for a candidate next to GL 475. Given the small size of our sample, we conclude that considerably more targets need to be surveyed before a firm conclusion on the possibility of a new class of companions can be made.
Toward cost-efficient sampling methods
NASA Astrophysics Data System (ADS)
Luo, Peng; Li, Yongli; Wu, Chong; Zhang, Guijie
2015-09-01
Sampling methods have received much attention in the field of complex networks in general and in statistical physics in particular. This paper proposes two new sampling methods based on the idea that a small fraction of vertices with high node degree can carry most of the structural information of a complex network. The two proposed sampling methods are efficient in sampling high degree nodes, so they remain useful even when the sampling rate is low, which makes them cost-efficient. The first new sampling method is developed on the basis of the widely used stratified random sampling (SRS) method and the second one improves the well-known snowball sampling (SBS) method. In order to demonstrate the validity and accuracy of the two new sampling methods, we compare them with the existing sampling methods in three commonly used simulated networks (scale-free, random, and small-world) and also in two real networks. The experimental results illustrate that the two proposed sampling methods perform much better than the existing sampling methods in terms of recovering the true network structure characteristics reflected by clustering coefficient, Bonacich centrality and average path length, especially when the sampling rate is low.
A critical analysis of high-redshift, massive, galaxy clusters. Part I
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoyle, Ben; Jimenez, Raul; Verde, Licia
2012-02-01
We critically investigate current statistical tests applied to high redshift clusters of galaxies in order to test the standard cosmological model and describe their range of validity. We carefully compare a sample of high-redshift, massive, galaxy clusters with realistic Poisson sample simulations of the theoretical mass function, which include the effect of Eddington bias. We compare the observations and simulations using the following statistical tests: the distributions of ensemble and individual existence probabilities (in the > M, > z sense), the redshift distributions, and the 2d Kolmogorov-Smirnov test. Using seemingly rare clusters from Hoyle et al. (2011), and Jee et al. (2011) and assuming the same survey geometry as in Jee et al. (2011, which is less conservative than Hoyle et al. 2011), we find that the ( > M, > z) existence probabilities of all clusters are fully consistent with ΛCDM. However assuming the same survey geometry, we use the 2d K-S test probability to show that the observed clusters are not consistent with being the least probable clusters from simulations at > 95% confidence, and are also not consistent with being a random selection of clusters, which may be caused by the non-trivial selection function and survey geometry. Tension can be removed if we examine only an X-ray selected subsample, with simulations performed assuming a modified survey geometry.
ZERODUR - bending strength: review of achievements
NASA Astrophysics Data System (ADS)
Hartmann, Peter
2017-08-01
Increased demand for using the glass ceramic ZERODUR® with high mechanical loads called for strength data based on larger statistical samples. Design calculations for failure probability target value below 1: 100 000 cannot be made reliable with parameters derived from 20 specimen samples. The data now available for a variety of surface conditions, ground with different grain sizes and acid etched for full micro crack removal, allow stresses by factors four to ten times higher than before. The large sample revealed that breakage stresses of ground surfaces follow the three parameter Weibull distribution instead of the two parameter version. This is more reasonable considering that the micro cracks of such surfaces have a maximum depth which is reflected in the existence of a threshold breakage stress below which breakage probability is zero. This minimum strength allows calculating minimum lifetimes. Fatigue under load can be taken into account by using the stress corrosion coefficient for the actual environmental humidity. For fully etched surfaces Weibull statistics fails. The precondition of the Weibull distribution, the existence of one unique failure mechanism, is not given anymore. ZERODUR® with fully etched surfaces free from damages introduced after etching endures easily 100 MPa tensile stress. The possibility to use ZERODUR® for combined high precision and high stress application was confirmed by the successful launch and continuing operation of LISA Pathfinder the precursor experiment for the gravitational wave antenna satellite array eLISA.
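The difference between the two- and three-parameter Weibull forms described in this abstract can be illustrated with a minimal sketch. The function name and all parameter values below are hypothetical placeholders chosen for illustration; they are not ZERODUR® data. The threshold stress is the value below which the breakage probability is zero, and setting it to zero recovers the two-parameter form.

```python
import math

def weibull3_failure_probability(stress, threshold, scale, shape):
    """Three-parameter Weibull CDF: probability of breakage at or below `stress`.

    Below the threshold stress the probability is exactly zero;
    threshold = 0 gives the ordinary two-parameter Weibull distribution.
    """
    if stress <= threshold:
        return 0.0
    return 1.0 - math.exp(-(((stress - threshold) / scale) ** shape))

# Hypothetical parameters (MPa for threshold and scale), for illustration only.
THRESHOLD, SCALE, SHAPE = 40.0, 20.0, 2.0

for stress in (30.0, 40.0, 50.0, 60.0, 80.0):
    p = weibull3_failure_probability(stress, THRESHOLD, SCALE, SHAPE)
    print(f"stress = {stress:5.1f} MPa  failure probability = {p:.4f}")
```

With a nonzero threshold, any design stress kept below that value has zero predicted failure probability under this model, which is the property the abstract refers to as a minimum strength.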
Gao, Hongying; Deng, Shibing; Obach, R Scott
2015-12-01
An unbiased scanning methodology using ultra high-performance liquid chromatography coupled with high-resolution mass spectrometry was used to bank data and plasma samples for comparing the data generated at different dates. This method was applied to bank the data generated earlier in animal samples and then to compare the exposure to metabolites in animal versus human for safety assessment. With neither authentic standards nor prior knowledge of the identities and structures of metabolites, full scans for precursor ions and all ion fragments (AIF) were employed with a generic gradient LC method to analyze plasma samples at positive and negative polarity, respectively. In a total of 22 tested drugs and metabolites, 21 analytes were detected using this unbiased scanning method except that naproxen was not detected due to low sensitivity at negative polarity and interference at positive polarity; and 4'- or 5-hydroxy diclofenac was not separated by a generic UPLC method. Statistical analysis of the peak area ratios of the analytes versus the internal standard in five repetitive analyses over approximately 1 year demonstrated that the analysis variation was significantly different from sample instability. The confidence limits for comparing the exposure using peak area ratio of metabolites in animal plasma versus human plasma measured over approximately 1 year apart were comparable to the analysis undertaken side by side on the same days. These statistical analysis results showed it was feasible to compare data generated at different dates with neither authentic standards nor prior knowledge of the analytes.
Traum, Avram Z; Wells, Meghan P; Aivado, Manuel; Libermann, Towia A; Ramoni, Marco F; Schachter, Asher D
2006-03-01
Proteomic profiling with SELDI-TOF MS has facilitated the discovery of disease-specific protein profiles. However, multicenter studies are often hindered by the logistics required for prompt deep-freezing of samples in liquid nitrogen or dry ice within the clinic setting prior to shipping. We report high concordance between MS profiles within sets of quadruplicate split urine and serum samples deep-frozen at 0, 2, 6, and 24 h after sample collection. Gage R&R results confirm that deep-freezing times are not a statistically significant source of SELDI-TOF MS variability for either blood or urine.
Voids and constraints on nonlinear clustering of galaxies
NASA Technical Reports Server (NTRS)
Vogeley, Michael S.; Geller, Margaret J.; Park, Changbom; Huchra, John P.
1994-01-01
Void statistics of the galaxy distribution in the Center for Astrophysics Redshift Survey provide strong constraints on galaxy clustering in the nonlinear regime, i.e., on scales R equal to or less than 10/h Mpc. Computation of high-order moments of the galaxy distribution requires a sample that (1) densely traces the large-scale structure and (2) covers sufficient volume to obtain good statistics. The CfA redshift survey densely samples structure on scales equal to or less than 10/h Mpc and has sufficient depth and angular coverage to approach a fair sample on these scales. In the nonlinear regime, the void probability function (VPF) for CfA samples exhibits apparent agreement with hierarchical scaling (such scaling implies that the N-point correlation functions for N greater than 2 depend only on pairwise products of the two-point function xi(r)). However, simulations of cosmological models show that this scaling in redshift space does not necessarily imply such scaling in real space, even in the nonlinear regime; peculiar velocities cause distortions which can yield erroneous agreement with hierarchical scaling. The underdensity probability measures the frequency of 'voids' with density rho less than 0.2 times the mean density rho-bar. This statistic reveals a paucity of very bright galaxies (L greater than L asterisk) in the 'voids.' Underdensities are equal to or greater than 2 sigma more frequent in bright galaxy samples than in samples that include fainter galaxies. Comparison of void statistics of CfA samples with simulations of a range of cosmological models favors models with Gaussian primordial fluctuations and Cold Dark Matter (CDM)-like initial power spectra. Biased models tend to produce voids that are too empty. We also compare these data with three specific models of the Cold Dark Matter cosmogony: an unbiased, open universe CDM model (omega = 0.4, h = 0.5) provides a good match to the VPF of the CfA samples. 
Biasing of the galaxy distribution in the 'standard' CDM model (omega = 1, b = 1.5; see below for definitions) and nonzero cosmological constant CDM model (omega = 0.4, h = 0.6 lambda(sub 0) = 0.6, b = 1.3) produce voids that are too empty. All three simulations match the observed VPF and underdensity probability for samples of very bright (M less than M asterisk = -19.2) galaxies, but produce voids that are too empty when compared with samples that include fainter galaxies.
Chen, Shu-Ling; Hsieh, Pao-Chun; Chou, Chia-Hui; Tzeng, Ya-Ling
2014-11-25
Many Taiwanese women (43.8%) did not participate in regular cervical screening in 2011. An alternative to cervical screening, self-sampling for human papillomavirus (HPV), has been available at no cost under Taiwan's National Health Insurance since 2010, but the extent and likelihood of HPV self-sampling were unknown. A cross-sectional study was performed to explore determinants of women's likelihood of HPV self-sampling. Data were collected by questionnaire from a convenience sample of 500 women attending hospital gynecologic clinics in central Taiwan from June to October 2012. Data were analyzed by descriptive statistics, chi-square test, and logistic regression. Of 500 respondents, 297 (59.4%) had heard of HPV; of these 297 women, 69 (23%) had self-sampled for HPV. Among the 297 women who had heard of HPV, 234 (78.8%) considered cost a priority for HPV self-sampling. Likelihood of HPV self-sampling was determined by previous Pap testing, high perceived risk of cervical cancer, willingness to self-sample for HPV, high HPV knowledge, and cost as a priority consideration. Outreach efforts to increase the acceptability of self-sampling for HPV testing rates should target women who have had a Pap test, perceive themselves at high risk for cervical cancer, are willing to self-sample for HPV, have a high level of HPV knowledge, and for whom the cost of self-sampling covered by health insurance is a priority.
Statistical Hypothesis Testing in Intraspecific Phylogeography: NCPA versus ABC
Templeton, Alan R.
2009-01-01
Nested clade phylogeographic analysis (NCPA) and approximate Bayesian computation (ABC) have been used to test phylogeographic hypotheses. Multilocus NCPA tests null hypotheses, whereas ABC discriminates among a finite set of alternatives. The interpretive criteria of NCPA are explicit and allow complex models to be built from simple components. The interpretive criteria of ABC are ad hoc and require the specification of a complete phylogeographic model. The conclusions from ABC are often influenced by implicit assumptions arising from the many parameters needed to specify a complex model. These complex models confound many assumptions so that biological interpretations are difficult. Sampling error is accounted for in NCPA, but ABC ignores important sources of sampling error that creates pseudo-statistical power. NCPA generates the full sampling distribution of its statistics, but ABC only yields local probabilities, which in turn make it impossible to distinguish between a good fitting model, a non-informative model, and an over-determined model. Both NCPA and ABC use approximations, but convergences of the approximations used in NCPA are well defined whereas those in ABC are not. NCPA can analyze a large number of locations, but ABC cannot. Finally, the dimensionality of tested hypothesis is known in NCPA, but not for ABC. As a consequence, the “probabilities” generated by ABC are not true probabilities and are statistically non-interpretable. Accordingly, ABC should not be used for hypothesis testing, but simulation approaches are valuable when used in conjunction with NCPA or other methods that do not rely on highly parameterized models. PMID:19192182
Statistics provide guidance for indigenous organic carbon detection on Mars missions.
Sephton, Mark A; Carter, Jonathan N
2014-08-01
Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. 
(iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior probability of indigenous organic carbon should be resisted.
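The updating argument in this abstract can be sketched in odds form: each positive detection multiplies the prior odds of indigenous organic carbon by the ratio of the true positive rate to the false positive rate. The sketch below assumes that repeated detections are conditionally independent, which is a simplifying assumption made here and not something the abstract states; the function name and the numerical rates are hypothetical.

```python
def posterior_probability(prior, tp_rate, fp_rate, n_detections):
    """Posterior probability of indigeneity after n positive detections.

    Assumes conditionally independent detections (an illustrative assumption),
    so the posterior odds are the prior odds times (tp_rate / fp_rate) ** n.
    """
    prior_odds = prior / (1.0 - prior)
    likelihood_ratio = tp_rate / fp_rate
    posterior_odds = prior_odds * likelihood_ratio ** n_detections
    return posterior_odds / (1.0 + posterior_odds)

# Hypothetical values: prior of 0.1 and a true-to-false positive ratio of 2.
for n in range(5):
    p = posterior_probability(prior=0.1, tp_rate=0.6, fp_rate=0.3, n_detections=n)
    print(f"{n} positive detections: probability = {p:.3f}")
```

When the ratio is above one the probability rises with every detection, and a higher prior or a higher ratio both give a higher posterior for the same number of analyses, which matches suggestions (ii) to (iv).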
PROBABILITY SAMPLING AND POPULATION INFERENCE IN MONITORING PROGRAMS
A fundamental difference between probability sampling and conventional statistics is that "sampling" deals with real, tangible populations, whereas "conventional statistics" usually deals with hypothetical populations that have no real-world realization. The focus here is on real ...
Effect of environment and genotype on commercial maize hybrids using LC/MS-based metabolomics.
Baniasadi, Hamid; Vlahakis, Chris; Hazebroek, Jan; Zhong, Cathy; Asiago, Vincent
2014-02-12
We recently applied gas chromatography coupled to time-of-flight mass spectrometry (GC/TOF-MS) and multivariate statistical analysis to measure biological variation of many metabolites due to environment and genotype in forage and grain samples collected from 50 genetically diverse nongenetically modified (non-GM) DuPont Pioneer commercial maize hybrids grown at six North American locations. In the present study, the metabolome coverage was extended using a core subset of these grain and forage samples employing ultra high pressure liquid chromatography (uHPLC) mass spectrometry (LC/MS). A total of 286 and 857 metabolites were detected in grain and forage samples, respectively, using LC/MS. Multivariate statistical analysis was utilized to compare and correlate the metabolite profiles. Environment had a greater effect on the metabolome than genetic background. The results of this study support and extend previously published insights into the environmental and genetic associated perturbations to the metabolome that are not associated with transgenic modification.
Statistical distribution sampling
NASA Technical Reports Server (NTRS)
Johnson, E. S.
1975-01-01
Determining the distribution of statistics by sampling was investigated. Characteristic functions, the quadratic regression problem, and the differential equations for the characteristic functions are analyzed.
It's all relative: ranking the diversity of aquatic bacterial communities.
Shaw, Allison K; Halpern, Aaron L; Beeson, Karen; Tran, Bao; Venter, J Craig; Martiny, Jennifer B H
2008-09-01
The study of microbial diversity patterns is hampered by the enormous diversity of microbial communities and the lack of resources to sample them exhaustively. For many questions about richness and evenness, however, one only needs to know the relative order of diversity among samples rather than total diversity. We used 16S libraries from the Global Ocean Survey to investigate the ability of 10 diversity statistics (including rarefaction, non-parametric, parametric, curve extrapolation and diversity indices) to assess the relative diversity of six aquatic bacterial communities. Overall, we found that the statistics yielded remarkably similar rankings of the samples for a given sequence similarity cut-off. This correspondence, despite the different underlying assumptions of the statistics, suggests that diversity statistics are a useful tool for ranking samples of microbial diversity. In addition, sequence similarity cut-off influenced the diversity ranking of the samples, demonstrating that diversity statistics can also be used to detect differences in phylogenetic structure among microbial communities. Finally, a subsampling analysis suggests that further sequencing from these particular clone libraries would not have substantially changed the richness rankings of the samples.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecological-sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and, (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc). In this research, the geographical locations for multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially the data was aggregated into proc genmod. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data was then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed also in ArcGIS using the georeferenced ground coordinates of high and low density clusters stratified by Annual Biting Rates (ABR). This data was overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS. 
Univariate and non-linear regression-based models (i.e., Logistic, Poisson and Negative Binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process in AUTOREG. Bayesian uncertainty matrices were also constructed employing normal priors for each of the sampled estimators in PROC MCMC. The residuals revealed both spatially structured and unstructured error effects in the high and low ABR-stratified clusters. The analyses also revealed that the estimators, levels of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying and constant coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
Evolution of high-mass star-forming regions.
NASA Astrophysics Data System (ADS)
Giannetti, A.; Leurini, S.; Wyrowski, F.; Urquhart, J.; König, C.; Csengeri, T.; Güsten, R.; Menten, K. M.
Observational identification of a coherent evolutionary sequence for high-mass star-forming regions is still missing. We use the progressive heating of the gas caused by the feedback of high-mass young stellar objects to prove the statistical validity of the most common schemes used to observationally define an evolutionary sequence for high-mass clumps, and identify which physical process dominates in the different phases. From the spectroscopic follow-ups carried out towards the TOP100 sample between 84 and 365 GHz, we selected several multiplets of CH3CN, CH3CCH, and CH3OH lines to derive the physical properties of the gas in the clumps along the evolutionary sequence. We demonstrate that the evolutionary sequence is statistically valid, and we define intervals in L/M separating the compression, collapse and accretion, and disruption phases. The first hot cores and ZAMS stars appear at L/M ≈ 10 L_⊙ M_⊙^-1.
Qualitative Meta-Analysis on the Hospital Task: Implications for Research
ERIC Educational Resources Information Center
Noll, Jennifer; Sharma, Sashi
2014-01-01
The "law of large numbers" indicates that as sample size increases, sample statistics become less variable and more closely estimate their corresponding population parameters. Different research studies investigating how people consider sample size when evaluating the reliability of a sample statistic have found a wide range of…
USDA-ARS's Scientific Manuscript database
Statistically robust sampling strategies form an integral component of grain storage and handling activities throughout the world. Developing sampling strategies to target biological pests such as insects in stored grain is inherently difficult due to species biology and behavioral characteristics. ...
Unsupervised universal steganalyzer for high-dimensional steganalytic features
NASA Astrophysics Data System (ADS)
Hou, Xiaodan; Zhang, Tao
2016-11-01
The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from varying levels of noises caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are searched from a retrieval cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from model mismatch mess. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.
Standard deviation and standard error of the mean.
Lee, Dong Kyu; In, Junyong; Lee, Sangseok
2015-06-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
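The relationship between the two quantities can be made concrete with a short sketch: the SEM is the sample SD divided by the square root of the sample size. The helper name and the data values below are made up for illustration.

```python
import math
import statistics

def sd_and_sem(values):
    """Return (sample SD, estimated SEM) for a list of observations."""
    sd = statistics.stdev(values)        # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(len(values))    # SEM shrinks as the sample size grows
    return sd, sem

# Made-up observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd, sem = sd_and_sem(data)
print(f"n = {len(data)}, mean = {statistics.mean(data)}, SD = {sd:.3f}, SEM = {sem:.3f}")
```

The SD describes the spread of the observations themselves and does not systematically shrink with more data, whereas the SEM describes the uncertainty of the sample mean and decreases as n grows.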
Standard deviation and standard error of the mean
In, Junyong; Lee, Sangseok
2015-01-01
In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923
Smith, Paul F.
2017-01-01
Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855
Smith, Paul F
2017-01-01
Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.
Sub-sampling genetic data to estimate black bear population size: A case study
Tredick, C.A.; Vaughan, M.R.; Stauffer, D.F.; Simek, S.L.; Eason, T.
2007-01-01
Costs for genetic analysis of hair samples collected for individual identification of bears average approximately US$50 [2004] per sample. This can easily exceed budgetary allowances for large-scale studies or studies of high-density bear populations. We used 2 genetic datasets from 2 areas in the southeastern United States to explore how reducing costs of analysis by sub-sampling affected precision and accuracy of resulting population estimates. We used several sub-sampling scenarios to create subsets of the full datasets and compared summary statistics, population estimates, and precision of estimates generated from these subsets to estimates generated from the complete datasets. Our results suggested that bias and precision of estimates improved as the proportion of total samples used increased, and heterogeneity models (e.g., Mh[CHAO]) were more robust to reduced sample sizes than other models (e.g., behavior models). We recommend that only high-quality samples (>5 hair follicles) be used when budgets are constrained, and efforts should be made to maximize capture and recapture rates in the field.
Maximum entropy PDF projection: A review
NASA Astrophysics Data System (ADS)
Baggenstoss, Paul M.
2017-06-01
We review maximum entropy (MaxEnt) PDF projection, a method with wide potential applications in statistical inference. The method constructs a sampling distribution for a high-dimensional vector x based on knowing the sampling distribution p(z) of a lower-dimensional feature z = T (x). Under mild conditions, the distribution p(x) having highest possible entropy among all distributions consistent with p(z) may be readily found. Furthermore, the MaxEnt p(x) may be sampled, making the approach useful in Monte Carlo methods. We review the theorem and present a case study in model order selection and classification for handwritten character recognition.
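As a rough guide to the construction described above, PDF projection is commonly written in the following form. The symbol $p_0$ for a reference distribution is notation assumed here and does not appear in the abstract; the abstract does not state the formula, so this should be read as a sketch and not as the review's exact statement.

```latex
% Sketch of the PDF projection form, assuming a reference density p_0(x)
% and writing p_0(z) for the density it induces on the feature z = T(x).
\[
  p(x) \;=\; \frac{p_0(x)}{p_0(z)}\, p(z), \qquad z = T(x).
\]
% Under suitable conditions p(x) is a valid density on x that is consistent
% with the given p(z); the maximum entropy variant selects the reference
% p_0 so that p(x) has the highest entropy among such densities.
```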
Objective Diagnosis of Cervical Cancer by Tissue Protein Profile Analysis
NASA Astrophysics Data System (ADS)
Patil, Ajeetkumar; Bhat, Sujatha; Rai, Lavanya; Kartha, V. B.; Chidangil, Santhosh
2011-07-01
Protein profiles of homogenized normal cervical tissue samples from hysterectomy subjects and cancerous cervical tissues from biopsy samples collected from patients with different stages of cervical cancer were recorded using High Performance Liquid Chromatography coupled with Laser Induced Fluorescence (HPLC-LIF). The protein profiles were subjected to Principal Component Analysis to derive statistically significant parameters. Diagnosis of sample types was carried out by matching three parameters: scores of factors, squared residuals, and Mahalanobis Distance. ROC and Youden's Index curves for calibration standards were used for objective estimation of the optimum threshold for decision making and performance.
Explorations in statistics: the log transformation.
Curran-Everett, Douglas
2018-06-01
Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability (the standard deviation) varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
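The variance-stabilizing effect described in this abstract can be illustrated with a minimal sketch. The simulated groups, their parameters, and all variable names below are assumptions chosen for illustration; they are not data from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical groups whose standard deviation grows in proportion to the mean
# (lognormal data sharing one log-scale sigma); values are illustrative only.
log_means = [1.0, 2.0, 3.0]
groups = [rng.lognormal(mean=m, sigma=0.4, size=2000) for m in log_means]

raw_sd = [g.std(ddof=1) for g in groups]
log_sd = [np.log(g).std(ddof=1) for g in groups]

# Ratio of largest to smallest SD: large on the raw scale, close to 1 after the log.
raw_ratio = max(raw_sd) / min(raw_sd)
log_ratio = max(log_sd) / min(log_sd)
print(f"raw SD ratio: {raw_ratio:.2f}, log SD ratio: {log_ratio:.2f}")
```

On the raw scale the group standard deviations differ several-fold; after the log transformation they are nearly equal, which is the situation the abstract describes.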
Multi-pulse multi-delay (MPMD) multiple access modulation for UWB
Dowla, Farid U.; Nekoogar, Faranak
2007-03-20
A new modulation scheme in UWB communications is introduced. This modulation technique utilizes multiple orthogonal transmitted-reference pulses for UWB channelization. The proposed UWB receiver samples the second order statistical function at both zero and non-zero lags and matches the samples to stored second order statistical functions, thus sampling and matching the shape of second order statistical functions rather than just the shape of the received pulses.
NASA Astrophysics Data System (ADS)
Edjah, Adwoba; Stenni, Barbara; Cozzi, Giulio; Turetta, Clara; Dreossi, Giuliano; Tetteh Akiti, Thomas; Yidana, Sandow
2017-04-01
Authors and affiliations: Adwoba Kua-Manza Edjah (a), Barbara Stenni (b, c), Giuliano Dreossi (b), Giulio Cozzi (c), Clara Turetta (c), T. T. Akiti (d), Sandow Yidana (e). (a, e) Department of Earth Science, University of Ghana, Legon, Ghana, West Africa; (b) Department of Environmental Sciences, Informatics and Statistics, Ca' Foscari University of Venice, Italy; (c) Institute for the Dynamics of Environmental Processes, CNR, Venice, Italy; (d) Department of Nuclear Application and Techniques, Graduate School of Nuclear and Allied Sciences, University of Ghana, Legon. This research is part of the PhD research work "Hydrogeological Assessment of the Lower Tano river basin for sustainable economic usage, Ghana, West - Africa". In this study, the researcher investigated surface water and groundwater quality in the Lower Tano river basin. The assessment was based on selected sampling sites associated with mining activities and with the development of oil and gas. A statistical approach was applied to characterize the quality of surface water and groundwater. In addition, water stable isotopes, which are natural tracers of the hydrological cycle, were used to investigate the origin of groundwater recharge in the basin. The study revealed that Pb and Ni values of the surface water and groundwater samples exceeded the WHO standards for drinking water. The water quality index (WQI), based on physicochemical parameters (EC, TDS, pH) and major ions (Ca2+, Na+, Mg2+, HCO3-, NO3-, Cl-, SO42-, K+), indicated good quality water for 60% of the sampled surface water and groundwater. Other statistical techniques, such as the heavy metal pollution index (HPI), degree of contamination (Cd), and heavy metal evaluation index (HEI), based on trace element parameters in the water samples, revealed that 90% of the surface water and groundwater samples belong to a high level of pollution. Principal component analysis (PCA) also suggests that the water quality in the basin is likely affected by rock-water interaction and anthropogenic activities (sea water intrusion).
This was confirmed by further statistical analysis (cluster analysis and correlation matrix) of the water quality parameters. The spatial distribution of water quality parameters, trace elements and the results obtained from the statistical analysis was determined using a geographical information system (GIS). In addition, the isotopic analysis of the sampled surface water and groundwater revealed that most of the surface water and groundwater were of meteoric origin with little or no isotopic variation. It is expected that the outcomes of this research will form a baseline for appropriate decisions on water quality management by decision makers in the Lower Tano river basin. Keywords: Water stable isotopes, Trace elements, Multivariate statistics, Evaluation indices, Lower Tano river basin.
Chen, C; Xiang, J Y; Hu, W; Xie, Y B; Wang, T J; Cui, J W; Xu, Y; Liu, Z; Xiang, H; Xie, Q
2015-11-01
To screen and identify safe micro-organisms used during Douchi fermentation, and verify the feasibility of producing high-quality Douchi using these identified micro-organisms. PCR-denaturing gradient gel electrophoresis (DGGE) and automatic amino-acid analyser were used to investigate the microbial diversity and free amino acids (FAAs) content of 10 commercial Douchi samples. The correlations between microbial communities and FAAs were analysed by statistical analysis. Ten strains with significant positive correlation were identified. Then an experiment on Douchi fermentation by identified strains was carried out, and the nutritional composition in Douchi was analysed. Results showed that FAAs and relative content of isoflavone aglycones in verification Douchi samples were generally higher than those in commercial Douchi samples. Our study indicated that fungi, yeasts, Bacillus and lactic acid bacteria were the key players in Douchi fermentation, and with identified probiotic micro-organisms participating in fermentation, a higher quality Douchi product was produced. This is the first report to analyse and confirm the key micro-organisms during Douchi fermentation by statistical analysis. This work proves fermentation micro-organisms to be the key influencing factor of Douchi quality, and demonstrates the feasibility of fermenting Douchi using identified starter micro-organisms. © 2015 The Society for Applied Microbiology.
Labrique, Alain; Blynn, Emily; Ahmed, Saifuddin; Gibson, Dustin; Pariyo, George; Hyder, Adnan A
2017-05-05
In low- and middle-income countries (LMICs), historically, household surveys have been carried out by face-to-face interviews to collect survey data related to risk factors for noncommunicable diseases. The proliferation of mobile phone ownership and the access it provides in these countries offers a new opportunity to remotely conduct surveys with increased efficiency and reduced cost. However, the near-ubiquitous ownership of phones, high population mobility, and low cost require a re-examination of statistical recommendations for mobile phone surveys (MPS), especially when surveys are automated. As with landline surveys, random digit dialing remains the most appropriate approach to develop an ideal survey-sampling frame. Once the survey is complete, poststratification weights are generally applied to reduce estimate bias and to adjust for selectivity due to mobile ownership. Since weights increase design effects and reduce sampling efficiency, we introduce the concept of automated active strata monitoring to improve representativeness of the sample distribution to that of the source population. Although some statistical challenges remain, MPS represent a promising emerging means for population-level data collection in LMICs. ©Alain Labrique, Emily Blynn, Saifuddin Ahmed, Dustin Gibson, George Pariyo, Adnan A Hyder. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 05.05.2017.
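The poststratification step mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general idea (weight = population share divided by sample share); the function name, the strata, and the shares are hypothetical and are not taken from the article.

```python
from collections import Counter

def poststratification_weights(sample_strata, population_shares):
    """Return one weight per stratum: population share / sample share."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {s: population_shares[s] / (counts[s] / n) for s in counts}

# Hypothetical survey: urban respondents are over-represented (70% of the sample)
# relative to an assumed 50/50 split in the source population.
sample = ["urban"] * 70 + ["rural"] * 30
population = {"urban": 0.5, "rural": 0.5}

weights = poststratification_weights(sample, population)

# After weighting, the urban share of the sample matches the population share.
total_weight = sum(weights[s] for s in sample)
weighted_urban_share = weights["urban"] * 70 / total_weight
print(weights, weighted_urban_share)
```

The rural weight is larger than the urban weight, which is why, as the abstract notes, weighting increases design effects: a few heavily weighted respondents carry more of the estimate.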
Seasonal trend of fog water chemical composition in the Po Valley.
Fuzzi, S; Facchini, M C; Orsi, G; Ferri, D
1992-01-01
Fog frequency in the Po Valley, Northern Italy, can be as high as 30% of the time in the fall-winter season. High pollutant concentrations have been measured in fog water samples collected in this area over the past few years. The combined effects of high fog occurrence and high pollutant loading of the fog droplets can determine, in this area, appreciable chemical deposition rates. An automated station for fog water collection was developed, and deployed at the field station of S. Pietro Capofiume, in the eastern part of the Po Valley for an extended period: from the beginning of November 1989 to the end of April 1990. Time-resolved sampling of fog droplets was carried out during all fog events occurring in this period, and chemical analyses were performed on the collected samples. Statistical information on fog occurrence and fog water chemical composition is reported in this paper, and a tentative seasonal deposition budget is calculated for H+, NH4+, NO3- and SO4(2-) ions. The problems connected with fog droplet sampling in sub-freezing conditions are also addressed in the paper.
2015-08-01
the nine questions. The Statistical Package for the Social Sciences (SPSS) [11] was used to conduct statistical analysis on the sample. Two types...constructs. SPSS was again used to conduct statistical analysis on the sample. This time factor analysis was conducted. Factor analysis attempts to...Business Research Methods and Statistics using SPSS. P432. 11 IBM SPSS Statistics. (2012) 12 Burns, R.B., Burns, R.A. (2008) ‘Business Research
Statistical Methods in AI: Rare Event Learning Using Associative Rules and Higher-Order Statistics
NASA Astrophysics Data System (ADS)
Iyer, V.; Shetty, S.; Iyengar, S. S.
2015-07-01
Rare event learning has not been actively researched until recently, owing to the unavailability of algorithms that can deal with big samples. The research addresses spatio-temporal streams from multi-resolution sensors to find actionable items from the perspective of real-time algorithms. This computing framework is independent of the number of input samples, the application domain, and whether the streams are labelled or label-less. A sampling overlap algorithm such as Brooks-Iyengar is used for dealing with noisy sensor streams. We extend the existing noise pre-processing algorithms using Data-Cleaning trees. Pre-processing using an ensemble of trees with bagging and multi-target regression showed robustness to random noise and missing data. As spatio-temporal streams are highly statistically correlated, we prove, using Hoeffding bounds, that temporal-window-based sampling from sensor data streams converges after n samples, which can be used for fast prediction of new samples in real time. The Data-Cleaning tree model uses a nonparametric node splitting technique, which can be learned iteratively and scales linearly in memory consumption for any size of input stream. The improved task-based ensemble extraction is compared with non-linear computation models using various SVM kernels for speed and accuracy. We show using empirical datasets that the explicit rule learning computation is linear in time and depends only on the number of leaves present in the tree ensemble. The use of unpruned trees (t) in our proposed ensemble always yields the minimum number (m) of leaves, keeping pre-processing computation to n × t log m compared with N^2 for the Gram matrix. We also show that the task-based feature induction yields higher Quality of Data (QoD) in the feature space compared with kernel methods using the Gram matrix.
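For orientation, the textbook Hoeffding inequality gives a sample size after which a sample mean of bounded values is within a tolerance of its expectation with high probability. The sketch below uses that standard bound, which assumes independent observations; it is not the authors' derivation for correlated streams, and the function names and parameters are hypothetical.

```python
import math

def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Smallest n with P(|mean - E[mean]| >= epsilon) <= delta.

    Standard two-sided Hoeffding bound for independent values confined to an
    interval of width value_range: 2 * exp(-2 * n * eps^2 / range^2) <= delta.
    """
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def hoeffding_epsilon(n, delta, value_range=1.0):
    """Tolerance guaranteed by the same bound after n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

# Illustrative tolerance and confidence level (assumed, not from the abstract).
n_needed = hoeffding_sample_size(epsilon=0.05, delta=0.05)
print(n_needed, hoeffding_epsilon(n_needed, delta=0.05))
```

The required n depends only on the tolerance, the confidence level, and the value range, not on the total length of the stream, which is consistent with the abstract's claim that the framework is independent of the number of input samples.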
NASA Astrophysics Data System (ADS)
Caruso, S.; Günther-Leopold, I.; Murphy, M. F.; Jatuff, F.; Chawla, R.
2008-05-01
Non-destructive and destructive methods have been compared to validate their corresponding assessed accuracies in the measurement of 134Cs/137Cs and 154Eu/137Cs isotopic concentration ratios in four spent UO2 fuel samples with very high (52 and 71 GWd/t) and ultra-high (91 and 126 GWd/t) burnup values, and about 10 (in the first three samples) and 4 years (in the latter sample) cooling time. The non-destructive technique tested was high-resolution gamma spectrometry using a high-purity germanium detector (HPGe) and a special tomographic station for the handling of highly radioactive 400 mm spent fuel segments that included a tungsten collimator, lead filter (to enhance the signal to Compton background ratio and reduce the dead time) and paraffin wax (to reduce neutron damage). The non-destructive determination of these isotopic concentration ratios has been particularly challenging for these segments because of the need to properly derive non-Gaussian gamma-peak areas and subtract the background from perturbing capture gammas produced by the intrinsic high-intensity neutron emissions from the spent fuel. Additionally, the activity distribution within each pin was determined tomographically to correct appropriately for self-attenuation and geometrical effects. The ratios obtained non-destructively showed a 1σ statistical error in the range 1.9-2.9%. The destructive technique used was a high-performance liquid chromatographic separation system, combined online to a multicollector inductively coupled plasma mass spectrometer (HPLC-MC-ICP-MS), for the analysis of dissolved fuel solutions. During the mass spectrometric analyses, special care was taken in the optimisation of the chromatographic separation for Eu and the interfering element Gd, as also in the mathematical correction of the 154Gd background from the 154Eu signal. 
The ratios obtained destructively are considerably more precise (1σ statistical error in the range 0.4-0.8% for most of the samples, but up to 2.8% for one sample). The HPGe gamma spectrometry can achieve a high degree of accuracy (agreement with HPLC-MC-ICP-MS within a few percent), only by virtue of the optimised setup, and the refined measurement strategy and data treatment employed.
Validation of a novel saliva-based ELISA test for diagnosing tapeworm burden in horses.
Lightbody, Kirsty L; Davis, Paul J; Austin, Corrine J
2016-06-01
Tapeworm infections pose a significant threat to equine health as they are associated with clinical cases of colic. Diagnosis of tapeworm burden using fecal egg counts (FECs) is unreliable, and, although a commercial serologic ELISA for anti-tapeworm antibodies is available, it requires a veterinarian to collect the blood sample. A reliable diagnostic test using an owner-accessible sample such as saliva could provide a cost-effective alternative for tapeworm testing in horses, and allow targeted deworming strategies. The purpose of the study was to statistically validate a saliva tapeworm ELISA test and compare it to a tapeworm-specific IgG(T) serologic ELISA. Serum samples (139) and matched saliva samples (104) were collected from horses at a UK abattoir. The ileocecal junction and cecum were visually examined for tapeworms and any present were counted. Samples were analyzed using a serologic ELISA and the saliva tapeworm test. The test results were compared to tapeworm numbers and the various data sets were statistically analyzed. Saliva scores had strong positive correlations with both infection intensity and serologic results (Spearman's rank coefficients; 0.74 and 0.86, respectively). The saliva tapeworm test was capable of identifying the presence of one or more tapeworms with 83% sensitivity and 85% specificity. Importantly, no high-burden (more than 20 tapeworms) horses were misdiagnosed. The saliva tapeworm test has statistical accuracy for detecting tapeworm burdens in horses with 83% sensitivity and 85% specificity, similar to those of the serologic ELISA (85% and 78%, respectively). © 2016 American Society for Veterinary Clinical Pathology.
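As a reminder of how the two accuracy figures quoted above are defined, a minimal sketch follows. The confusion-matrix counts are hypothetical, chosen only so that the arithmetic reproduces 83% and 85%; they are not the study's data, and the function name is an assumption.

```python
def diagnostic_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # infected animals correctly flagged
    specificity = tn / (tn + fp)  # uninfected animals correctly cleared
    return sensitivity, specificity

# Hypothetical counts for 100 infected and 100 uninfected animals.
sensitivity, specificity = diagnostic_accuracy(tp=83, fn=17, tn=85, fp=15)

# Youden's index summarizes both in one number (1 would be a perfect test).
youden_j = sensitivity + specificity - 1.0
print(sensitivity, specificity, youden_j)
```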
Clinical competence of Guatemalan and Mexican physicians for family dysfunction management.
Cabrera-Pivaral, Carlos Enrique; Orozco-Valerio, María de Jesús; Celis-de la Rosa, Alfredo; Covarrubias-Bermúdez, María de Los Ángeles; Zavala-González, Marco Antonio
2017-01-01
To evaluate the clinical competence of Mexican and Guatemalan physicians to manage family dysfunction. Cross-sectional comparative study in four first-level care units in Guadalajara, Mexico, and four in Guatemala, Guatemala, based on purposeful sampling, involving 117 and 100 physicians, respectively. Clinical competence was evaluated with a validated instrument comprising 187 items. Non-parametric descriptive and inferential statistical analysis was performed. The percentage of Mexican physicians with high clinical competence was 13.7%, medium 53%, low 24.8%, and defined by chance 8.5%. For the Guatemalan physicians, 14% was high, medium 63%, and 23% defined by chance. There were no statistically significant differences between healthcare units by country, but there was a difference between the medians of Mexicans (0.55) and Guatemalans (0.55) (p = 0.02). The proportion of Mexican physicians with high clinical competence was similar to that of Guatemalan physicians.
EVALUATION OF FACTORS IN THE ELUTION OF HYDROCORTISONE FROM PAPER CHROMATOGRAMS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ganis, F.M.; Hendrickson, M.W.; Giunta, P.D.
An assessment was made of a number of variable factors which affect the recovery of hydrocortisone from eluted filter paper chromatographic fractions. Factors tested included time of elution, sample concentration, rinsing of eluting fractions and pre-washing of the filter paper. It was noted that a 50 μg sample could be quantitatively recovered after a 15-minute elution time from a pre-washed filter paper fraction. The results were subjected to a statistical analysis and were found to be highly significant. (auth)
Anthropometrics related to the performance of a sample of male swimmers.
Perciavalle, Valentina; Di Corrado, Donatella; Scuto, Claudia; Perciavalle, Vincenzo; Coco, Marinella
2014-06-01
The main purpose of the present investigation of 21 elite male swimmers was to assess whether the Ape Index (the ratio between the individual's arm span and height) and/or the second-to-fourth digit length ratio (2D:4D), i.e., the ratio between the length of the second and the fourth fingers of the right hand, are associated with the performance of high-level swimmers, when mood and/or executive function are covaried. The results showed no statistically significant correlation between the Ape Index and 2D:4D ratio, performance, executive function, or mood. In contrast, statistically significant correlations were found between 2D:4D ratio and performance, executive function, and mood. Regressions indicated that 2D:4D ratio and not Ape Index is related to the performances of a sample of male swimmers.
Statistical Analysis Techniques for Small Sample Sizes
NASA Technical Reports Server (NTRS)
Navard, S. E.
1984-01-01
The small-sample-size problem encountered in the analysis of space-flight data is examined. Because only a limited amount of data is available, careful analyses are essential to extract the maximum amount of information with acceptable accuracy. Statistical analysis of small samples is described. The background material necessary for understanding statistical hypothesis testing is outlined and the various tests which can be done on small samples are explained. Emphasis is on the underlying assumptions of each test and on considerations needed to choose the most appropriate test for a given type of analysis.
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms (supervised principal components, regularization, and boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
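The contrast between within-sample fit and cross-validated prediction error can be sketched with ridge regression, one common form of regularization. Everything below is an illustrative assumption: the simulated item pool, the penalty grid, and the helper names are hypothetical and do not reproduce the authors' analysis.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients: (X'X + lam*I)^-1 X'y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_prediction_error(X, y, lam, k=5, seed=0):
    """Estimate expected prediction error (mean squared error) by k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))

# Hypothetical item pool: 60 respondents, 40 items, only 3 items truly predictive.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 40))
y = X[:, :3] @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=60)

lambdas = [0.01, 1.0, 10.0, 100.0]
cv_errors = {lam: cv_prediction_error(X, y, lam) for lam in lambdas}
best_lambda = min(cv_errors, key=cv_errors.get)

# Within-sample error of the least-penalized model looks deceptively small.
beta_full = ridge_fit(X, y, 0.01)
in_sample_error = float(np.mean((y - X @ beta_full) ** 2))
print(cv_errors, best_lambda, in_sample_error)
```

With many items and few respondents, the nearly unpenalized model fits the sample well but predicts held-out respondents poorly; choosing the penalty by cross-validated error rather than by within-sample fit is the EPE principle the abstract describes.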
A combination of routine blood analytes predicts fitness decrement in elderly endurance athletes.
Haslacher, Helmuth; Ratzinger, Franz; Perkmann, Thomas; Batmyagmar, Delgerdalai; Nistler, Sonja; Scherzer, Thomas M; Ponocny-Seliger, Elisabeth; Pilger, Alexander; Gerner, Marlene; Scheichenberger, Vanessa; Kundi, Michael; Endler, Georg; Wagner, Oswald F; Winker, Robert
2017-01-01
Endurance sports are enjoying greater popularity, particularly among new target groups such as the elderly. Predictors of future physical capacities providing a basis for training adaptations are in high demand. We therefore aimed to estimate the future physical performance of elderly marathoners (runners/bicyclists) using a set of easily accessible standard laboratory parameters. To this end, 47 elderly marathon athletes underwent physical examinations including bicycle ergometry and a blood draw at baseline and after a three-year follow-up period. In order to compile a statistical model containing baseline laboratory results allowing prediction of follow-up ergometry performance, the cohort was subgrouped into a model training (n = 25) and a test sample (n = 22). The model containing significant predictors in univariate analysis (alanine aminotransferase, urea, folic acid, myeloperoxidase and total cholesterol) presented with high statistical significance and excellent goodness of fit (R2 = 0.789, ROC-AUC = 0.951±0.050) in the model training sample and was validated in the test sample (ROC-AUC = 0.786±0.098). Our results suggest that standard laboratory parameters could be particularly useful for predicting future physical capacity in elderly marathoners. It hence merits further research whether these conclusions can be translated to other disciplines or age groups.
A combination of routine blood analytes predicts fitness decrement in elderly endurance athletes
Ratzinger, Franz; Perkmann, Thomas; Batmyagmar, Delgerdalai; Nistler, Sonja; Scherzer, Thomas M.; Ponocny-Seliger, Elisabeth; Pilger, Alexander; Gerner, Marlene; Scheichenberger, Vanessa; Kundi, Michael; Endler, Georg; Wagner, Oswald F.; Winker, Robert
2017-01-01
Endurance sports are enjoying greater popularity, particularly among new target groups such as the elderly. Predictors of future physical capacities providing a basis for training adaptations are in high demand. We therefore aimed to estimate the future physical performance of elderly marathoners (runners/bicyclists) using a set of easily accessible standard laboratory parameters. To this end, 47 elderly marathon athletes underwent physical examinations including bicycle ergometry and a blood draw at baseline and after a three-year follow-up period. In order to compile a statistical model containing baseline laboratory results allowing prediction of follow-up ergometry performance, the cohort was subgrouped into a model training (n = 25) and a test sample (n = 22). The model containing significant predictors in univariate analysis (alanine aminotransferase, urea, folic acid, myeloperoxidase and total cholesterol) presented with high statistical significance and excellent goodness of fit (R2 = 0.789, ROC-AUC = 0.951±0.050) in the model training sample and was validated in the test sample (ROC-AUC = 0.786±0.098). Our results suggest that standard laboratory parameters could be particularly useful for predicting future physical capacity in elderly marathoners. It hence merits further research whether these conclusions can be translated to other disciplines or age groups. PMID:28475643
[Statistical prediction methods in violence risk assessment and its application].
Liu, Yuan-Yuan; Hu, Jun-Mei; Yang, Min; Li, Xiao-Song
2013-06-01
How to improve violence risk assessment is an urgent global problem. As a necessary part of risk assessment, statistical methods have a remarkable impact. In this study, prediction methods in violence risk assessment are reviewed from a statistical point of view. The applications of logistic regression as an example of a multivariate statistical model, the decision tree model as an example of a data mining technique, and the neural network model as an example of artificial intelligence technology are all reviewed. This study provides a basis for further research on violence risk assessment.
Ren, Y T; Jia, Q Z; Zhang, X D; Guo, B S; Zhang, F F; Cheng, X T; Wang, Y P
2018-05-10
Objective: To investigate the effects of high iodine intake on thyroid function in pregnant and lactating women. Methods: A cross sectional epidemiological study was conducted among 130 pregnant women and 220 lactating women aged 19-40 years in areas with high environment iodine level (>300 μg/L) or proper environment iodine level (50-100 μg/L) in Shanxi in 2014. The general information, urine samples and blood samples of the women surveyed and water samples were collected. The water and urine iodine levels were detected with arsenic and cerium catalysis spectrophotometric method, the blood TSH level was detected with electrochemiluminescence immunoassay, and free thyroxine (FT(4)), antithyroid peroxidase autoantibody (TPOAb) and anti-thyroglobulin antibodies (TGAb) were detected with chemiluminescence immunoassay. Results: The median urine iodine levels of the four groups were 221.9, 282.5, 814.1 and 818.6 μg/L, respectively. The median serum FT(4) of lactating women in high iodine area and proper iodine area were 12.96 and 13.22 pmol/L, and the median serum TSH was 2.45 and 2.17 mIU/L, respectively. The median serum FT(4) of pregnant women in high iodine area and proper iodine area were 14.66 and 16.16 pmol/L, and the median serum TSH was 2.13 and 1.82 mIU/L, respectively. The serum FT(4) levels were lower and the abnormal rates of serum TSH were higher in lactating women than in pregnant women in both high iodine area and proper iodine area, the difference was statistically significant (FT(4): Z =-6.677, -4.041, P <0.01; TSH: Z =8.797, 8.910, P <0.01). In high iodine area, the abnormal rate of serum FT(4) in lactating women was higher than that in pregnant women, the difference was statistically significant ( Z =7.338, P =0.007). The serum FT(4) level of lactating women in high iodine area was lower than that in proper iodine area, the difference was statistically significant ( Z =-4.687, P =0.000). 
In high iodine area, the median serum FT(4) in early pregnancy, mid-pregnancy and late pregnancy was 16.26, 14.22 and 14.80 pmol/L, respectively, and the median serum TSH was 1.74, 1.91 and 2.38 mIU/L, respectively. In high iodine area, the serum FT(4) level in early pregnancy was higher than that in mid-pregnancy and late pregnancy, and the serum TSH level was lower than that in mid-pregnancy and late pregnancy, the difference was statistically significant (FT(4): Z =-2.174, -2.238, P <0.05; TSH: Z =-2.985, -1.978, P <0.05). There were no significant differences in the positive rates of serum thyroid autoantibodies among the four groups of women and women in different periods of pregnancy ( P >0.05). The morbidity rates of subclinical hypothyroidism in pregnant women and lactating women in high iodine area were obviously higher than those in proper iodine areas, the difference was statistically significant ( χ (2)=5.363, 5.007, P <0.05). Conclusions: Excessive iodine intake might increase the risk of subclinical hypothyroidism in pregnant women and lactating women. It is suggested to strengthen the iodine nutrition and thyroid function monitoring in women, pregnant women and lactating women in areas with high environmental iodine.
Chi-squared and C statistic minimization for low count per bin data. [sampling in X-ray astronomy]
NASA Technical Reports Server (NTRS)
Nousek, John A.; Shue, David R.
1989-01-01
Results are presented from a computer simulation comparing two statistical fitting techniques on data samples with large and small counts per bin; the results are then related specifically to X-ray astronomy. The Marquardt and Powell minimization techniques are compared by using both to minimize the chi-squared statistic. In addition, Cash's C statistic is applied, with Powell's method, and it is shown that the C statistic produces better fits in the low-count regime than chi-squared.
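The low-count behaviour reported here can be reproduced in a toy setting. The sketch below fits a constant rate to simulated Poisson counts by a simple grid search; the grid search stands in for the Marquardt and Powell minimizers, and the choice of data-based variances (with a floor of 1 for empty bins) in the chi-squared statistic is an assumption, since the abstract does not state the weighting used. All names and values are illustrative.

```python
import numpy as np

def cash_c(model, counts):
    """Cash's C statistic for Poisson data: 2 * sum(m_i - n_i * ln m_i)."""
    return 2.0 * np.sum(model - counts * np.log(model))

def chi_squared(model, counts):
    """Chi-squared with data-based variances (floor of 1 for empty bins)."""
    return np.sum((counts - model) ** 2 / np.maximum(counts, 1.0))

# Simulated low-count spectrum: 5000 bins with a true mean of 2 counts per bin.
rng = np.random.default_rng(0)
true_rate = 2.0
counts = rng.poisson(true_rate, size=5000).astype(float)

# Grid search over a constant model rate.
grid = np.linspace(0.5, 4.0, 701)
c_values = [cash_c(np.full_like(counts, m), counts) for m in grid]
chi2_values = [chi_squared(np.full_like(counts, m), counts) for m in grid]

best_c = grid[int(np.argmin(c_values))]
best_chi2 = grid[int(np.argmin(chi2_values))]
print(f"C statistic: {best_c:.3f}, chi-squared: {best_chi2:.3f}, true: {true_rate}")
```

In this setup the C statistic recovers a rate close to the sample mean, while the chi-squared fit is pulled well below the true rate because bins that fluctuate low receive the largest weights.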
Ensembles of radial basis function networks for spectroscopic detection of cervical precancer
NASA Technical Reports Server (NTRS)
Tumer, K.; Ramanujam, N.; Ghosh, J.; Richards-Kortum, R.
1998-01-01
The mortality related to cervical cancer can be substantially reduced through early detection and treatment. However, current detection techniques, such as Pap smear and colposcopy, fail to achieve a concurrently high sensitivity and specificity. In vivo fluorescence spectroscopy is a technique which quickly, noninvasively and quantitatively probes the biochemical and morphological changes that occur in precancerous tissue. A multivariate statistical algorithm was used to extract clinically useful information from tissue spectra acquired from 361 cervical sites from 95 patients at 337-, 380-, and 460-nm excitation wavelengths. The multivariate statistical analysis was also employed to reduce the number of fluorescence excitation-emission wavelength pairs required to discriminate healthy tissue samples from precancerous tissue samples. The use of connectionist methods such as multilayered perceptrons, radial basis function (RBF) networks, and ensembles of such networks was investigated. RBF ensemble algorithms based on fluorescence spectra potentially provide automated and near real-time implementation of precancer detection in the hands of nonexperts. The results are more reliable, direct, and accurate than those achieved by either human experts or multivariate statistical algorithms.
Hybrid Gibbs Sampling and MCMC for CMB Analysis at Small Angular Scales
NASA Technical Reports Server (NTRS)
Jewell, Jeffrey B.; Eriksen, H. K.; Wandelt, B. D.; Gorski, K. M.; Huey, G.; O'Dwyer, I. J.; Dickinson, C.; Banday, A. J.; Lawrence, C. R.
2008-01-01
A) Gibbs Sampling has now been validated as an efficient, statistically exact, and practically useful method for "low-L" (as demonstrated on WMAP temperature polarization data). B) We are extending Gibbs sampling to directly propagate uncertainties in both foreground and instrument models to total uncertainty in cosmological parameters for the entire range of angular scales relevant for Planck. C) Made possible by inclusion of foreground model parameters in Gibbs sampling and hybrid MCMC and Gibbs sampling for the low signal to noise (high-L) regime. D) Future items to be included in the Bayesian framework include: 1) Integration with Hybrid Likelihood (or posterior) code for cosmological parameters; 2) Include other uncertainties in instrumental systematics? (I.e. beam uncertainties, noise estimation, calibration errors, other).
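For readers unfamiliar with the technique named in this record, the following toy example shows the basic Gibbs sampling loop: each variable is drawn in turn from its conditional distribution given the current value of the other. It targets a standard bivariate normal with an assumed correlation and is unrelated to the CMB analysis code; the function name and parameters are hypothetical.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Uses the exact conditionals x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    draws = np.empty((n_samples, 2))
    for i in range(n_samples):
        x = rng.normal(rho * y, cond_sd)  # update x given the current y
        y = rng.normal(rho * x, cond_sd)  # update y given the new x
        draws[i] = (x, y)
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)

# Discard an initial burn-in, then check that the chain reproduces the target correlation.
sample_corr = np.corrcoef(draws[1000:].T)[0, 1]
print(round(sample_corr, 3))
```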
Lange, J H; Lange, P R; Reinhard, T K; Thomulka, K W
1996-08-01
Data were collected and analysed on airborne concentrations of asbestos generated by abatement of different asbestos-containing materials using various removal practices. Airborne concentrations of asbestos vary dramatically among the types of asbestos-containing material being abated. Abatement practices evaluated in this study were removal of boiler/pipe insulation in a crawl space, ceiling tile, transite, floor tile/mastic with traditional methods, and mastic removal with a high-efficiency particulate air filter blast track (shot-blast) machine. In general, abatement of boiler and pipe insulation produced the highest airborne fibre levels, while abatement of floor tile and mastic produced the lowest. Matched personal and area samples did not differ significantly and were well correlated in regression analysis. After adjusting data for outliers, personal sample fibre concentrations were greater than area sample fibre concentrations. Statistical analysis and the sample distribution of airborne asbestos concentrations indicate that the data are best represented in logarithmic form. Area sample fibre concentrations were shown in this study to have a larger variability than personal measurements. Evaluation of outliers in fibre concentration data and the ability of these values to skew sample populations is presented. The use of personal and area samples in determining exposure, selecting personal protective equipment and its historical relevance as related to future abatement projects is discussed.
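The remark that concentrations are best represented in logarithmic form can be illustrated with a geometric mean and geometric standard deviation. This is a minimal sketch: the fibre concentrations below are hypothetical values, not data from the study, and the function name is an assumption.

```python
import math
import statistics


def geometric_summary(concentrations):
    """Return (geometric mean, geometric standard deviation) of positive values."""
    logs = [math.log(c) for c in concentrations]
    gm = math.exp(statistics.fmean(logs))
    gsd = math.exp(statistics.stdev(logs))
    return gm, gsd


# Hypothetical fibre concentrations (f/cc), including one high outlier.
fibre_conc = [0.01, 0.02, 0.02, 0.04, 0.05, 0.08, 1.50]
gm, gsd = geometric_summary(fibre_conc)
arithmetic_mean = statistics.fmean(fibre_conc)
```

On the log scale the single high value pulls the summary far less than it pulls the arithmetic mean, which is one reason log-transformed summaries are commonly used for skewed exposure data.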
Schmitt, Christopher J.; Finger, Susan E.
1987-01-01
The influence of sample preparation on measured concentrations of eight elements in the edible tissues of two black basses (Centrarchidae), two catfishes (Ictaluridae), and the black redhorse, Moxostoma duquesnei (Catostomidae) from two rivers in southeastern Missouri contaminated by mining and related activities was investigated. Concentrations of Pb, Cd, Cu, Zn, Fe, Mn, Ba, and Ca were measured in two skinless, boneless samples of axial muscle from individual fish prepared in a clean room. One sample (normally-processed) was removed from each fish with a knife in a manner typically used by investigators to process fish for elemental analysis and presumedly representative of methods employed by anglers when preparing fish for home consumption. A second sample (clean-processed) was then prepared from each normally-processed sample by cutting away all surface material with acid-cleaned instruments under ultraclean conditions. The samples were analyzed as a single group by atomic absorption spectrophotometry. Of the elements studied, only Pb regularly exceeded current guidelines for elemental contaminants in foods. Concentrations were high in black redhorse from contaminated sites, regardless of preparation method; for the other fishes, whether or not Pb guidelines were exceeded depended on preparation technique. Except for Mn and Ca, concentrations of all elements measured were significantly lower in clean- than in normally-processed tissue samples. Absolute differences in measured concentrations between clean- and normally-processed samples were most evident for Pb and Ba in bass and catfish and for Cd and Zn in redhorse. Regardless of preparation method, concentrations of Pb, Ca, Mn, and Ba in individual fish were closely correlated; samples that were high or low in one of these four elements were correspondingly high or low in the other three.
In contrast, correlations between Zn, Fe, and Cd occurred only in normally-processed samples, suggesting that these correlations resulted from high concentrations on the surfaces of some samples. Concentrations of Pb and Ba in edible tissues of fish from contaminated sites were highly correlated with Ca content, which was probably determined largely by the amount of tissue other than muscle in the sample because fish muscle contains relatively little Ca. Accordingly, variation within a group of similar samples can be reduced by normalizing Pb and Ba concentrations to a standard Ca concentration. When sample size (N) is large, this can be accomplished statistically by analysis of covariance; when N is small, molar ratios of [Pb]/[Ca] and [Ba]/[Ca] can be computed. Without such adjustments, unrealistically large Ns are required to yield statistically reliable estimates of Pb concentrations in edible tissues. Investigators should acknowledge that reported concentrations of certain elements are only estimates, and that regardless of the care exercised during the collection, preparation, and analysis of samples, results should be interpreted with the awareness that contamination from external sources may have occurred.
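The small-N normalization described above, computing molar ratios such as [Pb]/[Ca], amounts to dividing each mass concentration by its atomic mass before taking the ratio. The sketch below uses standard atomic masses; the sample concentrations and the function name are hypothetical and serve only to show the arithmetic.

```python
# Standard atomic masses (g/mol).
ATOMIC_MASS = {"Pb": 207.2, "Ba": 137.327, "Ca": 40.078}


def molar_ratio(element, conc_element, conc_ca):
    """Molar ratio [element]/[Ca] from mass concentrations in the same units."""
    mol_element = conc_element / ATOMIC_MASS[element]
    mol_ca = conc_ca / ATOMIC_MASS["Ca"]
    return mol_element / mol_ca


# Hypothetical samples as (Pb ug/g, Ca ug/g); here Pb rises in step with Ca.
samples = [(0.10, 200.0), (0.25, 500.0), (0.40, 800.0)]
ratios = [molar_ratio("Pb", pb, ca) for pb, ca in samples]
```

In this made-up case the raw Pb concentrations differ fourfold while the Ca-normalized ratios are identical, which is the kind of variance reduction the abstract describes.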
Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John
2018-03-07
DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
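The two information-theoretic quantities named in this abstract can be sketched for simple discrete distributions. This is only an illustration of the definitions: the authors apply them to distributions derived from an Ising model, which is not reproduced here, and the example distributions below are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence, in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off


# Hypothetical distributions over three methylation-level bins.
test_dist = [0.7, 0.2, 0.1]
ref_dist = [0.1, 0.2, 0.7]
distance = jensen_shannon_distance(test_dist, ref_dist)
```

The distance is 0 for identical distributions and 1 for distributions with no overlapping support, so it gives a bounded measure of how far a test sample has moved from the reference.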
The Utility of Robust Means in Statistics
ERIC Educational Resources Information Center
Goodwyn, Fara
2012-01-01
Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…
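The effect of a single outlier on the sample mean, and the protection offered by robust alternatives, can be seen in a few lines. This is a minimal sketch with made-up data; the trimmed mean and the median are common robust location estimates and are not necessarily the specific methods the paper proposes.

```python
import statistics


def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each tail."""
    data = sorted(values)
    k = int(len(data) * proportion)
    trimmed = data[k:len(data) - k] if k else data
    return statistics.fmean(trimmed)


clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
contaminated = clean[:-1] + [100.0]  # replace one value with an outlier

plain_mean = statistics.fmean(contaminated)      # pulled up to about 19.01
robust_mean = trimmed_mean(contaminated, 0.10)   # stays near 10.05
robust_median = statistics.median(contaminated)  # also near 10.05
```

One bad value out of ten nearly doubles the ordinary mean, while the trimmed mean and median barely move.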
The Role of the Sampling Distribution in Understanding Statistical Inference
ERIC Educational Resources Information Center
Lipson, Kay
2003-01-01
Many statistics educators believe that few students develop the level of conceptual understanding essential for them to apply correctly the statistical techniques at their disposal and to interpret their outcomes appropriately. It is also commonly believed that the sampling distribution plays an important role in developing this understanding.…
Illustrating Sampling Distribution of a Statistic: Minitab Revisited
ERIC Educational Resources Information Center
Johnson, H. Dean; Evans, Marc A.
2008-01-01
Understanding the concept of the sampling distribution of a statistic is essential for the understanding of inferential procedures. Unfortunately, this topic proves to be a stumbling block for students in introductory statistics classes. In efforts to aid students in their understanding of this concept, alternatives to a lecture-based mode of…
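A simulation is one common alternative to a lecture-only treatment of sampling distributions. The sketch below is written in Python rather than Minitab and is not taken from the article; the population, sample sizes and function name are assumptions for illustration.

```python
import random
import statistics


def sampling_distribution_of_mean(population, sample_size, n_draws, seed=0):
    """Means of `n_draws` random samples (with replacement) of size `sample_size`."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_draws)
    ]


# Hypothetical skewed population: exponential with mean 1 (so sd is about 1).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10000)]

means_n5 = sampling_distribution_of_mean(population, sample_size=5, n_draws=2000)
means_n50 = sampling_distribution_of_mean(population, sample_size=50, n_draws=2000)

spread_n5 = statistics.stdev(means_n5)    # roughly sd / sqrt(5)
spread_n50 = statistics.stdev(means_n50)  # roughly sd / sqrt(50)
```

Plotting histograms of `means_n5` and `means_n50` would show the distribution of the mean narrowing and becoming more symmetric as the sample size grows, even though the population itself is skewed.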
[Socioeconomic status and risky health behaviors in Croatian adult population].
Pilić, Leta; Dzakula, Aleksandar
2013-03-01
Previous research shows a strong association between low socioeconomic status (SES) and high morbidity and mortality rates. Although the association between SES and risky health behaviors as the main factors influencing health has been investigated in the Croatian population, some questions are yet to be answered. The aim of this study was to investigate the presence of unhealthy diet, physical inactivity, smoking and excessive drinking in the low, middle, and high socioeconomic groups of the adult Croatian population included in the cohort study on regionalism of cardiovascular health risk behaviors. We also investigated the association between SES measured by income, education and occupation, as well as single SES indicators, and risky health behaviors. We analyzed data on 1227 adult men and women (aged 19 and older at baseline) with complete data on health behaviors, SES and chronic diseases at baseline (2003) and 5-year follow up. Respondents were classified as being healthy or chronically ill. SES categories were derived from answers to questions on monthly household income, occupation and education using a two-step cluster analysis algorithm. At baseline, for the whole sample as well as for healthy respondents, SES was statistically significantly associated with unhealthy diet (whole sample/healthy respondents: p = 0.001), physical inactivity (whole sample/healthy respondents p = 0.44/ p = 0.007), and smoking (whole sample/healthy respondents p < 0.001/p = 0.002). The proportion of respondents with an unhealthy diet was greatest in the lowest social class, the proportion of smokers in the middle class, and the proportion of physically inactive respondents in the high social class. During follow-up, smoking and physical inactivity remained statistically significantly associated with SES. In chronically ill respondents, only smoking was statistically significantly associated with SES, at baseline and follow up (p = 0.001/p = 0.002). The highest share of smokers was in the middle social class.
Results of our study show that risky health behaviors are associated with SES and are divergently represented across socioeconomic groups of the adult Croatian population. There is an obvious need for interventions targeting specific socioeconomic groups and the behaviors characteristic of each group.
STATISTICAL ANALYSIS OF TANK 18F FLOOR SAMPLE RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, S.
2010-09-02
Representative sampling has been completed for characterization of the residual material on the floor of Tank 18F as per the statistical sampling plan developed by Shine [1]. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL [2]. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results [3] to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 18F. The uncertainty is quantified in this report by an upper 95% confidence limit (UCL95%) on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
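An upper confidence limit that depends only on the number of samples, the average and the standard deviation can be sketched as follows. The report does not give its formula in this abstract, so the one-sided Student-t form used here (mean + t * s / sqrt(n)) is an assumption, as are the concentrations and the function name.

```python
import math
import statistics

# One-sided 95% Student-t critical values, keyed by degrees of freedom.
T_CRIT_95 = {2: 2.920, 3: 2.353, 4: 2.132, 5: 2.015, 6: 1.943, 7: 1.895}


def ucl95(values):
    """Upper 95% confidence limit on the mean: mean + t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean + T_CRIT_95[n - 1] * sd / math.sqrt(n)


# Hypothetical analyte concentrations from six samples (arbitrary units).
concentrations = [4.1, 3.8, 4.4, 4.0, 4.3, 3.9]
limit = ucl95(concentrations)  # about 4.27 for these made-up values
```

With only six samples the t multiplier (2.015) is noticeably larger than the normal-theory value of 1.645, which is why small sample counts widen the limit.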
STATISTICAL ANALYSIS OF TANK 19F FLOOR SAMPLE RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harris, S.
2010-09-02
Representative sampling has been completed for characterization of the residual material on the floor of Tank 19F as per the statistical sampling plan developed by Harris and Shine. Samples from eight locations have been obtained from the tank floor and two of the samples were archived as a contingency. Six samples, referred to in this report as the current scrape samples, have been submitted to and analyzed by SRNL. This report contains the statistical analysis of the floor sample analytical results to determine if further data are needed to reduce uncertainty. Included are comparisons with the prior Mantis samples results to determine if they can be pooled with the current scrape samples to estimate the upper 95% confidence limits (UCL95%) for concentration. Statistical analysis revealed that the Mantis and current scrape sample results are not compatible. Therefore, the Mantis sample results were not used to support the quantification of analytes in the residual material. Significant spatial variability among the current scrape sample results was not found. Constituent concentrations were similar between the North and South hemispheres as well as between the inner and outer regions of the tank floor. The current scrape sample results from all six samples fall within their 3-sigma limits. In view of the results from numerous statistical tests, the data were pooled from all six current scrape samples. As such, an adequate sample size was provided for quantification of the residual material on the floor of Tank 19F. The uncertainty is quantified in this report by an UCL95% on each analyte concentration. The uncertainty in analyte concentration was calculated as a function of the number of samples, the average, and the standard deviation of the analytical results. The UCL95% was based entirely on the six current scrape sample results (each averaged across three analytical determinations).
Brief Report: Comparability of DSM-IV and DSM-5 ASD Research Samples
ERIC Educational Resources Information Center
Mazefsky, C. A.; McPartland, J. C.; Gastgeb, H. Z.; Minshew, N. J.
2013-01-01
Diagnostic and Statistical Manual (DSM-5) criteria for ASD have been criticized for being too restrictive, especially for more cognitively-able individuals. It is unclear, however, if high-functioning individuals deemed eligible for research via standardized diagnostic assessments would meet DSM-5 criteria. This study investigated the impact of…
Traditionally, the EPA has monitored aquatic ecosystems using statistically rigorous sample designs and intensive field efforts which provide high quality datasets. But by their nature they leave many aquatic systems unsampled, follow a top down approach, have a long lag between ...
ERIC Educational Resources Information Center
White, Susan C.
2016-01-01
Since 1987, the Statistical Research Center at the American Institute of Physics has regularly conducted a survey of high school physics teachers. This September we're at it again. This fall, we will look for physics teachers at each of the 4,000+ schools with 12th grade in our nationally representative sample of public and private schools. We…
Students' Individual and Social Behaviors with Physical Education Teachers' Personality
ERIC Educational Resources Information Center
Arbabisarjou, Azizollah; Sourki, Mehdi Sadeghian; Bonjar, Seyedeh Elaham Hashemi
2016-01-01
The main objective of this survey is to assess the relationship between physical education teachers' personality and students' individual and social behaviors. The statistical population of the study was all the teachers of physical education working at high schools in the academic year 2012-2013. The sample consisted of sixty teachers that were…
2017-01-01
Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing—with its unique statistical properties—became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca. PMID:28817636
Kim, Dong-Hyeon; Kim, Hyunsook; Chon, Jung-Whan; Moon, Jin-San; Song, Kwang-Young; Seo, Kun-Ho
2013-07-15
Blood-yolk-polymyxin B-trimethoprim agar (BYPTA) was developed by the addition of egg yolk, laked horse blood, sodium pyruvate, polymyxin B, and trimethoprim, and compared with mannitol-yolk-polymyxin B agar (MYPA) for the isolation and enumeration of Bacillus cereus (B. cereus) in pure culture and various food samples. In pure culture, there was no statistical difference (p>0.05) between the recoverability and sensitivity of MYPA and BYPTA, whereas BYPTA exhibited higher specificity (p<0.05). To evaluate BYPTA agar with food samples, B. cereus was experimentally spiked into six types of foods, triangle kimbab, sandwich, misugaru, Saengsik, red pepper powder, and soybean paste. No statistical difference was observed in recoverability (p>0.05) between MYPA and BYPTA in all tested foods, whereas BYPTA exhibited higher selectivity than MYPA, especially in foods with high background microflora, such as Saengsik, red pepper powder, and soybean paste. The newly developed selective medium BYPTA could be a useful enumeration tool to assess the level of B. cereus in foods, particularly with high background microflora. Copyright © 2013 Elsevier B.V. All rights reserved.
Ramachandran, Parameswaran; Sánchez-Taltavull, Daniel; Perkins, Theodore J
2017-01-01
Co-expression networks have long been used as a tool for investigating the molecular circuitry governing biological systems. However, most algorithms for constructing co-expression networks were developed in the microarray era, before high-throughput sequencing, with its unique statistical properties, became the norm for expression measurement. Here we develop Bayesian Relevance Networks, an algorithm that uses Bayesian reasoning about expression levels to account for the differing levels of uncertainty in expression measurements between highly- and lowly-expressed entities, and between samples with different sequencing depths. It combines data from groups of samples (e.g., replicates) to estimate group expression levels and confidence ranges. It then computes uncertainty-moderated estimates of cross-group correlations between entities, and uses permutation testing to assess their statistical significance. Using large scale miRNA data from The Cancer Genome Atlas, we show that our Bayesian update of the classical Relevance Networks algorithm provides improved reproducibility in co-expression estimates and lower false discovery rates in the resulting co-expression networks. Software is available at www.perkinslab.ca.
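The permutation-testing step can be illustrated with an ordinary Pearson correlation. This sketch does not implement the uncertainty-moderated correlation of Bayesian Relevance Networks; it only shows how shuffling one variable produces a null distribution for a correlation statistic. The data and function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any real association between x and y
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Hypothetical group-level expression of two entities across 12 groups.
expr_a = list(range(12))
expr_b = [2 * v + (-1) ** v for v in expr_a]
p_value = permutation_p_value(expr_a, expr_b)
```

Because the null distribution comes from the data themselves, the test makes no distributional assumption about the expression values.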
Niu, Yiming; Wang, Jiayi; Zhang, Chi; Chen, Yiqiang
2017-04-15
The objective of this study was to develop a micro-plate based colorimetric assay for rapid and high-throughput detection of copper in animal feed. Copper ion in animal feed was extracted by trichloroacetic acid solution and reduced to cuprous ion by hydroxylamine. The cuprous ion can chelate with 2,2'-bicinchoninic acid to form a Cu-BCA complex which was detected with high sensitivity by micro-plate reader at 354 nm. The whole assay procedure can be completed within 20 min. To eliminate matrix interference, a statistical partitioning correction approach was proposed, which makes the detection of copper in complex samples possible. The limit of detection was 0.035 μg/mL and the detection range was 0.1-10 μg/mL of copper in buffer solution. Actual sample analysis indicated that this colorimetric assay produced results consistent with atomic absorption spectrometry analysis. These results demonstrated that the developed assay can be used for rapid determination of copper in animal feed. Copyright © 2016 Elsevier Ltd. All rights reserved.
Correlation of Thermally Induced Pores with Microstructural Features Using High Energy X-rays
NASA Astrophysics Data System (ADS)
Menasche, David B.; Shade, Paul A.; Lind, Jonathan; Li, Shiu Fai; Bernier, Joel V.; Kenesei, Peter; Schuren, Jay C.; Suter, Robert M.
2016-11-01
Combined application of a near-field High Energy Diffraction Microscopy measurement of crystal lattice orientation fields and a tomographic measurement of pore distributions in a sintered nickel-based superalloy sample allows pore locations to be correlated with microstructural features. Measurements were carried out at the Advanced Photon Source beamline 1-ID using an X-ray energy of 65 keV for each of the measurement modes. The nickel superalloy sample was prepared in such a way as to generate significant thermally induced porosity. A three-dimensionally resolved orientation map is directly overlaid with the tomographically determined pore map through a careful registration procedure. The data are shown to reliably reproduce the expected correlations between specific microstructural features (triple lines and quadruple nodes) and pore positions. With the statistics afforded by the 3D data set, we conclude that within statistical limits, pore formation does not depend on the relative orientations of the grains. The experimental procedures and analysis tools illustrated are being applied to a variety of materials problems in which local heterogeneities can affect materials properties.
Xu, Chao; Fang, Jian; Shen, Hui; Wang, Yu-Ping; Deng, Hong-Wen
2018-01-25
Extreme phenotype sampling (EPS) is a broadly-used design to identify candidate genetic factors contributing to the variation of quantitative traits. By enriching the signals in extreme phenotypic samples, EPS can boost the association power compared to random sampling. Most existing statistical methods for EPS examine the genetic factors individually, although many quantitative traits have multiple genetic factors underlying their variation. It is desirable to model the joint effects of genetic factors, which may increase the power and identify novel quantitative trait loci under EPS. The joint analysis of genetic data in high-dimensional situations requires specialized techniques, e.g., the least absolute shrinkage and selection operator (LASSO). Although there is extensive research and application related to LASSO, the statistical inference and testing for the sparse model under EPS remain unknown. We propose a novel sparse model (EPS-LASSO) with hypothesis test for high-dimensional regression under EPS based on a decorrelated score function. The comprehensive simulation shows EPS-LASSO outperforms existing methods with stable type I error and FDR control. EPS-LASSO can provide a consistent power for both low- and high-dimensional situations compared with the other methods dealing with high-dimensional situations. The power of EPS-LASSO is close to other low-dimensional methods when the causal effect sizes are small and is superior when the effects are large. Applying EPS-LASSO to a transcriptome-wide gene expression study for obesity reveals 10 significant body mass index associated genes. Our results indicate that EPS-LASSO is an effective method for EPS data analysis, which can account for correlated predictors. The source code is available at https://github.com/xu1912/EPSLASSO. Contact: hdeng2@tulane.edu. Supplementary data are available at Bioinformatics online. © The Author (2018). Published by Oxford University Press. All rights reserved.
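The sampling design itself, as opposed to the EPS-LASSO estimator, is simple to sketch: individuals are selected from the two tails of the trait distribution. The function name, the tail fraction and the body mass index values below are hypothetical, and the code does not reproduce anything from the linked repository.

```python
def extreme_phenotype_sample(phenotypes, fraction):
    """Indices of individuals in the lowest and highest `fraction` of the trait."""
    order = sorted(range(len(phenotypes)), key=lambda i: phenotypes[i])
    k = max(1, int(len(order) * fraction))
    return order[:k], order[-k:]


# Hypothetical body mass index values for ten individuals.
bmi = [22.1, 35.4, 18.9, 27.3, 41.0, 24.8, 30.2, 19.5, 26.0, 33.7]
low_tail, high_tail = extreme_phenotype_sample(bmi, fraction=0.2)
# low_tail == [2, 7] and high_tail == [1, 4] for these values
```

Only the selected individuals would then be genotyped or sequenced, which is where the cost saving and the need for EPS-aware inference both come from.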
Sterling, D A; Lewis, R D; Luke, D A; Shadel, B N
2000-06-01
Dust wipe samples collected in the field were tested by nondestructive X-ray fluorescence (XRF) followed by laboratory analysis with flame atomic absorption spectrophotometry (FAAS). Data were analyzed for precision and accuracy of measurement. Replicate samples with the XRF show high precision with an intraclass correlation coefficient (ICC) of 0.97 (P<0.0001) and an overall coefficient of variation of 11.6%. Paired comparison indicates no statistical difference (P=0.272) between XRF and FAAS analysis. Paired samples are highly correlated with an R² ranging between 0.89 for samples that contain paint chips and 0.93 for samples that do not contain paint chips. The ICC for absolute agreement between XRF and laboratory results was 0.95 (P<0.0001). The relative error over the concentration range of 25 to 14,200 microgram Pb is -12% (95% CI, -18 to -5). The XRF appears to be an excellent method for rapid on-site evaluation of dust wipes for clearance and risk assessment purposes, although there are indications of some confounding when paint chips are present. Copyright 2000 Academic Press.
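Two of the summary measures quoted above, the coefficient of variation for replicate precision and the relative error against a reference method, follow from short formulas. The readings below are hypothetical and are not the study's data; the function names are assumptions.

```python
import statistics


def coefficient_of_variation(replicates):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(replicates) / statistics.fmean(replicates)


def relative_error(measured, reference):
    """Signed error of `measured` as a percentage of `reference`."""
    return 100.0 * (measured - reference) / reference


# Hypothetical replicate XRF readings of one wipe (ug Pb).
xrf_replicates = [88.0, 100.0, 112.0]
cv_percent = coefficient_of_variation(xrf_replicates)  # 12.0 for these values

# Hypothetical XRF reading compared with an FAAS reference value (ug Pb).
error_percent = relative_error(88.0, 100.0)  # -12.0: XRF reads low
```

A negative relative error, as reported in the abstract, means the field method tends to read below the laboratory reference.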
Errors in radial velocity variance from Doppler wind lidar
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, H.; Barthelmie, R. J.; Doubrawa, P.
A high-fidelity lidar turbulence measurement technique relies on accurate estimates of radial velocity variance that are subject to both systematic and random errors determined by the autocorrelation function of radial velocity, the sampling rate, and the sampling duration. Our paper quantifies the effect of the volumetric averaging in lidar radial velocity measurements on the autocorrelation function and the dependence of the systematic and random errors on the sampling duration, using both statistically simulated and observed data. For current-generation scanning lidars and sampling durations of about 30 min and longer, during which the stationarity assumption is valid for atmospheric flows, the systematic error is negligible but the random error exceeds about 10%.
Errors in radial velocity variance from Doppler wind lidar
Wang, H.; Barthelmie, R. J.; Doubrawa, P.; ...
2016-08-29
A high-fidelity lidar turbulence measurement technique relies on accurate estimates of radial velocity variance that are subject to both systematic and random errors determined by the autocorrelation function of radial velocity, the sampling rate, and the sampling duration. Our paper quantifies the effect of the volumetric averaging in lidar radial velocity measurements on the autocorrelation function and the dependence of the systematic and random errors on the sampling duration, using both statistically simulated and observed data. For current-generation scanning lidars and sampling durations of about 30 min and longer, during which the stationarity assumption is valid for atmospheric flows, the systematic error is negligible but the random error exceeds about 10%.
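How the random error of a variance estimate shrinks with sampling duration for an autocorrelated signal can be explored with a small simulation. This is a rough sketch that assumes an AR(1) process as a stand-in for radial velocity; it does not model lidar volumetric averaging, and the coefficient, record lengths and function names are all hypothetical.

```python
import random
import statistics


def ar1_series(n, phi, rng):
    """Stationary AR(1) series with unit innovation variance."""
    x = rng.gauss(0.0, 1.0) / (1.0 - phi * phi) ** 0.5  # stationary start
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series


def sample_variance(series):
    """Variance about the record mean (divides by n)."""
    m = sum(series) / len(series)
    return sum((v - m) ** 2 for v in series) / len(series)


def relative_random_error(n_samples, phi=0.9, n_trials=200, seed=0):
    """Std of the variance estimates divided by their mean, over repeated records."""
    rng = random.Random(seed)
    estimates = [sample_variance(ar1_series(n_samples, phi, rng))
                 for _ in range(n_trials)]
    return statistics.stdev(estimates) / statistics.fmean(estimates)


short_record = relative_random_error(n_samples=50)
long_record = relative_random_error(n_samples=1000)
```

Longer records contain more independent information about the variance, so `long_record` comes out smaller than `short_record`; strong autocorrelation slows that improvement because neighbouring samples are largely redundant.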
Factors contributing to academic achievement: a Bayesian structure equation modelling study
NASA Astrophysics Data System (ADS)
Payandeh Najafabadi, Amir T.; Omidi Najafabadi, Maryam; Farid-Rohani, Mohammad Reza
2013-06-01
In Iran, high school graduates enter university after taking a very difficult entrance exam called the Konkoor. Therefore, only the top-performing students are admitted by universities to continue their bachelor's education in statistics. Surprisingly, most of these students fall into one of the following categories: (1) do not succeed in their education despite their excellent performance on the Konkoor and in high school; (2) graduate with a grade point average (GPA) that is considerably lower than their high school GPA; (3) continue their master's education in majors other than statistics; or (4) try to find jobs unrelated to statistics. This article employs a well-known and powerful statistical technique, Bayesian structural equation modelling (SEM), to study the academic success of recent graduates who have studied statistics at Shahid Beheshti University in Iran. This research: (i) considered academic success as a latent variable, which was measured by GPA and other indicators of academic success (see below) of students in the target population; (ii) employed the Bayesian SEM, which works properly for small sample sizes and ordinal variables; (iii) developed five main factors, taken from the literature, that affected academic success; and (iv) considered several standard psychological tests and measured characteristics such as 'self-esteem' and 'anxiety'. We then study the impact of such factors on the academic success of the target population. Six factors that positively impact student academic success were identified in the following order of relative impact (from greatest to least): 'Teaching-Evaluation', 'Learner', 'Environment', 'Family', 'Curriculum' and 'Teaching Knowledge'. Particularly influential variables within each factor have also been noted.
Croce, María V; Isla-Larrain, Marina; Rabassa, Martín E; Demichelis, Sandra; Colussi, Andrea G; Crespo, Marina; Lacunza, Ezequiel; Segal-Eiras, Amada
2007-01-01
An immunohistochemical analysis was employed to determine the expression of carbohydrate antigens associated with mucins in normal epithelia. Tissue samples were obtained as biopsies from normal breast (18), colon (35) and oral cavity mucosa (8). The following carbohydrate epitopes were studied: sialyl-Lewis x, Lewis x, Lewis y, Tn hapten, sialyl-Tn and Thomsen-Friedenreich antigen. Mucins were also studied employing antibodies against MUC1, MUC2, MUC4, MUC5AC, MUC6 and also normal colonic glycolipid. Statistical analysis was performed and Kendall correlations were obtained. Lewis x showed an apical pattern mainly at plasma membrane, although cytoplasmic staining was also found in most samples. TF, Tn and sTn haptens were detected in few specimens, while sLewis x was found in oral mucosa and breast tissue. Also, normal breast expressed MUC1 at a high percentage, whereas MUC4 was observed in a small number of samples. Colon specimens mainly expressed MUC2 and MUC1, while most oral mucosa samples expressed MUC4 and MUC1. A positive correlation between MUC1VNTR and TF epitope (r=0.396) was found in breast samples, while in colon specimens MUC2 and colonic glycolipid versus Lewis x were statistically significantly correlated (r=0.28 and r=0.29, respectively). In conclusion, a defined carbohydrate epitope expression is not exclusive to normal tissue or to a particular localization, and it is possible to assume that different glycoproteins and glycolipids may be carriers of carbohydrate antigens depending on the tissue localization considered.
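A Kendall rank correlation of the kind mentioned above counts concordant and discordant pairs of observations. The abstract does not say which variant was used, so the tau-a form below is an assumption, and the staining scores are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs; ties count as 0."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical ordinal staining scores for two markers in four samples.
marker_a = [1, 2, 3, 4]
marker_b = [1, 3, 2, 4]
tau = kendall_tau_a(marker_a, marker_b)  # 5 concordant, 1 discordant: 4/6
```

Because it uses only the ordering of the scores, this statistic suits semi-quantitative immunohistochemical grades better than a correlation that assumes interval-scaled data.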
Rate, Andrew W
2018-06-15
Urban environments are dynamic and highly heterogeneous, and multiple additions of potential contaminants are likely on timescales which are short relative to natural processes. The likely sources and location of soil or sediment contamination in urban environments should therefore be detectable using multielement geochemical composition combined with rigorously applied multivariate statistical techniques. Soil, wetland sediment, and street dust were sampled along intersecting transects in Robertson Park in metropolitan Perth, Western Australia. Samples were analysed for near-total concentrations of multiple elements (including Cd, Ce, Co, Cr, Cu, Fe, Gd, La, Mn, Nd, Ni, Pb, Y, and Zn), as well as pH, and electrical conductivity. Samples at some locations within Robertson Park had high concentrations of potentially toxic elements (Pb above Health Investigation Limits; As, Ba, Cu, Mn, Ni, Pb, V, and Zn above Ecological Investigation Limits). However, these concentrations carry low risk due to the main land use as recreational open space, the low proportion of samples exceeding guideline values, and a tendency for the highest concentrations to be located within the less accessible wetland basin. The different spatial distributions of different groups of contaminants were consistent with different inputs of contaminants related to changes in land use and technology over the history of the site. Multivariate statistical analyses reinforced the spatial information, with principal component analysis identifying geochemical associations of elements which were also spatially related. A multivariate linear discriminant model was able to discriminate samples into a-priori types, and could predict sample type with 84% accuracy based on multielement composition. The findings suggest substantial advantages of characterising a site using multielement and multivariate analyses, an approach which could benefit investigations of other sites of concern.
Spatial Differentiation of Landscape Values in the Murray River Region of Victoria, Australia
NASA Astrophysics Data System (ADS)
Zhu, Xuan; Pfueller, Sharron; Whitelaw, Paul; Winter, Caroline
2010-05-01
This research advances the understanding of the location of perceived landscape values through a statistically based approach to spatial analysis of value densities. Survey data were obtained from a sample of people living in and using the Murray River region, Australia, where declining environmental quality prompted a reevaluation of its conservation status. When densities of 12 perceived landscape values were mapped using geographic information systems (GIS), valued places clustered along the entire river bank and in associated National/State Parks and reserves. While simple density mapping revealed high value densities in various locations, it did not indicate what density of a landscape value could be regarded as a statistically significant hotspot or distinguish whether overlapping areas of high density for different values indicate identical or adjacent locations. A spatial statistic Getis-Ord Gi* was used to indicate statistically significant spatial clusters of high value densities or “hotspots”. Of 251 hotspots, 40% were for single non-use values, primarily spiritual, therapeutic or intrinsic. Four hotspots had 11 landscape values. Two, lacking economic value, were located in ecologically important river red gum forests and two, lacking wilderness value, were near the major towns of Echuca-Moama and Albury-Wodonga. Hotspots for eight values showed statistically significant associations with another value. There were high associations between learning and heritage values while economic and biological diversity values showed moderate associations with several other direct and indirect use values. This approach may improve confidence in the interpretation of spatial analysis of landscape values by enhancing understanding of value relationships.
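As a rough illustration of the hotspot statistic named in the abstract above, the following sketch computes Getis-Ord Gi* z-scores in plain Python using the standard Ord and Getis formulation. The one-dimensional row of cells, the binary neighbour weights, and the 1.96 cut-off are assumptions made for the example; they are not taken from the study.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score for every location.

    values  -- one observation (e.g. a value density) per location
    weights -- square matrix; weights[i][j] > 0 when j is a neighbour of i
               (Gi* counts a location as its own neighbour)
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wj * xj for wj, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Hypothetical data: seven cells in a row, high values clustered at one end.
values = [1, 1, 1, 1, 9, 10, 9]
n = len(values)
# Binary weights: each cell and its immediate left/right neighbours.
weights = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]

z_scores = getis_ord_gi_star(values, weights)
# Assumed threshold: z > 1.96 flags a candidate hotspot.
hotspots = [i for i, z in enumerate(z_scores) if z > 1.96]
print(hotspots)
```

A real analysis would use spatial weights derived from the GIS layer and would usually correct for multiple testing across locations.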
SparRec: An effective matrix completion framework of missing data imputation for GWAS
NASA Astrophysics Data System (ADS)
Jiang, Bo; Ma, Shiqian; Causey, Jason; Qiao, Linbo; Hardin, Matthew Price; Bitts, Ian; Johnson, Daniel; Zhang, Shuzhong; Huang, Xiuzhen
2016-10-01
Genome-wide association studies present computational challenges for missing data imputation, while the advances of genotype technologies are generating datasets of large sample sizes with sample sets genotyped on multiple SNP chips. We present a new framework SparRec (Sparse Recovery) for imputation, with the following properties: (1) The optimization models of SparRec, based on low-rank and low number of co-clusters of matrices, are different from current statistics methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, as other matrix completion methods, is flexible to be applied to missing data imputation for large meta-analysis with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The testing results show that SparRec has significant advantages and competitive performance over other state-of-the-art existing statistics methods including Beagle and fastPhase.
Mohammed A. Kalkhan; Robin M. Reich; Raymond L. Czaplewski
1996-01-01
A Monte Carlo simulation was used to evaluate the statistical properties of measures of association and the Kappa statistic under double sampling with replacement. Three error matrices representing three levels of classification accuracy of Landsat TM Data consisting of four forest cover types in North Carolina. The overall accuracy of the five indices ranged from 0.35...
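For readers unfamiliar with the Kappa statistic evaluated above, here is a minimal sketch of Cohen's kappa computed from an error (confusion) matrix. The four-class matrix is hypothetical and is not data from the study; the double-sampling design itself is not reproduced.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square error matrix of counts.

    Rows are reference classes, columns are classified classes.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / total ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical error matrix for four cover types.
example = [
    [20, 5, 3, 2],
    [4, 18, 5, 3],
    [3, 4, 19, 4],
    [2, 3, 5, 20],
]
kappa = cohens_kappa(example)
print(round(kappa, 3))
```

Kappa equals 1 for perfect agreement and 0 when the observed agreement is no better than chance.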
Chiavegatto Filho, Alexandre Dias Porto; Kawachi, Ichiro; Wang, Yuan Pang; Viana, Maria Carmen; Andrade, Laura Helena Silveira Guerra
2013-11-01
To test the original income inequality theory by analysing its association with depression, anxiety and any mental disorders. We analysed a sample of 3542 individuals aged 18 years and older selected through a stratified, multistage area probability sample of households from the São Paulo Metropolitan Area. Mental disorder symptoms were assessed using the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) criteria. Bayesian multilevel logistic models were fitted. Living in areas with medium and high income inequality was statistically associated with increased risk of depression, relative to low-inequality areas (OR 1.76; 95% CI 1.21 to 2.55, and 1.53; 95% CI 1.07 to 2.19, respectively). The same was not true for anxiety (OR 1.25; 95% CI 0.90 to 1.73, and OR 1.07; 95% CI 0.79 to 1.46). In the case of any mental disorder, results were mixed. In general, our findings were consistent with the income inequality theory, that is, people living in places with higher income inequality had overall higher odds of mental disorders, albeit not always statistically significant. The fact that the association was statistically significant for depression, but not for anxiety, could indicate a pathway by which inequality influences health.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Fangyan; Zhang, Song; Chung Wong, Pak
Effectively visualizing large graphs and capturing the statistical properties are two challenging tasks. To aid in these two tasks, many sampling approaches for graph simplification have been proposed, falling into three categories: node sampling, edge sampling, and traversal-based sampling. It is still unknown which approach is the best. We evaluate commonly used graph sampling methods through a combined visual and statistical comparison of graphs sampled at various rates. We conduct our evaluation on three graph models: random graphs, small-world graphs, and scale-free graphs. Initial results indicate that the effectiveness of a sampling method is dependent on the graph model, the size of the graph, and the desired statistical property. This benchmark study can be used as a guideline in choosing the appropriate method for a particular graph sampling task, and the results presented can be incorporated into graph visualization and analysis tools.
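The three sampling categories named in the abstract above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the random walk as the traversal-based method, and the small ring-with-chords test graph are all hypothetical choices, not the methods or graph models benchmarked in the study.

```python
import random

def node_sampling(edges, rate, rng):
    """Keep a random fraction of nodes and the edges induced among them."""
    nodes = sorted({v for edge in edges for v in edge})
    keep = set(rng.sample(nodes, max(1, round(rate * len(nodes)))))
    return [(u, v) for u, v in edges if u in keep and v in keep]

def edge_sampling(edges, rate, rng):
    """Keep a random fraction of edges."""
    return rng.sample(edges, max(1, round(rate * len(edges))))

def random_walk_sampling(edges, rate, rng, max_steps=10_000):
    """Traversal-based sampling: walk until enough nodes are visited."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    target = max(1, round(rate * len(adjacency)))
    current = rng.choice(sorted(adjacency))
    visited = {current}
    for _ in range(max_steps):  # step cap guards against disconnected graphs
        if len(visited) >= target:
            break
        current = rng.choice(adjacency[current])
        visited.add(current)
    return [(u, v) for u, v in edges if u in visited and v in visited]

rng = random.Random(42)
# Hypothetical connected graph: a 20-node ring with extra chords.
edges = [(i, (i + 1) % 20) for i in range(20)] + [(i, (i + 5) % 20) for i in range(20)]

by_node = node_sampling(edges, 0.5, rng)
by_edge = edge_sampling(edges, 0.5, rng)
by_walk = random_walk_sampling(edges, 0.5, rng)
print(len(by_node), len(by_edge), len(by_walk))
```

Comparing a property such as the degree distribution across the three outputs gives a feel for why no single method is best for every graph model.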
Statistical analyses to support guidelines for marine avian sampling. Final report
Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris
2012-01-01
Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map p-values from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both short-term and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.
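The one-sample, one-tailed Monte Carlo test and its power curve (component 2 above) can be sketched as follows. This is a simplified illustration under stated assumptions: counts are taken to be Poisson, whereas the report selects among several candidate distributions, and the reference mean, effect size, and sample sizes are hypothetical.

```python
import math
import random

def poisson(lam, rng):
    """Draw one Poisson variate (Knuth's method; fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_mean(lam, n, rng):
    """Mean count over n simulated surveys of one spatial unit."""
    return sum(poisson(lam, rng) for _ in range(n)) / n

def hotspot_power(ref_mean, effect, n, alpha=0.05, sims=2000, seed=1):
    """Estimate the power to flag a unit whose true mean is effect * ref_mean."""
    rng = random.Random(seed)
    # Null distribution of the small-sample mean under the reference mean.
    null = sorted(sample_mean(ref_mean, n, rng) for _ in range(sims))
    critical = null[math.ceil((1 - alpha) * sims) - 1]
    # Fraction of simulated hotspot samples that exceed the critical value.
    hits = sum(sample_mean(ref_mean * effect, n, rng) > critical for _ in range(sims))
    return hits / sims

# Hypothetical reference mean of 2 birds per survey and a 3x hotspot.
for n in (1, 5, 10):
    print(n, hotspot_power(2.0, 3.0, n))
```

Reading the printed values against the number of surveys gives a crude power curve; a coldspot test would use the lower tail instead.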
EVALUATION AND COMPARISON OF URINARY METABOLIC BIOMARKERS OF EXPOSURE FOR THE JET FUEL JP-8
B’Hymer, Clayton; Krieg, Edward; Cheever, Kenneth L.; Toennis, Christine A.; Clark, John C.; Kesner, James S.; Gibson, Roger; Butler, Mary Ann
2015-01-01
A study of workers exposed to jet fuel propellant 8 (JP-8) was conducted at U.S. Air Force bases and included the evaluation of three biomarkers of exposure: S-benzylmercapturic acid (BMA), S-phenylmercapturic acid (PMA), and (2-methoxyethoxy)acetic acid (MEAA). Postshift urine specimens were collected from various personnel categorized as high (n = 98), moderate (n = 38) and low (n = 61) JP-8 exposure based on work activities. BMA and PMA urinary levels were determined by high-performance liquid chromatography–tandem mass spectrometry (HPLC-MS/MS), and MEAA urinary levels were determined by gas chromatography–mass spectrometry (GC-MS). The numbers of samples determined as positive for the presence of the BMA biomarker (above the test method’s limit of detection [LOD = 0.5 ng/ml]) were 96 (98.0%), 37 (97.4%), and 58 (95.1%) for the high, moderate, and low (control) exposure workgroup categories, respectively. The numbers of samples determined as positive for the presence of the PMA biomarker (LOD = 0.5 ng/ml) were 33 (33.7%), 9 (23.7%), and 12 (19.7%) for the high, moderate, and low exposure categories. The numbers of samples determined as positive for the presence of the MEAA biomarker (LOD = 0.1 μg/ml) were 92 (93.4%), 13 (34.2%), and 2 (3.3%) for the high, moderate, and low exposure categories. Statistical analysis of the mean levels of the analytes demonstrated MEAA to be the most accurate or appropriate biomarker for JP-8 exposure using urinary concentrations either adjusted or not adjusted for creatinine; mean levels of BMA and PMA were not statistically significant between workgroup categories after adjusting for creatinine. PMID:22712851
Evaluation and comparison of urinary metabolic biomarkers of exposure for the jet fuel JP-8.
B'Hymer, Clayton; Krieg, Edward; Cheever, Kenneth L; Toennis, Christine A; Clark, John C; Kesner, James S; Gibson, Roger; Butler, Mary Ann
2012-01-01
A study of workers exposed to jet fuel propellant 8 (JP-8) was conducted at U.S. Air Force bases and included the evaluation of three biomarkers of exposure: S-benzylmercapturic acid (BMA), S-phenylmercapturic acid (PMA), and (2-methoxyethoxy)acetic acid (MEAA). Postshift urine specimens were collected from various personnel categorized as high (n = 98), moderate (n = 38) and low (n = 61) JP-8 exposure based on work activities. BMA and PMA urinary levels were determined by high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS), and MEAA urinary levels were determined by gas chromatography-mass spectrometry (GC-MS). The numbers of samples determined as positive for the presence of the BMA biomarker (above the test method's limit of detection [LOD = 0.5 ng/ml]) were 96 (98.0%), 37 (97.4%), and 58 (95.1%) for the high, moderate, and low (control) exposure workgroup categories, respectively. The numbers of samples determined as positive for the presence of the PMA biomarker (LOD = 0.5 ng/ml) were 33 (33.7%), 9 (23.7%), and 12 (19.7%) for the high, moderate, and low exposure categories. The numbers of samples determined as positive for the presence of the MEAA biomarker (LOD = 0.1 μg/ml) were 92 (93.4%), 13 (34.2%), and 2 (3.3%) for the high, moderate, and low exposure categories. Statistical analysis of the mean levels of the analytes demonstrated MEAA to be the most accurate or appropriate biomarker for JP-8 exposure using urinary concentrations either adjusted or not adjusted for creatinine; mean levels of BMA and PMA were not statistically significant between workgroup categories after adjusting for creatinine.
Order statistics applied to the most massive and most distant galaxy clusters
NASA Astrophysics Data System (ADS)
Waizmann, J.-C.; Ettori, S.; Bartelmann, M.
2013-06-01
In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that for the combination of the largest and the second largest observation, it is most likely to find them to be realized with similar values with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the South Pole Telescope (SPT) massive cluster sample and the Meta-Catalogue of X-ray detected Clusters of galaxies (MCXC) and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.
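The basic building block behind the abstract above is the distribution of the k-th largest value in a sample. The sketch below uses the textbook result for n independent draws from a common parent distribution; the fixed sample size and the example numbers are assumptions for illustration and do not reproduce the paper's survey-specific framework.

```python
from math import comb

def cdf_kth_largest(parent_cdf_at_x, n, k):
    """P(k-th largest of n i.i.d. draws <= x), given F(x) of the parent.

    The event holds when at most k - 1 of the n draws exceed x.
    """
    f = parent_cdf_at_x
    return sum(comb(n, j) * (1 - f) ** j * f ** (n - j) for j in range(k))

# Hypothetical case: 10 objects, and a threshold x with F(x) = 0.9.
n, f_x = 10, 0.9
for k in (1, 2, 3):
    print(k, round(cdf_kth_largest(f_x, n, k), 4))
```

With k = 1 this reduces to the familiar extreme value result F(x)^n; higher orders add the terms in which a few draws exceed the threshold.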
ERIC Educational Resources Information Center
Hill, Jason G.
2011-01-01
This report examines the postsecondary majors and teaching certifications of public high school-level teachers of departmentalized classes in a selection of subject areas by using data from the 2007-08 Schools and Staffing Survey (SASS), a sample survey of elementary and secondary schools in the United States. SASS collects data on American…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1993 Peer Group Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
Comparative financial information derived from a national sample of 516 two-year colleges is presented in this report for fiscal year 1992-93, including statistics for the national sample and for six peer groups. The report's nine sections focus on: (1) introductory information about the study's background, objectives, and sample; the National…
Emotional Intelligence and Depressive Symptoms as Predictors of Happiness Among Adolescents.
Abdollahi, Abbas; Abu Talib, Mansor; Motalebi, Seyedeh Ameneh
2015-12-01
Given that happiness is an important construct to enable adolescents to cope better with the difficulties and stress of life, it is necessary to advance our knowledge about the possible etiology of happiness in adolescents. The present study sought to investigate the relationships of emotional intelligence, depressive symptoms, and happiness in a sample of male students in Tehran, Iran. This cross-sectional study was conducted on a sample of high school students in Tehran in 2012. The participants comprised 188 male students (aged 16 to 19 years old) selected by a multi-stage cluster sampling method. To gather the data, the students filled out the Assessing Emotions Scale, the Beck Depression Inventory-II, and the Oxford Happiness Inventory. Data analysis was carried out using descriptive and analytical statistics in the Statistical Package for the Social Sciences (SPSS) software. The findings showed that a significant positive association existed between high ability of emotional intelligence and happiness (P < 0.01). Conversely, low ability of emotional intelligence was associated with unhappiness (P < 0.01), there was a positive association between non-depression symptoms and happiness (P < 0.05), and severe depressive symptoms were positively associated with unhappiness (P < 0.01). High ability of emotional intelligence (P < 0.01) and non-depression symptoms (P < 0.05) were the strongest predictors of happiness. These findings reinforced the importance of emotional intelligence as a facilitating factor for happiness in adolescents. In addition, the findings suggested that depression symptoms may be harmful for happiness in adolescents.
Image statistics underlying natural texture selectivity of neurons in macaque V4
Okazawa, Gouki; Tajima, Satohiro; Komatsu, Hidehiko
2015-01-01
Our daily visual experiences are inevitably linked to recognizing the rich variety of textures. However, how the brain encodes and differentiates a plethora of natural textures remains poorly understood. Here, we show that many neurons in macaque V4 selectively encode sparse combinations of higher-order image statistics to represent natural textures. We systematically explored neural selectivity in a high-dimensional texture space by combining texture synthesis and efficient-sampling techniques. This yielded parameterized models for individual texture-selective neurons. The models provided parsimonious but powerful predictors for each neuron’s preferred textures using a sparse combination of image statistics. As a whole population, the neuronal tuning was distributed in a way suitable for categorizing textures and quantitatively predicts human ability to discriminate textures. Together, we suggest that the collective representation of visual image statistics in V4 plays a key role in organizing the natural texture perception. PMID:25535362
Bayesian demography 250 years after Bayes
Bijak, Jakub; Bryant, John
2016-01-01
Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms. PMID:26902889
Nowell, Lisa H.; Ludtke, Amy S.; Mueller, David K.; Scott, Jonathon C.
2011-01-01
Considering all the information evaluated in this report, there were significant differences between pre-landfall and post-landfall samples for PAH concentrations in sediment. Pre-landfall and post-landfall samples did not differ significantly in concentrations or benchmark exceedances for most organics in water or trace elements in sediment. For trace elements in water, aquatic-life benchmarks were exceeded in almost 50 percent of samples, but the high and variable analytical reporting levels precluded statistical comparison of benchmark exceedances between sampling periods. Concentrations of several PAH compounds in sediment were significantly higher in post-landfall samples than pre-landfall samples, and five of seven sites with the largest differences in PAH concentrations also had diagnostic geochemical evidence of Deepwater Horizon Macondo-1 oil from Rosenbauer and others (2010).
Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping
2016-05-15
Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online.
A comprehensive review of arsenic levels in the semiconductor manufacturing industry.
Park, Donguk; Yang, Haengsun; Jeong, Jeeyeon; Ha, Kwonchul; Choi, Sangjun; Kim, Chinyon; Yoon, Chungsik; Park, Dooyong; Paek, Domyung
2010-11-01
This paper presents a summary of arsenic level statistics from air and wipe samples taken from studies conducted in fabrication operations. The main objectives of this study were not only to describe arsenic measurement data but also, through a literature review, to categorize fabrication workers in accordance with observed arsenic levels. All airborne arsenic measurements reported were included in the summary statistics for analysis of the measurement data. The arithmetic mean was estimated assuming a lognormal distribution from the geometric mean and the geometric standard deviation or the range. In addition, weighted arithmetic means (WAMs) were calculated based on the number of measurements reported for each mean. Analysis of variance (ANOVA) was employed to compare arsenic levels classified according to several categories such as the year, sampling type, location sampled, operation type, and cleaning technique. Nine papers were found reporting airborne arsenic measurement data from maintenance workers or maintenance areas in semiconductor chip-making plants. A total of 40 statistical summaries from seven articles were identified that represented a total of 423 airborne arsenic measurements. Arsenic exposure levels taken during normal operating activities in implantation operations (WAM = 1.6 μg m⁻³, no. of samples = 77, no. of statistical summaries = 2) were found to be lower than exposure levels of engineers who were involved in maintenance works (7.7 μg m⁻³, no. of samples = 181, no. of statistical summaries = 19). The highest level (WAM = 218.6 μg m⁻³) was associated with various maintenance works performed inside an ion implantation chamber. ANOVA revealed no significant differences in the WAM arsenic levels among the categorizations based on operation and sampling characteristics. 
Arsenic levels (56.4 μg m⁻³) recorded during maintenance works performed in dry conditions were found to be much higher than those from maintenance works in wet conditions (0.6 μg m⁻³). Arsenic levels from wipe samples in process areas after maintenance activities ranged from non-detectable to 146 μg cm⁻², indicating the potential for dispersion into the air and hence inhalation. We conclude that workers who are regularly or occasionally involved in maintenance work have higher potential for occupational exposure than other employees who are in charge of routine production work. In addition, fabrication workers can be classified into two groups based on the reviewed arsenic exposure levels: operators with potential for low levels of exposure and maintenance engineers with high levels of exposure. These classifications could be used as a basis for a qualitative ordinal ranking of exposure in an epidemiological study.
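The two summary calculations described in the abstract above (an arithmetic mean estimated from a geometric mean and geometric standard deviation under a lognormal assumption, and a weighted arithmetic mean across summaries) can be sketched as follows. The conversion is the standard lognormal identity; the input numbers are hypothetical, and the range-based estimate also mentioned in the abstract is not covered.

```python
import math

def lognormal_arithmetic_mean(gm, gsd):
    """Arithmetic mean implied by a lognormal with the given GM and GSD.

    Uses AM = GM * exp(0.5 * ln(GSD)^2).
    """
    return gm * math.exp(0.5 * math.log(gsd) ** 2)

def weighted_arithmetic_mean(summaries):
    """Weight each reported mean by its number of measurements.

    summaries -- iterable of (mean, number_of_measurements) pairs
    """
    total_n = sum(n for _, n in summaries)
    return sum(mean * n for mean, n in summaries) / total_n

# Hypothetical statistical summaries: (GM in ug/m3, GSD, number of samples).
reported = [(1.0, 2.0, 10), (5.0, 3.0, 30)]
summaries = [(lognormal_arithmetic_mean(gm, gsd), n) for gm, gsd, n in reported]
wam = weighted_arithmetic_mean(summaries)
print(round(wam, 2))
```

Because the arithmetic mean of a lognormal grows with the spread, summaries with a large GSD can dominate the weighted mean even when their GM is modest.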
Städler, Thomas; Haubold, Bernhard; Merino, Carlos; Stephan, Wolfgang; Pfaffelhuber, Peter
2009-01-01
Using coalescent simulations, we study the impact of three different sampling schemes on patterns of neutral diversity in structured populations. Specifically, we are interested in two summary statistics based on the site frequency spectrum as a function of migration rate, demographic history of the entire substructured population (including timing and magnitude of specieswide expansions), and the sampling scheme. Using simulations implementing both finite-island and two-dimensional stepping-stone spatial structure, we demonstrate strong effects of the sampling scheme on Tajima's D (DT) and Fu and Li's D (DFL) statistics, particularly under specieswide (range) expansions. Pooled samples yield average DT and DFL values that are generally intermediate between those of local and scattered samples. Local samples (and to a lesser extent, pooled samples) are influenced by local, rapid coalescence events in the underlying coalescent process. These processes result in lower proportions of external branch lengths and hence lower proportions of singletons, explaining our finding that the sampling scheme affects DFL more than it does DT. Under specieswide expansion scenarios, these effects of spatial sampling may persist up to very high levels of gene flow (Nm > 25), implying that local samples cannot be regarded as being drawn from a panmictic population. Importantly, many data sets on humans, Drosophila, and plants contain signatures of specieswide expansions and effects of sampling scheme that are predicted by our simulation results. This suggests that validating the assumption of panmixia is crucial if robust demographic inferences are to be made from local or pooled samples. However, future studies should consider adopting a framework that explicitly accounts for the genealogical effects of population subdivision and empirical sampling schemes. PMID:19237689
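To make the first of the two summary statistics concrete, here is a minimal sketch of Tajima's D computed from an unfolded site frequency spectrum, using Tajima's (1989) standard constants. The example spectra are hypothetical and are not simulation output from the study; Fu and Li's D is not implemented.

```python
import math

def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele appears in i of the n sampled sequences, so n = len(sfs) + 1.
    """
    n = len(sfs) + 1
    s = sum(sfs)  # number of segregating sites
    if n < 4 or s == 0:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    pairs = n * (n - 1) / 2
    # Mean number of pairwise differences.
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical spectra for n = 5 sequences.
neutral_like = [12, 6, 4, 3]      # counts proportional to 1/i
singleton_rich = [20, 0, 0, 0]    # excess of singletons
print(tajimas_d(neutral_like), tajimas_d(singleton_rich))
```

A spectrum proportional to 1/i gives D of zero, while shifting sites toward singletons pushes D negative, which is the direction in which the sampling scheme matters in the abstract above.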
High Dimensional Classification Using Features Annealed Independence Rules.
Fan, Jianqing; Fan, Yingying
2008-01-01
Classification using high-dimensional features arises frequently in many contemporary statistical studies such as tumor classification using microarray or other high-throughput data. The impact of dimensionality on classification is poorly understood. In a seminal paper, Bickel and Levina (2004) show that the Fisher discriminant performs poorly due to diverging spectra and they propose to use the independence rule to overcome the problem. We first demonstrate that even for the independence classification rule, classification using all the features can be as poor as random guessing due to noise accumulation in estimating population centroids in high-dimensional feature space. In fact, we demonstrate further that almost all linear discriminants can perform as poorly as random guessing. Thus, it is of paramount importance to select a subset of important features for high-dimensional classification, resulting in Features Annealed Independence Rules (FAIR). The conditions under which all the important features can be selected by the two-sample t-statistic are established. The choice of the optimal number of features, or equivalently, the threshold value of the test statistic, is proposed based on an upper bound of the classification error. Simulation studies and real data analysis support our theoretical results and demonstrate convincingly the advantage of our new classification procedure.
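The general idea described above, ranking features by a two-sample t-statistic and then applying the independence rule to the retained features only, can be sketched as follows. This is an illustration under assumptions: the number of kept features m is supplied by hand rather than chosen from the paper's error bound, the pooled per-feature variance is one common choice, and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def two_sample_t(x, y):
    """Two-sample t-statistic for one feature (unequal-variance form)."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def fit_selected_independence_rule(class0, class1, m):
    """Keep the m features with the largest |t| and store per-feature stats."""
    p = len(class0[0])
    cols0 = [[row[j] for row in class0] for j in range(p)]
    cols1 = [[row[j] for row in class1] for j in range(p)]
    t = [two_sample_t(cols1[j], cols0[j]) for j in range(p)]
    keep = sorted(range(p), key=lambda j: abs(t[j]), reverse=True)[:m]
    model = []
    for j in keep:
        n0, n1 = len(cols0[j]), len(cols1[j])
        pooled = ((n0 - 1) * variance(cols0[j]) + (n1 - 1) * variance(cols1[j])) / (n0 + n1 - 2)
        model.append((j, mean(cols0[j]), mean(cols1[j]), pooled))
    return model

def predict(model, x):
    """Independence rule: sum per-feature discriminants, ignoring correlations."""
    score = sum((x[j] - (m0 + m1) / 2) * (m1 - m0) / var for j, m0, m1, var in model)
    return 1 if score > 0 else 0

# Hypothetical data: feature 0 separates the classes, features 1 and 2 are noise.
class0 = [[0.1, 1.0, 5.2], [-0.2, 2.0, 4.8], [0.3, 1.5, 5.1], [-0.1, 1.8, 4.9]]
class1 = [[2.1, 1.2, 5.0], [1.8, 1.9, 5.3], [2.2, 1.4, 4.7], [1.9, 1.7, 5.0]]

model = fit_selected_independence_rule(class0, class1, m=1)
print([j for j, *_ in model], predict(model, [2.0, 1.0, 5.0]), predict(model, [0.0, 1.0, 5.0]))
```

Dropping the noise features before summing the discriminants is what limits the noise accumulation that the abstract identifies as the failure mode of using all features.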
Statistical Thermodynamic Approach to Vibrational Solitary Waves in Acetanilide
NASA Astrophysics Data System (ADS)
Vasconcellos, Áurea R.; Mesquita, Marcus V.; Luzzi, Roberto
1998-03-01
We analyze the behavior of the macroscopic thermodynamic state of polymers, centering on acetanilide. The nonlinear equations of evolution for the populations and the statistically averaged field amplitudes of CO-stretching modes are derived. The existence of excitations of the solitary wave type is evidenced. The infrared spectrum is calculated and compared with the experimental data of Careri et al. [Phys. Rev. Lett. 51, 104 (1983)], resulting in a good agreement. We also consider the situation of a nonthermally highly excited sample, predicting the occurrence of a large increase in the lifetime of the solitary wave excitation.
ERIC Educational Resources Information Center
Beeman, Jennifer Leigh Sloan
2013-01-01
Research has found that students successfully complete an introductory course in statistics without fully comprehending the underlying theory or being able to exhibit statistical reasoning. This is particularly true for the understanding about the sampling distribution of the mean, a crucial concept for statistical inference. This study…
ERIC Educational Resources Information Center
Nitko, Anthony J.; Hsu, Tse-chi
Item analysis procedures appropriate for domain-referenced classroom testing are described. A conceptual framework within which item statistics can be considered and promising statistics in light of this framework are presented. The sampling fluctuations of the more promising item statistics for sample sizes comparable to the typical classroom…
Improving Statistics Education through Simulations: The Case of the Sampling Distribution.
ERIC Educational Resources Information Center
Earley, Mark A.
This paper presents a summary of action research investigating statistics students' understandings of the sampling distribution of the mean. With four sections of an introductory Statistics in Education course (n=98 students), a computer simulation activity (R. delMas, J. Garfield, and B. Chance, 1999) was implemented and evaluated to show…
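A simulation of the kind used to teach the sampling distribution of the mean can be sketched in a few lines. This is not the delMas, Garfield, and Chance activity itself; the skewed exponential population, the sample size of 25, and the function name `sampling_distribution` are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev

def sampling_distribution(draw_one, n, replications):
    """Means of `replications` independent samples, each of size n."""
    return [mean(draw_one() for _ in range(n)) for _ in range(replications)]

rng = random.Random(42)
# Skewed population: exponential with mean 1 and standard deviation 1.
means = sampling_distribution(lambda: rng.expovariate(1.0), n=25, replications=4000)

# The means cluster around the population mean with spread close to
# sigma / sqrt(n) = 1 / 5, even though the population is far from normal.
print(round(mean(means), 3), round(stdev(means), 3))
```

Rerunning with a different `n` lets a student see the spread of the sample means shrink in proportion to the square root of the sample size.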
Designing Intervention Studies: Selected Populations, Range Restrictions, and Statistical Power
Miciak, Jeremy; Taylor, W. Pat; Stuebing, Karla K.; Fletcher, Jack M.; Vaughn, Sharon
2016-01-01
An appropriate estimate of statistical power is critical for the design of intervention studies. Although the inclusion of a pretest covariate in the test of the primary outcome can increase statistical power, samples selected on the basis of pretest performance may demonstrate range restriction on the selection measure and other correlated measures. This can result in attenuated pretest-posttest correlations, reducing the variance explained by the pretest covariate. We investigated the implications of two potential range restriction scenarios: direct truncation on a selection measure and indirect range restriction on correlated measures. Empirical and simulated data indicated direct range restriction on the pretest covariate greatly reduced statistical power and necessitated sample size increases of 82%–155% (dependent on selection criteria) to achieve equivalent statistical power to parameters with unrestricted samples. However, measures demonstrating indirect range restriction required much smaller sample size increases (32%–71%) under equivalent scenarios. Additional analyses manipulated the correlations between measures and pretest-posttest correlations to guide planning experiments. Results highlight the need to differentiate between selection measures and potential covariates and to investigate range restriction as a factor impacting statistical power. PMID:28479943
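The attenuation mechanism described above can be illustrated with a small simulation. The sketch assumes standard-normal pretest and posttest scores with a population correlation of 0.7 and selects the lowest 25% of cases on the pretest; these values and the helper names are hypothetical and are not taken from the study.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_scores(n, rho, rng):
    """Draw (pretest, posttest) pairs with population correlation rho."""
    pairs = []
    for _ in range(n):
        pre = rng.gauss(0.0, 1.0)
        post = rho * pre + (1 - rho**2) ** 0.5 * rng.gauss(0.0, 1.0)
        pairs.append((pre, post))
    return pairs

def restrict_below(pairs, fraction):
    """Direct truncation: keep the lowest `fraction` of cases on the pretest."""
    ranked = sorted(pairs, key=lambda p: p[0])
    return ranked[: int(len(ranked) * fraction)]

rng = random.Random(1)
everyone = simulate_scores(20000, 0.7, rng)
selected = restrict_below(everyone, 0.25)

full_r = pearson(*zip(*everyone))
restricted_r = pearson(*zip(*selected))
# Variance explained by the covariate is r squared, so it drops sharply too.
print(round(full_r, 2), round(restricted_r, 2))
print(round(full_r**2, 2), round(restricted_r**2, 2))
```

Because the covariate's contribution to power depends on the variance it explains, the drop in the squared correlation within the selected group is what drives the larger sample sizes the abstract reports.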
Verhagen, Simone J. W.; Simons, Claudia J. P.; van Zelst, Catherine; Delespaul, Philippe A. E. G.
2017-01-01
Background: Mental healthcare needs person-tailored interventions. Experience Sampling Method (ESM) can provide daily life monitoring of personal experiences. This study aims to operationalize and test a measure of momentary reward-related Quality of Life (rQoL). Intuitively, quality of life improves by spending more time on rewarding experiences. ESM clinical interventions can use this information to coach patients to find a realistic, optimal balance of positive experiences (maximize reward) in daily life. rQoL combines the frequency of engaging in a relevant context (a ‘behavior setting’) with concurrent (positive) affect. High rQoL occurs when the most frequent behavior settings are combined with positive affect or infrequent behavior settings co-occur with low positive affect. Methods: Resampling procedures (Monte Carlo experiments) were applied to assess the reliability of rQoL using various behavior setting definitions under different sampling circumstances, for real or virtual subjects with low-, average- and high contextual variability. Furthermore, resampling was used to assess whether rQoL is a distinct concept from positive affect. Virtual ESM beep datasets were extracted from 1,058 valid ESM observations for virtual and real subjects. Results: Behavior settings defined by Who-What contextual information were most informative. Simulations of at least 100 ESM observations are needed for reliable assessment. Virtual ESM beep datasets of a real subject can be defined by Who-What-Where behavior setting combinations. Large sample sizes are necessary for reliable rQoL assessments, except for subjects with low contextual variability. rQoL is distinct from positive affect. Conclusion: rQoL is a feasible concept. Monte Carlo experiments should be used to assess the reliable implementation of an ESM statistic. Future research in ESM should assess the behavior of summary statistics under different sampling situations. This exploration is especially relevant in clinical implementation, where often only small datasets are available. PMID:29163294
The bioavailability of manganese in welders in relation to its solubility in welding fumes.
Ellingsen, Dag G; Zibarev, Evgenij; Kusraeva, Zarina; Berlinger, Balazs; Chashchin, Maxim; Bast-Pettersen, Rita; Chashchin, Valery; Thomassen, Yngvar
2013-02-01
Blood and urine samples for determination of manganese (Mn) and iron (Fe) concentrations were collected in a cross-sectional study of 137 currently exposed welders, 137 referents and 34 former welders. Aerosol samples for measurements of personal air exposure to Mn and Fe were also collected. The aerosol samples were assessed for their solubility using a simulated lung lining fluid (Hatch solution). On average 13.8% of the total Mn mass (range 1-49%; N = 237) was soluble (Hatch sol), while only 1.4% (<0.1-10.0%; N = 237) of the total Fe mass was Hatch sol. The welders had statistically significantly higher geometric mean concentrations of Mn in whole blood (B-Mn 12.8 vs. 8.0 μg L⁻¹), serum (S-Mn 1.04 vs. 0.77 μg L⁻¹) and urine (U-Mn 0.36 vs. 0.07 μg g⁻¹ cr.) than the referents. Statistically significant univariate correlations were observed between exposure to Hatch sol Mn in the welding aerosol and B-Mn, S-Mn and U-Mn respectively. Pearson's correlation coefficient between mean Hatch sol Mn of two days preceding the collection of biological samples and U-Mn was 0.46 (p < 0.001). The duration of employment as a welder in years was also associated with B-Mn and S-Mn, but not with U-Mn. Statistically significantly higher U-Mn and B-Mn were observed in welders currently exposed to even less than 12 and 6 μg m⁻³ Hatch sol Mn, respectively. When using the 95th percentile concentration among the referents as a cut-point, 70.0 and 64.5% of the most highly exposed welders exceeded this level with respect to B-Mn and U-Mn. The concentrations of B-Mn, S-Mn and U-Mn were all highly correlated in the welders, but not in the referents.
Tennant, M; Kruger, E
2014-01-01
In Australia, over the past 30 years, the prevalence of dental decay in children has reduced significantly, where today 60-70% of all 12-year-olds are caries free, and only 10% of children have more than two decayed teeth. However, many studies continue to report a small but significant subset of children suffering severe levels of decay. The present study applies Monte Carlo simulation to examine, at the national level, 12-year-old decayed, missing or filled teeth and shed light on both the statistical limitation of Australia's reporting to date as well as the problem of targeting high-risk children. A simulation for 273 000 Australian 12-year-old children found that moving between different levels of geographic clustering produced different statistical influences that drive different conclusions. At the high scale (i.e., state level) the gross averaging of the non-normally distributed disease burden masks the small subset of disease bearing children. At the much higher acuity of analysis (i.e., local government area) the risk of low numbers in the sample becomes a significant issue. The results clearly highlight, first, the importance of care when examining the existing data and, second, the opportunities for far greater levels of targeting of services to children in need. The sustainability (and fairness) of universal coverage systems needs to be examined to ensure they remain highly targeted at disease burden, and not just focused on the children that are easy to reach (and suffer the least disease).
Baumrind, S; Korn, E L; Isaacson, R J; West, E E; Molthen, R
1983-12-01
This article analyzes differences in the measured displacement of the condyle and of pogonion when different vectors of force are delivered to the maxilla in the course of non-full-banded, Phase 1, mixed-dentition treatment for the correction of Class II malocclusion. The 238-case sample is identical to that for which changes in other parameters of facial form have been reported previously. Relative to superimposition on anterior cranial base and measured in a Frankfort-plane-determined coordinate system, we have attempted to identify and quantitate (1) the displacement of each structure which results from local remodeling and (2) the displacement of each structure which occurs as a secondary consequence of changes in other regions of the skull. We have also attempted to isolate treatment effects from those attributable to spontaneous growth and development. At the condyle, we note that in all three treatment groups and in the control group there is a small but real downward and backward displacement of the glenoid fossa. This change is not treatment induced but, rather, is associated with spontaneous growth and development. (See Fig. 5.) Some interesting differences in pattern of "growth at the condyle" were noted between samples. In the intraoral (modified activator) sample, there were small but statistically significant increases in growth rate as compared to the untreated group of Class II controls. To our surprise, similar statistically significant increases over the growth rate of the control group were noted in the cervical sample. (See Table III, variables 17 and 18.) Small but statistically significant differences between treatments were also noted in the patterns of change at pogonion. As compared to the untreated control group, the rate of total displacement in the modified activator group was significantly greater in the forward direction, while the rate of total displacement in the cervical group was significantly greater in the downward direction. There were no statistically significant differences in the rate of total displacement of pogonion between the high-pull sample and the control sample. (See Table IV, variables 21 and 22.)
Ho, Lindsey A; Lange, Ethan M
2010-12-01
Genome-wide association (GWA) studies are a powerful approach for identifying novel genetic risk factors associated with human disease. A GWA study typically requires the inclusion of thousands of samples to have sufficient statistical power to detect single nucleotide polymorphisms that are associated with only modest increases in risk of disease given the heavy burden of a multiple test correction that is necessary to maintain valid statistical tests. Low statistical power and the high financial cost of performing a GWA study remain prohibitive for many scientific investigators anxious to perform such a study using their own samples. A number of remedies have been suggested to increase statistical power and decrease cost, including the utilization of free publicly available genotype data and multi-stage genotyping designs. Herein, we compare the statistical power and relative costs of alternative association study designs that use cases and screened controls to study designs that are based only on, or additionally include, free public control genotype data. We describe a novel replication-based two-stage study design, which uses free public control genotype data in the first stage and follow-up genotype data on case-matched controls in the second stage that preserves many of the advantages inherent when using only an epidemiologically matched set of controls. Specifically, we show that our proposed two-stage design can substantially increase statistical power and decrease cost of performing a GWA study while controlling the type-I error rate that can be inflated when using public controls due to differences in ancestry and batch genotype effects.
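To make the multiple-testing burden concrete, the sketch below uses a normal approximation to the power of a two-sided test comparing allele frequencies between cases and controls. It is a generic textbook calculation, not the two-stage design described above; the allele frequencies (0.30 vs. 0.27), the 500,000-test Bonferroni threshold, and the function name `allelic_test_power` are all hypothetical.

```python
from statistics import NormalDist

def allelic_test_power(n_cases, n_controls, p_case, p_control, alpha):
    """Approximate power of a two-sided two-proportion z-test on allele counts."""
    std_normal = NormalDist()
    # Each diploid individual contributes two alleles.
    alleles_case, alleles_control = 2 * n_cases, 2 * n_controls
    se = (p_case * (1 - p_case) / alleles_case
          + p_control * (1 - p_control) / alleles_control) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

bonferroni_alpha = 0.05 / 500_000  # hypothetical number of SNPs tested
for n in (1_000, 5_000, 10_000):
    single = allelic_test_power(n, n, 0.30, 0.27, 0.05)
    genome_wide = allelic_test_power(n, n, 0.30, 0.27, bonferroni_alpha)
    print(n, round(single, 3), round(genome_wide, 3))
```

With these illustrative inputs, a modest frequency difference that is detectable about half the time in a single test at 1,000 cases and 1,000 controls is almost never detected at the corrected threshold, and only reaches high power with roughly ten times as many samples.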
Middle school science teachers' teaching self-efficacy and students' science self-efficacy
NASA Astrophysics Data System (ADS)
Pisa, Danielle
Project 2061, initiated by the American Association for the Advancement of Science (AAAS), developed recommendations for what is essential in education to produce scientifically literate citizens. Furthermore, they suggest that teachers teach effectively. There is an abundance of literature that focuses on the effects of a teacher's science teaching self-efficacy and a student's science self-efficacy. However, there is no literature on the relationship between the two self-efficacies. This study investigated if there is a differential change in students' science self-efficacy over an academic term after instruction from a teacher with high science teaching self-efficacy. Quantitative analysis of STEBI scores for teachers showed that mean STEBI scores did not change over one academic term. A t test indicated that there was no statistically significant difference in mean SMTSL scores for students' science self-efficacy over the course of one academic term for a) the entire sample, b) each science class, and c) each grade level. In addition, ANOVA indicated that there was no statistically significant difference in mean gain factor of students rated as low, medium, and high on science self-efficacy as measured by the SMTSL, when students received instruction from a teacher with a high science teaching self-efficacy value as measured by the STEBI. Finally, there was no statistically significant association between the pre- and post-instructional rankings of SMTSL by grade level when students received instruction from a teacher with a high science teaching self-efficacy value as measured by the STEBI. This is the first study of its kind. Studies indicated that teaching strategies typically practiced by teachers with high science teaching were beneficial to physics self-efficacy (Fencl & Scheel, 2005). 
Although this study was unsuccessful at determining whether or not a teacher with high science teaching self-efficacy has a differential effect on students' science self-efficacy, it is worth repeating on a more diverse sample of teachers and students over a longer period of time.
El Batawi, H Y
2015-02-01
To investigate the possible effect of intraoperative analgesia, namely diclofenac sodium compared to acetaminophen, on post-recovery pain perception in children undergoing painful dental procedures under general anaesthesia. A double-blind randomised clinical trial. A sample of 180 consecutive cases of children undergoing full dental rehabilitation under general anaesthesia in a private hospital in Saudi Arabia during 2013 was divided into three groups (60 children each) according to the analgesic used prior to extubation. Group A, children had diclofenac sodium suppository. Group B, children received acetaminophen suppository and Group C, the control group. Using an authenticated Arabic version of the Wong and Baker faces Pain assessment Scale, each patient was asked to choose the face that best matched the pain he/she was suffering. Data were collected and recorded for statistical analysis. Student's t test was used for comparison of sample means. A preliminary F test to compare sample variances was carried out to determine the appropriate t test variant to be used. A "p" value less than 0.05 was considered significant. More than 93% of children had post-operative pain in varying degrees. High statistical significance was observed between children in groups A and B compared to control group C with the latter scoring high pain perception. Diclofenac showed higher potency in multiple painful procedures, while the statistical difference was not significant in children with three or fewer painful dental procedures. Diclofenac sodium is more potent than acetaminophen, especially for multiple pain-provoking or traumatic procedures. A timely use of NSAID analgesia just before extubation helps provide adequate coverage during recovery. Peri-operative analgesia is to be recommended as an essential treatment adjunct for child dental rehabilitation under general anaesthesia.
NASA Astrophysics Data System (ADS)
Garrido, Marta Isabel; Teng, Chee Leong James; Taylor, Jeremy Alexander; Rowe, Elise Genevieve; Mattingley, Jason Brett
2016-06-01
The ability to learn about regularities in the environment and to make predictions about future events is fundamental for adaptive behaviour. We have previously shown that people can implicitly encode statistical regularities and detect violations therein, as reflected in neuronal responses to unpredictable events that carry a unique prediction error signature. In the real world, however, learning about regularities will often occur in the context of competing cognitive demands. Here we asked whether learning of statistical regularities is modulated by concurrent cognitive load. We compared electroencephalographic metrics associated with responses to pure-tone sounds with frequencies sampled from narrow or wide Gaussian distributions. We showed that outliers evoked a larger response than those in the centre of the stimulus distribution (i.e., an effect of surprise) and that this difference was greater for physically identical outliers in the narrow than in the broad distribution. These results demonstrate an early neurophysiological marker of the brain's ability to implicitly encode complex statistical structure in the environment. Moreover, we manipulated concurrent cognitive load by having participants perform a visual working memory task while listening to these streams of sounds. We again observed greater prediction error responses in the narrower distribution under both low and high cognitive load. Furthermore, there was no reliable reduction in prediction error magnitude under high relative to low cognitive load. Our findings suggest that statistical learning is not a capacity-limited process, and that it proceeds automatically even when cognitive resources are taxed by concurrent demands.
NASA Astrophysics Data System (ADS)
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo
2018-06-01
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Xiao, Rong; Anderson, Stephen; Aramini, James; Belote, Rachel; Buchwald, William A.; Ciccosanti, Colleen; Conover, Ken; Everett, John K.; Hamilton, Keith; Huang, Yuanpeng Janet; Janjua, Haleema; Jiang, Mei; Kornhaber, Gregory J.; Lee, Dong Yup; Locke, Jessica Y.; Ma, Li-Chung; Maglaqui, Melissa; Mao, Lei; Mitra, Saheli; Patel, Dayaban; Rossi, Paolo; Sahdev, Seema; Sharma, Seema; Shastry, Ritu; Swapna, G.V.T.; Tong, Saichu N.; Wang, Dongyan; Wang, Huang; Zhao, Li; Montelione, Gaetano T.; Acton, Thomas B.
2014-01-01
We describe the core Protein Production Platform of the Northeast Structural Genomics Consortium (NESG) and outline the strategies used for producing high-quality protein samples. The platform is centered on the cloning, expression and purification of 6X-His-tagged proteins using T7-based Escherichia coli systems. The 6X-His tag allows for similar purification procedures for most targets and implementation of high-throughput (HTP) parallel methods. In most cases, the 6X-His-tagged proteins are sufficiently purified (> 97% homogeneity) using a HTP two-step purification protocol for most structural studies. Using this platform, the open reading frames of over 16,000 different targeted proteins (or domains) have been cloned as > 26,000 constructs. Over the past nine years, more than 16,000 of these expressed protein, and more than 4,400 proteins (or domains) have been purified to homogeneity in tens of milligram quantities (see Summary Statistics, http://nesg.org/statistics.html). Using these samples, the NESG has deposited more than 900 new protein structures to the Protein Data Bank (PDB). The methods described here are effective in producing eukaryotic and prokaryotic protein samples in E. coli. This paper summarizes some of the updates made to the protein production pipeline in the last five years, corresponding to phase 2 of the NIGMS Protein Structure Initiative (PSI-2) project. The NESG Protein Production Platform is suitable for implementation in a large individual laboratory or by a small group of collaborating investigators. These advanced automated and/or parallel cloning, expression, purification, and biophysical screening technologies are of broad value to the structural biology, functional proteomics, and structural genomics communities. PMID:20688167
Wicks, Shawna; Taylor, Christopher M.; Luo, Meng; Blanchard, Eugene IV; Ribnicky, David; Cefalu, William T.; Mynatt, Randall L.; Welsh, David A.
2014-01-01
Objective The gut microbiome has been implicated in obesity and metabolic syndrome; however, most studies have focused on fecal or colonic samples. Several species of Artemisia have been reported to ameliorate insulin signaling both in vitro and in vivo. The aim of this study was to characterize the mucosal and luminal bacterial populations in the terminal ileum with or without supplementation with Artemisia extracts. Materials/Methods Following 4 weeks of supplementation with different Artemisia extracts (PMI 5011, Santa or Scopa), diet-induced obese mice were sacrificed and luminal and mucosal samples of terminal ileum were used to evaluate microbial community composition by pyrosequencing of 16S rDNA hypervariable regions. Results Significant differences in community structure and membership were observed between luminal and mucosal samples, irrespective of diet group. All Artemisia extracts increased the Bacteroidetes:Firmicutes ratio in mucosal samples. This effect was not observed in the luminal compartment. There was high inter-individual variability in the phylogenetic assessments of the ileal microbiota, limiting the statistical power of this pilot investigation. Conclusions Marked differences in bacterial communities exist dependent upon the biogeographic compartment in the terminal ileum. Future studies testing the effects of Artemisia or other botanical supplements require larger sample sizes for adequate statistical power. PMID:24985102
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kreuzer-Martin, Helen W.; Hegg, Eric L.
The use of isotopic signatures for forensic analysis of biological materials is well-established, and the same general principles that apply to interpretation of stable isotope content of C, N, O, and H apply to the analysis of microorganisms. Heterotrophic microorganisms derive their isotopic content from their growth substrates, which are largely plant and animal products, and the water in their culture medium. Thus the isotope signatures of microbes are tied to their growth environment. The C, N, O, and H isotope ratios of spores have been demonstrated to constitute highly discriminating signatures for sample matching. They can rule out specific samples of media and/or water as possible production media, and can predict isotope ratio ranges of the culture media and water used to produce a given sample. These applications have been developed and tested through analyses of approximately 250 samples of Bacillus subtilis spores and over 500 samples of culture media, providing a strong statistical basis for data interpretation. A Bayesian statistical framework for integrating stable isotope data with other types of signatures derived from microorganisms has been able to characterize the culture medium used to produce spores of various Bacillus species, leveraging isotopic differences in different medium types and demonstrating the power of data integration for forensic investigations.
Méndez, Jesús; González, Mónica; Lobo, M Gloria; Carnero, Aurelio
2004-03-10
The commercial value of a cochineal (Dactylopius coccus Costa) sample is associated with its color quality. Because the cochineal is a legal food colorant, its color quality is generally understood as its pigment content. Simply put, the higher this content, the more valuable the sample is to the market. In an effort to devise a way to measure the color quality of a cochineal, the present study evaluates different parameters of color measurement such as chromatic attributes (L* and a*), percentage of carminic acid, tint determination, and chromatographic profile of pigments. Tint determination did not achieve this objective because this parameter does not correlate with carminic acid content. On the other hand, carminic acid showed a highly significant correlation (r = -0.922, p = 0.000) with L* values determined from powdered cochineal samples. The combination of the information from the spectrophotometric determination of carminic acid with that of the pigment profile acquired by liquid chromatography (LC) and the composition of the red and yellow pigment groups, also acquired by LC, enables greater accuracy in judging the quality of the final sample. As a result of this study, it was possible to achieve the separation of cochineal samples according to geographical origin using two statistical techniques: cluster analysis and principal component analysis.
Dória, Maria Luisa; McKenzie, James S.; Mroz, Anna; Phelps, David L.; Speller, Abigail; Rosini, Francesca; Strittmatter, Nicole; Golf, Ottmar; Veselkov, Kirill; Brown, Robert; Ghaem-Maghami, Sadaf; Takats, Zoltan
2016-01-01
Ovarian cancer is highly prevalent among European women, and is the leading cause of gynaecological cancer death. Current histopathological diagnoses of tumour severity are based on interpretation of, for example, immunohistochemical staining. Desorption electrospray mass spectrometry imaging (DESI-MSI) generates spatially resolved metabolic profiles of tissues and supports an objective investigation of tumour biology. In this study, various ovarian tissue types were analysed by DESI-MSI and co-registered with their corresponding haematoxylin and eosin (H&E) stained images. The mass spectral data reveal tissue type-dependent lipid profiles which are consistent across the n = 110 samples (n = 107 patients) used in this study. Multivariate statistical methods were used to classify samples and identify molecular features discriminating between tissue types. Three main groups of samples (epithelial ovarian carcinoma, borderline ovarian tumours, normal ovarian stroma) were compared as were the carcinoma histotypes (serous, endometrioid, clear cell). Classification rates >84% were achieved for all analyses, and variables differing statistically between groups were determined and putatively identified. The changes noted in various lipid types help to provide a context in terms of tumour biochemistry. The classification of unseen samples demonstrates the capability of DESI-MSI to characterise ovarian samples and to overcome existing limitations in classical histopathology. PMID:27976698
2011-01-01
Background There is substantial variation in reported reference intervals for canine plasma creatinine among veterinary laboratories, thereby influencing the clinical assessment of analytical results. The aims of the study were to determine the inter- and intra-laboratory variation in plasma creatinine among 10 veterinary laboratories, and to compare results from each laboratory with the upper limit of its reference interval. Methods Samples were collected from 10 healthy dogs, 10 dogs with expected intermediate plasma creatinine concentrations, and 10 dogs with azotemia. Overlap was observed for the first two groups. The 30 samples were divided into 3 batches and shipped in random order by postal delivery for plasma creatinine determination. Statistical testing was performed in accordance with ISO standard methodology. Results Inter- and intra-laboratory variation was clinically acceptable as plasma creatinine values for most samples were usually of the same magnitude. A few extreme outliers caused three laboratories to fail statistical testing for consistency. Laboratory sample means above or below the overall sample mean did not unequivocally reflect high or low reference intervals in that laboratory. Conclusions In spite of close analytical results, further standardization among laboratories is warranted. The discrepant reference intervals seem to largely reflect different populations used in establishing the reference intervals, rather than analytical variation due to different laboratory methods. PMID:21477356
Lin, Yu-Pin; Chu, Hone-Jay; Huang, Yu-Long; Tang, Chia-Hsi; Rouhani, Shahrokh
2011-06-01
This study develops a stratified conditional Latin hypercube sampling (scLHS) approach for multiple, remotely sensed, normalized difference vegetation index (NDVI) images. The objective is to sample, monitor, and delineate spatiotemporal landscape changes, including spatial heterogeneity and variability, in a given area. The scLHS approach, which is based on the variance quadtree technique (VQT) and the conditional Latin hypercube sampling (cLHS) method, selects samples in order to delineate landscape changes from multiple NDVI images. The images are then mapped for calibration and validation by using sequential Gaussian simulation (SGS) with the scLHS selected samples. Spatial statistical results indicate that in terms of their statistical distribution, spatial distribution, and spatial variation, the statistics and variograms of the scLHS samples resemble those of multiple NDVI images more closely than those of cLHS and VQT samples. Moreover, the accuracy of simulated NDVI images based on SGS with scLHS samples is significantly better than that of simulated NDVI images based on SGS with cLHS samples and VQT samples, respectively. However, the proposed approach efficiently monitors the spatial characteristics of landscape changes, including the statistics, spatial variability, and heterogeneity of NDVI images. In addition, SGS with the scLHS samples effectively reproduces spatial patterns and landscape changes in multiple NDVI images.
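The sampling idea underlying the abstract above can be illustrated with a minimal sketch of plain Latin hypercube sampling on the unit cube. This is not the scLHS or cLHS procedure described in the abstract (no conditioning on NDVI images and no variance quadtree stratification); the function name and parameters are hypothetical and chosen only for illustration.

```python
import random


def latin_hypercube(n_samples, n_dims, rng=None):
    """Plain Latin hypercube sample on [0, 1)^n_dims (illustrative only).

    Each axis is cut into n_samples equal strata and every stratum
    receives exactly one point; strata are paired at random across axes.
    """
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        # One uniformly placed point inside each of the n_samples strata.
        points = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so the pairing of strata across dimensions is random.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in latin_hypercube(5, 2, random.Random(42)):
        print(point)
```

A conditioned variant would, roughly speaking, replace the unit-cube strata with quantile strata of the observed covariates and search for existing pixels that fill those strata; that step is omitted here.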
2018-01-01
ABSTRACT To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. 
Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. PMID:29475868
Shakeri, Heman; Volkova, Victoriya; Wen, Xuesong; Deters, Andrea; Cull, Charley; Drouillard, James; Müller, Christian; Moradijamei, Behnaz; Jaberi-Douraki, Majid
2018-05-01
To assess phenotypic bacterial antimicrobial resistance (AMR) in different strata (e.g., host populations, environmental areas, manure, or sewage effluents) for epidemiological purposes, isolates of target bacteria can be obtained from a stratum using various sample types. Also, different sample processing methods can be applied. The MIC of each target antimicrobial drug for each isolate is measured. Statistical equivalence testing of the MIC data for the isolates allows evaluation of whether different sample types or sample processing methods yield equivalent estimates of the bacterial antimicrobial susceptibility in the stratum. We demonstrate this approach on the antimicrobial susceptibility estimates for (i) nontyphoidal Salmonella spp. from ground or trimmed meat versus cecal content samples of cattle in processing plants in 2013-2014 and (ii) nontyphoidal Salmonella spp. from urine, fecal, and blood human samples in 2015 (U.S. National Antimicrobial Resistance Monitoring System data). We found that the sample types for cattle yielded nonequivalent susceptibility estimates for several antimicrobial drug classes and thus may gauge distinct subpopulations of salmonellae. The quinolone and fluoroquinolone susceptibility estimates for nontyphoidal salmonellae from human blood are nonequivalent to those from urine or feces, conjecturally due to the fluoroquinolone (ciprofloxacin) use to treat infections caused by nontyphoidal salmonellae. We also demonstrate statistical equivalence testing for comparing sample processing methods for fecal samples (culturing one versus multiple aliquots per sample) to assess AMR in fecal Escherichia coli. These methods yield equivalent results, except for tetracyclines. Importantly, statistical equivalence testing provides the MIC difference at which the data from two sample types or sample processing methods differ statistically. 
Data users (e.g., microbiologists and epidemiologists) may then interpret practical relevance of the difference. IMPORTANCE Bacterial antimicrobial resistance (AMR) needs to be assessed in different populations or strata for the purposes of surveillance and determination of the efficacy of interventions to halt AMR dissemination. To assess phenotypic antimicrobial susceptibility, isolates of target bacteria can be obtained from a stratum using different sample types or employing different sample processing methods in the laboratory. The MIC of each target antimicrobial drug for each of the isolates is measured, yielding the MIC distribution across the isolates from each sample type or sample processing method. We describe statistical equivalence testing for the MIC data for evaluating whether two sample types or sample processing methods yield equivalent estimates of the bacterial phenotypic antimicrobial susceptibility in the stratum. This includes estimating the MIC difference at which the data from the two approaches differ statistically. Data users (e.g., microbiologists, epidemiologists, and public health professionals) can then interpret whether that present difference is practically relevant. Copyright © 2018 Shakeri et al.
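To make the idea of equivalence testing concrete, the following is a minimal sketch of a two one-sided tests (TOST) procedure applied to MIC values on a log2 scale. It is not the authors' procedure: the normal approximation, the one-dilution equivalence margin, and the function name `tost_log2_mic` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def tost_log2_mic(mics_a, mics_b, margin_log2=1.0, alpha=0.05):
    """Two one-sided tests for equivalence of mean log2(MIC).

    Assumes a large-sample normal approximation and a hypothetical
    equivalence margin of +/- margin_log2 doubling dilutions.
    Returns (mean difference, TOST p-value, equivalent?).
    """
    a = [math.log2(m) for m in mics_a]
    b = [math.log2(m) for m in mics_b]
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    std_normal = NormalDist()
    # H0 (lower): diff <= -margin; reject when diff is well above -margin.
    p_lower = 1.0 - std_normal.cdf((diff + margin_log2) / se)
    # H0 (upper): diff >= +margin; reject when diff is well below +margin.
    p_upper = std_normal.cdf((diff - margin_log2) / se)
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


if __name__ == "__main__":
    feces = [1, 2, 2, 4, 2, 1, 2, 4] * 5   # made-up MICs, mg/L
    blood = [4 * m for m in feces]          # shifted by two dilutions
    print(tost_log2_mic(feces, feces))
    print(tost_log2_mic(feces, blood))
```

Repeating the test over a range of margins and noting the smallest margin at which equivalence is declared gives, in spirit, the MIC difference at which the two data sets start to differ statistically.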
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trattner, Sigal; Cheng, Bin; Pieniazek, Radoslaw L.
2014-04-15
Purpose: Effective dose (ED) is a widely used metric for comparing ionizing radiation burden between different imaging modalities, scanners, and scan protocols. In computed tomography (CT), ED can be estimated by performing scans on an anthropomorphic phantom in which metal-oxide-semiconductor field-effect transistor (MOSFET) solid-state dosimeters have been placed to enable organ dose measurements. Here a statistical framework is established to determine the sample size (number of scans) needed for estimating ED to a desired precision and confidence, for a particular scanner and scan protocol, subject to practical limitations. Methods: The statistical scheme involves solving equations which minimize the sample size required for estimating ED to desired precision and confidence. It is subject to a constrained variation of the estimated ED and solved using the Lagrange multiplier method. The scheme incorporates measurement variation introduced both by MOSFET calibration, and by variation in MOSFET readings between repeated CT scans. Sample size requirements are illustrated on cardiac, chest, and abdomen–pelvis CT scans performed on a 320-row scanner and chest CT performed on a 16-row scanner. Results: Sample sizes for estimating ED vary considerably between scanners and protocols. Sample size increases as the required precision or confidence is higher and also as the anticipated ED is lower. For example, for a helical chest protocol, for 95% confidence and 5% precision for the ED, 30 measurements are required on the 320-row scanner and 11 on the 16-row scanner when the anticipated ED is 4 mSv; these sample sizes are 5 and 2, respectively, when the anticipated ED is 10 mSv. Conclusions: Applying the suggested scheme, it was found that even at modest sample sizes, it is feasible to estimate ED with high precision and a high degree of confidence. 
As CT technology develops enabling ED to be lowered, more MOSFET measurements are needed to estimate ED with the same precision and confidence.
Trattner, Sigal; Cheng, Bin; Pieniazek, Radoslaw L.; Hoffmann, Udo; Douglas, Pamela S.; Einstein, Andrew J.
2014-01-01
Purpose: Effective dose (ED) is a widely used metric for comparing ionizing radiation burden between different imaging modalities, scanners, and scan protocols. In computed tomography (CT), ED can be estimated by performing scans on an anthropomorphic phantom in which metal-oxide-semiconductor field-effect transistor (MOSFET) solid-state dosimeters have been placed to enable organ dose measurements. Here a statistical framework is established to determine the sample size (number of scans) needed for estimating ED to a desired precision and confidence, for a particular scanner and scan protocol, subject to practical limitations. Methods: The statistical scheme involves solving equations which minimize the sample size required for estimating ED to desired precision and confidence. It is subject to a constrained variation of the estimated ED and solved using the Lagrange multiplier method. The scheme incorporates measurement variation introduced both by MOSFET calibration, and by variation in MOSFET readings between repeated CT scans. Sample size requirements are illustrated on cardiac, chest, and abdomen–pelvis CT scans performed on a 320-row scanner and chest CT performed on a 16-row scanner. Results: Sample sizes for estimating ED vary considerably between scanners and protocols. Sample size increases as the required precision or confidence is higher and also as the anticipated ED is lower. For example, for a helical chest protocol, for 95% confidence and 5% precision for the ED, 30 measurements are required on the 320-row scanner and 11 on the 16-row scanner when the anticipated ED is 4 mSv; these sample sizes are 5 and 2, respectively, when the anticipated ED is 10 mSv. Conclusions: Applying the suggested scheme, it was found that even at modest sample sizes, it is feasible to estimate ED with high precision and a high degree of confidence. 
As CT technology develops enabling ED to be lowered, more MOSFET measurements are needed to estimate ED with the same precision and confidence. PMID:24694150
Statistical literacy and sample survey results
NASA Astrophysics Data System (ADS)
McAlevey, Lynn; Sullivan, Charles
2010-10-01
Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In general, they fare no better than managers who have never studied statistics. There are implications for teaching, especially in business schools, as well as for consulting.
NASA Astrophysics Data System (ADS)
Viironen, K.; Marín-Franch, A.; López-Sanjuan, C.; Varela, J.; Chaves-Montero, J.; Cristóbal-Hornillos, D.; Molino, A.; Fernández-Soto, A.; Vilella-Rojo, G.; Ascaso, B.; Cenarro, A. J.; Cerviño, M.; Cepa, J.; Ederoclite, A.; Márquez, I.; Masegosa, J.; Moles, M.; Oteo, I.; Pović, M.; Aguerri, J. A. L.; Alfaro, E.; Aparicio-Villegas, T.; Benítez, N.; Broadhurst, T.; Cabrera-Caño, J.; Castander, J. F.; Del Olmo, A.; González Delgado, R. M.; Husillos, C.; Infante, L.; Martínez, V. J.; Perea, J.; Prada, F.; Quintana, J. M.
2015-04-01
Context. Most observational results on the high redshift restframe UV-bright galaxies are based on samples pinpointed using the so-called dropout technique or Ly-α selection. However, the availability of multifilter data now allows the dropout selections to be replaced by direct methods based on photometric redshifts. In this paper we present the methodology to select and study the population of high redshift galaxies in the ALHAMBRA survey data. Aims: Our aim is to develop a less biased methodology than the traditional dropout technique to study the high redshift galaxies in ALHAMBRA and other multifilter data. Thanks to the wide area ALHAMBRA covers, we especially aim at contributing to the study of the brightest, least frequent, high redshift galaxies. Methods: The methodology is based on redshift probability distribution functions (zPDFs). It is shown how a clean galaxy sample can be obtained by selecting the galaxies with high integrated probability of being within a given redshift interval. However, reaching both a complete and clean sample with this method is challenging. Hence, a method to derive statistical properties by summing the zPDFs of all the galaxies in the redshift bin of interest is introduced. Results: Using this methodology we derive the galaxy rest frame UV number counts in five redshift bins centred at z = 2.5, 3.0, 3.5, 4.0, and 4.5, being complete up to the limiting magnitude at mUV(AB) = 24, where mUV refers to the first ALHAMBRA filter redwards of the Ly-α line. With the wide field ALHAMBRA data we especially contribute to the study of the brightest ends of these counts, accurately sampling the surface densities down to mUV(AB) = 21-22. Conclusions: We show that using the zPDFs it is easy to select a very clean sample of high redshift galaxies. 
We also show that it is better to do statistical analysis of the properties of galaxies using a probabilistic approach, which takes into account both the incompleteness and contamination issues in a natural way. Based on observations collected at the German-Spanish Astronomical Center, Calar Alto, jointly operated by the Max-Planck-Institut für Astronomie (MPIA) at Heidelberg and the Instituto de Astrofísica de Andalucía (CSIC).
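The two selection strategies contrasted in this abstract can be sketched in a few lines. The sketch below assumes, purely for illustration, that each zPDF is stored as probability masses on a shared redshift grid that sum to one; the function names, the grid, and the threshold are hypothetical and are not taken from the ALHAMBRA pipeline.

```python
def prob_in_bin(z_grid, pdf, z_lo, z_hi):
    """Integrated probability that a galaxy lies in [z_lo, z_hi).

    Assumes pdf[i] is the probability mass at z_grid[i], summing to 1.
    """
    return sum(p for z, p in zip(z_grid, pdf) if z_lo <= z < z_hi)


def clean_sample(z_grid, pdfs, z_lo, z_hi, threshold=0.9):
    """Indices of galaxies whose integrated probability passes a cut."""
    return [i for i, pdf in enumerate(pdfs)
            if prob_in_bin(z_grid, pdf, z_lo, z_hi) >= threshold]


def expected_count(z_grid, pdfs, z_lo, z_hi):
    """Statistical number count: sum of every galaxy's probability in the bin."""
    return sum(prob_in_bin(z_grid, pdf, z_lo, z_hi) for pdf in pdfs)


if __name__ == "__main__":
    grid = [2.0, 2.5, 3.0, 3.5]
    pdfs = [
        [0.0, 0.9, 0.1, 0.0],   # almost certainly near z = 2.5
        [0.5, 0.5, 0.0, 0.0],   # ambiguous between z = 2.0 and 2.5
        [0.0, 0.0, 0.2, 0.8],   # most likely near z = 3.5
    ]
    print(clean_sample(grid, pdfs, 2.25, 2.75, threshold=0.8))
    print(expected_count(grid, pdfs, 2.25, 2.75))
```

In this toy case the hard cut keeps only the first galaxy, while the summed count also credits the half probability of the ambiguous one, which is the sense in which the probabilistic approach absorbs incompleteness and contamination.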
Use of Whatman-41 filters in air quality sampling networks (with applications to elemental analysis)
NASA Technical Reports Server (NTRS)
Neustadter, H. E.; Sidik, S. M.; King, R. B.; Fordyce, J. S.; Burr, J. C.
1974-01-01
The operation of a 16-site parallel high volume air sampling network with glass fiber filters on one unit and Whatman-41 filters on the other is reported. The network data and data from several other experiments indicate that (1) sampler-to-sampler and filter-to-filter variabilities are small; (2) hygroscopic affinity of Whatman-41 filters need not introduce errors; and (3) suspended particulate samples from glass fiber filters averaged slightly, but not statistically significantly, higher than from Whatman-41 filters. The results obtained demonstrate the practicability of Whatman-41 filters for air quality monitoring and elemental analysis.
FUNSTAT and statistical image representations
NASA Technical Reports Server (NTRS)
Parzen, E.
1983-01-01
General ideas of functional statistical inference for the analysis of one sample and two samples, univariate and bivariate, are outlined. The ONESAM program is applied to analyze the univariate probability distributions of multi-spectral image data.
Statistical scaling of geometric characteristics in stochastically generated pore microstructures
Hyman, Jeffrey D.; Guadagnini, Alberto; Winter, C. Larrabee
2015-05-21
In this study, we analyze the statistical scaling of structural attributes of virtual porous microstructures that are stochastically generated by thresholding Gaussian random fields. Characterization of the extent at which randomly generated pore spaces can be considered as representative of a particular rock sample depends on the metrics employed to compare the virtual sample against its physical counterpart. Typically, comparisons against features and/or patterns of geometric observables, e.g., porosity and specific surface area, flow-related macroscopic parameters, e.g., permeability, or autocorrelation functions are used to assess the representativeness of a virtual sample, and thereby the quality of the generation method. Here, we rely on manifestations of statistical scaling of geometric observables which were recently observed in real millimeter scale rock samples [13] as additional relevant metrics by which to characterize a virtual sample. We explore the statistical scaling of two geometric observables, namely porosity (Φ) and specific surface area (SSA), of porous microstructures generated using the method of Smolarkiewicz and Winter [42] and Hyman and Winter [22]. Our results suggest that the method can produce virtual pore space samples displaying the symptoms of statistical scaling observed in real rock samples. Order q sample structure functions (statistical moments of absolute increments) of Φ and SSA scale as a power of the separation distance (lag) over a range of lags, and extended self-similarity (linear relationship between log structure functions of successive orders) appears to be an intrinsic property of the generated media. The width of the range of lags where power-law scaling is observed and the Hurst coefficient associated with the variables we consider can be controlled by the generation parameters of the method.
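The order q sample structure function mentioned above is the mean of absolute increments raised to the power q. A minimal sketch for a one-dimensional, evenly spaced series follows; the function names are hypothetical, and the log-log least-squares fit is one simple way to estimate a power-law exponent, not necessarily the estimator used in the study.

```python
import math


def structure_function(values, lag, q):
    """Order-q sample structure function: mean of |Y(x + lag) - Y(x)|**q."""
    if lag < 1 or lag >= len(values):
        raise ValueError("lag must satisfy 1 <= lag < len(values)")
    increments = [abs(values[i + lag] - values[i])
                  for i in range(len(values) - lag)]
    return sum(d ** q for d in increments) / len(increments)


def scaling_exponent(values, lags, q):
    """Least-squares slope of log S_q(lag) against log(lag)."""
    xs = [math.log(lag) for lag in lags]
    ys = [math.log(structure_function(values, lag, q)) for lag in lags]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    # A linear ramp has increments equal to the lag, so S_q(lag) = lag**q.
    ramp = [float(i) for i in range(50)]
    print(structure_function(ramp, 3, 2))
    print(scaling_exponent(ramp, [1, 2, 4, 8], 2))
```

Extended self-similarity could be checked in the same spirit by regressing the log structure function of order q + 1 on that of order q across lags and looking for a linear relationship.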
NASA Technical Reports Server (NTRS)
Fisher, Brad; Wolff, David B.
2010-01-01
Passive and active microwave rain sensors onboard earth-orbiting satellites estimate monthly rainfall from the instantaneous rain statistics collected during satellite overpasses. It is well known that climate-scale rain estimates from meteorological satellites incur sampling errors resulting from the process of discrete temporal sampling and statistical averaging. Sampling and retrieval errors ultimately become entangled in the estimation of the mean monthly rain rate. The sampling component of the error budget effectively introduces statistical noise into climate-scale rain estimates that obscures the error component associated with the instantaneous rain retrieval. Estimating the accuracy of the retrievals on monthly scales therefore necessitates a decomposition of the total error budget into sampling and retrieval error quantities. This paper presents results from a statistical evaluation of the sampling and retrieval errors for five different space-borne rain sensors on board nine orbiting satellites. Using an error decomposition methodology developed by one of the authors, sampling and retrieval errors were estimated at 0.25° resolution within 150 km of ground-based weather radars located at Kwajalein, Marshall Islands and Melbourne, Florida. Error and bias statistics were calculated according to the land, ocean and coast classifications of the surface terrain mask developed for the Goddard Profiling (GPROF) rain algorithm. Variations in the comparative error statistics are attributed to various factors related to differences in the swath geometry of each rain sensor, the orbital and instrument characteristics of the satellite and the regional climatology. The most significant result from this study found that each of the satellites incurred negative long-term oceanic retrieval biases of 10 to 30%.
The slip resistance of common footwear materials measured with two slipmeters.
Chang, W R; Matz, S
2001-12-01
The slip resistance of 16 commonly used footwear materials was measured with the Brungraber Mark II and the English XL on 3 floor surfaces under surface conditions of dry, wet, oily and oily wet. Three samples were used for each material combination and surface condition. The results of a one way ANOVA analysis indicated that the differences among different samples were statistically significant for a large number of material combinations and surface conditions. The results indicated that the ranking of materials based on their slip resistance values depends highly on the slipmeters, floor surfaces and surface conditions. For contaminated surfaces including wet, oily and oily wet surfaces, the slip resistance obtained with the English XL was usually higher than that measured with the Brungraber Mark II. The correlation coefficients between the slip resistance obtained with these two slipmeters calculated for different surface conditions indicated a strong correlation with statistical significance.
NASA Astrophysics Data System (ADS)
Hsu, Hsiao-Ping; Nadler, Walder; Grassberger, Peter
2005-07-01
The scaling behavior of randomly branched polymers in a good solvent is studied in two to nine dimensions, modeled by lattice animals on simple hypercubic lattices. For the simulations, we use a biased sequential sampling algorithm with re-sampling, similar to the pruned-enriched Rosenbluth method (PERM) used extensively for linear polymers. We obtain high statistics of animals with up to several thousand sites in all dimension 2⩽d⩽9. The partition sum (number of different animals) and gyration radii are estimated. In all dimensions we verify the Parisi-Sourlas prediction, and we verify all exactly known critical exponents in dimensions 2, 3, 4, and ⩾8. In addition, we present the hitherto most precise estimates for growth constants in d⩾3. For clusters with one site attached to an attractive surface, we verify the superuniversality of the cross-over exponent at the adsorption transition predicted by Janssen and Lyssy.
Space-Time Data fusion for Remote Sensing Applications
NASA Technical Reports Server (NTRS)
Braverman, Amy; Nguyen, H.; Cressie, N.
2011-01-01
NASA has been collecting massive amounts of remote sensing data about Earth's systems for more than a decade. Missions are selected to be complementary in quantities measured, retrieval techniques, and sampling characteristics, so these datasets are highly synergistic. To fully exploit this, a rigorous methodology for combining data with heterogeneous sampling characteristics is required. For scientific purposes, the methodology must also provide quantitative measures of uncertainty that propagate input-data uncertainty appropriately. We view this as a statistical inference problem. The true but not-directly-observed quantities form a vector-valued field continuous in space and time. Our goal is to infer those true values or some function of them, and to provide uncertainty quantification for those inferences. We use a spatiotemporal statistical model that relates the unobserved quantities of interest at point-level to the spatially aggregated, observed data. We describe and illustrate our method using CO2 data from two NASA data sets.
An evaluation of grease type ball bearing lubricants operating in various environments
NASA Technical Reports Server (NTRS)
Mcmurtrey, E. L.
1981-01-01
Because many future spacecraft or space stations will require mechanisms to operate for long periods of time in environments which are adverse to most bearing lubricants, a series of tests is continuing to evaluate 38 grease type lubricants in R-4 size bearings in five different environments for a 1 year period. Four repetitions of each test are made to provide statistical samples. These tests were used to select four lubricants for 5 year tests in selected environments with five repetitions of each test for statistical samples. At the present time, 100 test sets are completed and 22 test sets are underway. Three 5 year tests were started in (1) continuous operation and (2) start-stop operation, with both in vacuum at ambient temperatures, and (3) continuous operation at 93.3 C. In the 1 year tests the best results to date in all environments were obtained with a high viscosity index perfluoroalkylpolyether (PFPE) grease.
Sayago, Ana; González-Domínguez, Raúl; Beltrán, Rafael; Fernández-Recamales, Ángeles
2018-09-30
This work explores the potential of multi-element fingerprinting in combination with advanced data mining strategies to assess the geographical origin of extra virgin olive oil samples. For this purpose, the concentrations of 55 elements were determined in 125 oil samples from multiple Spanish geographic areas. Several unsupervised and supervised multivariate statistical techniques were used to build classification models and investigate the relationship between mineral composition of olive oils and their provenance. Results showed that Spanish extra virgin olive oils exhibit characteristic element profiles, which can be differentiated on the basis of their origin in accordance with three geographical areas: Atlantic coast (Huelva province), Mediterranean coast and inland regions. Furthermore, statistical modelling yielded high sensitivity and specificity, principally when random forest and support vector machines were employed, thus demonstrating the utility of these techniques in food traceability and authenticity research. Copyright © 2018 Elsevier Ltd. All rights reserved.
Rapid analysis of pharmaceutical drugs using LIBS coupled with multivariate analysis.
Tiwari, P K; Awasthi, S; Kumar, R; Anand, R K; Rai, P K; Rai, A K
2018-02-01
Type 2 diabetes drug tablets containing voglibose having dose strengths of 0.2 and 0.3 mg of various brands have been examined, using laser-induced breakdown spectroscopy (LIBS) technique. The statistical methods such as the principal component analysis (PCA) and the partial least square regression analysis (PLSR) have been employed on LIBS spectral data for classifying and developing the calibration models of drug samples. We have developed the ratio-based calibration model applying PLSR in which relative spectral intensity ratios H/C, H/N and O/N are used. Further, the developed model has been employed to predict the relative concentration of element in unknown drug samples. The experiment has been performed in air and argon atmosphere, respectively, and the obtained results have been compared. The present model provides rapid spectroscopic method for drug analysis with high statistical significance for online control and measurement process in a wide variety of pharmaceutical industrial applications.
Sangeetha, S; Sujatha, C M; Manamalli, D
2014-01-01
In this work, anisotropy of compressive and tensile strength regions of femur trabecular bone are analysed using quaternion wavelet transforms. The normal and abnormal femur trabecular bone radiographic images are considered for this study. The sub-anatomic regions, which include compressive and tensile regions, are delineated using pre-processing procedures. These delineated regions are subjected to quaternion wavelet transforms and statistical parameters are derived from the transformed images. These parameters are correlated with apparent porosity, which is derived from the strength regions. Further, anisotropy is also calculated from the transformed images and is analyzed. Results show that the anisotropy values derived from second and third phase components of quaternion wavelet transform are found to be distinct for normal and abnormal samples with high statistical significance for both compressive and tensile regions. These investigations demonstrate that architectural anisotropy derived from QWT analysis is able to differentiate normal and abnormal samples.
Developing Sampling Frame for Case Study: Challenges and Conditions
ERIC Educational Resources Information Center
Ishak, Noriah Mohd; Abu Bakar, Abu Yazid
2014-01-01
Due to statistical analysis, the issue of random sampling is pertinent to any quantitative study. Unlike in quantitative studies, the elimination of inferential statistical analysis allows qualitative researchers to be more creative in dealing with sampling issues. Since results from qualitative studies cannot be generalized to the bigger population,…
How Large Should a Statistical Sample Be?
ERIC Educational Resources Information Center
Menil, Violeta C.; Ye, Ruili
2012-01-01
This study serves as a teaching aid for teachers of introductory statistics. The aim of this study was limited to determining various sample sizes when estimating population proportion. Tables on sample sizes were generated using a C++ program, which depends on population size, degree of precision or error level, and confidence…
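To illustrate the kind of calculation such sample-size tables rest on, the following Python sketch computes the sample size needed to estimate a proportion, with a finite population correction. It is not the authors' C++ program: the function name, the conservative default p = 0.5, and the normal approximation are assumptions made here for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(population_size, margin_of_error,
                               confidence=0.95, p=0.5):
    """Sample size for estimating a proportion (hypothetical helper).

    Uses the normal approximation n0 = z^2 * p * (1 - p) / e^2 and then
    applies the finite population correction n = n0 / (1 + (n0 - 1) / N).
    """
    # Two-sided critical value for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


if __name__ == "__main__":
    for N in (500, 10_000, 1_000_000):
        print(N, sample_size_for_proportion(N, margin_of_error=0.05))
```

The required sample grows with population size but levels off quickly, which is why tables of this kind are usually indexed by population size, error level, and confidence level together.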
Repeated Random Sampling in Year 5
ERIC Educational Resources Information Center
Watson, Jane M.; English, Lyn D.
2016-01-01
As an extension to an activity introducing Year 5 students to the practice of statistics, the software "TinkerPlots" made it possible to collect repeated random samples from a finite population to informally explore students' capacity to begin reasoning with a distribution of sample statistics. This article provides background for the…
Challenging Conventional Wisdom for Multivariate Statistical Models with Small Samples
ERIC Educational Resources Information Center
McNeish, Daniel
2017-01-01
In education research, small samples are common because of financial limitations, logistical challenges, or exploratory studies. With small samples, statistical principles on which researchers rely do not hold, leading to trust issues with model estimates and possible replication issues when scaling up. Researchers are generally aware of such…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Yousu; Etingov, Pavel V.; Ren, Huiying
This paper describes a probabilistic look-ahead contingency analysis application that incorporates smart sampling and high-performance computing (HPC) techniques. Smart sampling techniques are implemented to effectively represent the structure and statistical characteristics of uncertainty introduced by different sources in the power system. They can significantly reduce the data set size required for multiple look-ahead contingency analyses, and therefore reduce the time required to compute them. High-performance-computing (HPC) techniques are used to further reduce computational time. These two techniques enable a predictive capability that forecasts the impact of various uncertainties on potential transmission limit violations. The developed package has been tested with real world data from the Bonneville Power Administration. Case study results are presented to demonstrate the performance of the applications developed.
Dwivedi, Jaya; Namdev, Kuldeep K; Chilkoti, Deepak C; Verma, Surajpal; Sharma, Swapnil
2018-06-06
Therapeutic drug monitoring (TDM) of anti-epileptic drugs provides a valid clinical tool in optimization of overall therapy. However, TDM is challenging due to the high biological samples (plasma/blood) storage/shipment costs and the limited availability of laboratories providing TDM services. Sampling in the form of dry plasma spot (DPS) or dry blood spot (DBS) is a suitable alternative to overcome these issues. An improved, simple, rapid, and stability indicating method for quantification of pregabalin in human plasma and DPS has been developed and validated. Analyses were performed on liquid chromatography tandem mass spectrometer under positive ionization mode of electrospray interface. Pregabalin-d4 was used as internal standard, and the chromatographic separations were performed on Poroshell 120 EC-C18 column using an isocratic mobile phase flow rate of 1 mL/min. Stability of pregabalin in DPS was evaluated under simulated real-time conditions. Extraction procedures from plasma and DPS samples were compared using statistical tests. The method was validated considering the FDA method validation guideline. The method was linear over the concentration range of 20-16000 ng/mL and 100-10000 ng/mL in plasma and DPS, respectively. DPS samples were found stable for only one week upon storage at room temperature and for at least four weeks at freezing temperature (-20 ± 5 °C). Method was applied for quantification of pregabalin in over 600 samples of a clinical study. Statistical analyses revealed that two extraction procedures in plasma and DPS samples showed statistically insignificant difference and can be used interchangeably without any bias. Proposed method involves simple and rapid steps of sample processing that do not require a pre- or post-column derivatization procedure. The method is suitable for routine pharmacokinetic analysis and therapeutic monitoring of pregabalin.
Association of Blastocystis subtypes with diarrhea in children
NASA Astrophysics Data System (ADS)
Zulfa, F.; Sari, I. P.; Kurniawan, A.
2017-08-01
Blastocystis hominis is an intestinal zoonotic protozoan that, as epidemiological surveys have shown, is highly prevalent among children and may cause chronic diarrhea. This study aimed to identify Blastocystis subtypes among children and associate those subtypes with pathology. The study's population was children aged 6-12 years old divided into asymptomatic and symptomatic (diarrhea) groups. The asymptomatic samples were obtained from primary school students in the Bukit Duri area of South Jakarta, while the symptomatic samples were obtained from patients who visited nearby primary health centers (Puskesmas). Symptomatic stool samples were examined in Parasitology Laboratory FKUI. Microscopic examination of the stool samples was performed to screen for single Blastocystis infection, followed by culture, PCR of 18S rRNA, and sequencing. In the study, 53.2% of children (n = 156) harbored intestinal parasites, Blastocystis sp. A single infection of Blastocystis sp. was present in 69 (44.23%) samples, comprised of 36 symptomatic and 33 asymptomatic participants. The Blastocystis subtypes (STs) identified in this study were STs 1-4. ST3 was the most dominant and was observed with statistically significant higher frequency in the symptomatic group. ST4 was only found in one sample in the symptomatic group. While ST1 and ST2 were found more frequently in the asymptomatic group, no statistical association was observed. ST3 is more likely to be associated with clinical symptoms than ST1 and ST2.
A statistical evaluation of non-ergodic variogram estimators
Curriero, F.C.; Hohn, M.E.; Liebhold, A.M.; Lele, S.R.
2002-01-01
Geostatistics is a set of statistical techniques that is increasingly used to characterize spatial dependence in spatially referenced ecological data. A common feature of geostatistics is predicting values at unsampled locations from nearby samples using the kriging algorithm. Modeling spatial dependence in sampled data is necessary before kriging and is usually accomplished with the variogram and its traditional estimator. Other types of estimators, known as non-ergodic estimators, have been used in ecological applications. Non-ergodic estimators were originally suggested as a method of choice when sampled data are preferentially located and exhibit a skewed frequency distribution. Preferentially located samples can occur, for example, when areas with high values are sampled more intensely than other areas. In earlier studies the visual appearance of variograms from traditional and non-ergodic estimators were compared. Here we evaluate the estimators' relative performance in prediction. We also show algebraically that a non-ergodic version of the variogram is equivalent to the traditional variogram estimator. Simulations, designed to investigate the effects of data skewness and preferential sampling on variogram estimation and kriging, showed the traditional variogram estimator outperforms the non-ergodic estimators under these conditions. We also analyzed data on carabid beetle abundance, which exhibited large-scale spatial variability (trend) and a skewed frequency distribution. Detrending data followed by robust estimation of the residual variogram is demonstrated to be a successful alternative to the non-ergodic approach.
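For orientation, the traditional (Matheron) variogram estimator referred to above averages squared differences between pairs of observations separated by roughly the same distance: γ(h) = (1 / (2 N(h))) Σ (z_i − z_j)², where N(h) is the number of pairs in the lag class h. The Python sketch below is a minimal version under assumptions made here: equal-width distance bins, an isotropic variogram, and a hypothetical function name. It is not the authors' code and implements neither the non-ergodic variants nor the robust estimation they discuss.

```python
import math


def empirical_variogram(coords, values, lag_width, n_lags):
    """Classical (Matheron) semivariance per distance bin.

    coords : list of (x, y) tuples
    values : list of observations, same order as coords
    Returns a list of length n_lags; bin k covers distances in
    [k * lag_width, (k + 1) * lag_width). Empty bins yield None.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = math.dist(coords[i], coords[j])   # Euclidean separation
            k = int(h // lag_width)               # index of the lag bin
            if k < n_lags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    # Semivariance is half the mean squared difference in each bin.
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (3, 0)]
    z = [0.0, 1.0, 2.0, 3.0]
    print(empirical_variogram(pts, z, lag_width=1.0, n_lags=4))
```

On data with a linear trend, as in the toy transect above, the semivariance keeps increasing with distance, which is one reason detrending before variogram estimation matters.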
A method to estimate the effect of deformable image registration uncertainties on daily dose mapping
Murphy, Martin J.; Salguero, Francisco J.; Siebers, Jeffrey V.; Staub, David; Vaman, Constantin
2012-01-01
Purpose: To develop a statistical sampling procedure for spatially-correlated uncertainties in deformable image registration and then use it to demonstrate their effect on daily dose mapping. Methods: Sequential daily CT studies are acquired to map anatomical variations prior to fractionated external beam radiotherapy. The CTs are deformably registered to the planning CT to obtain displacement vector fields (DVFs). The DVFs are used to accumulate the dose delivered each day onto the planning CT. Each DVF has spatially-correlated uncertainties associated with it. Principal components analysis (PCA) is applied to measured DVF error maps to produce decorrelated principal component modes of the errors. The modes are sampled independently and reconstructed to produce synthetic registration error maps. The synthetic error maps are convolved with dose mapped via deformable registration to model the resulting uncertainty in the dose mapping. The results are compared to the dose mapping uncertainty that would result from uncorrelated DVF errors that vary randomly from voxel to voxel. Results: The error sampling method is shown to produce synthetic DVF error maps that are statistically indistinguishable from the observed error maps. Spatially-correlated DVF uncertainties modeled by our procedure produce patterns of dose mapping error that are different from that due to randomly distributed uncertainties. Conclusions: Deformable image registration uncertainties have complex spatial distributions. The authors have developed and tested a method to decorrelate the spatial uncertainties and make statistical samples of highly correlated error maps. The sample error maps can be used to investigate the effect of DVF uncertainties on daily dose mapping via deformable image registration. An initial demonstration of this methodology shows that dose mapping uncertainties can be sensitive to spatial patterns in the DVF uncertainties. PMID:22320766
Engblom, Henrik; Heiberg, Einar; Erlinge, David; Jensen, Svend Eggert; Nordrehaug, Jan Erik; Dubois-Randé, Jean-Luc; Halvorsen, Sigrun; Hoffmann, Pavel; Koul, Sasha; Carlsson, Marcus; Atar, Dan; Arheden, Håkan
2016-03-09
Cardiac magnetic resonance (CMR) can quantify myocardial infarct (MI) size and myocardium at risk (MaR), enabling assessment of myocardial salvage index (MSI). We assessed how MSI impacts the number of patients needed to reach statistical power in relation to MI size alone and levels of biochemical markers in clinical cardioprotection trials and how scan day affects sample size. Controls (n=90) from the recent CHILL-MI and MITOCARE trials were included. MI size, MaR, and MSI were assessed from CMR. High-sensitivity troponin T (hsTnT) and creatine kinase isoenzyme MB (CKMB) levels were assessed in CHILL-MI patients (n=50). Utilizing distribution of these variables, 100 000 clinical trials were simulated for calculation of sample size required to reach sufficient power. For a treatment effect of 25% decrease in outcome variables, 50 patients were required in each arm using MSI compared to 93, 98, 120, 141, and 143 for MI size alone, hsTnT (area under the curve [AUC] and peak), and CKMB (AUC and peak) in order to reach a power of 90%. If average CMR scan day between treatment and control arms differed by 1 day, sample size needs to be increased by 54% (77 vs 50) to avoid scan day bias masking a treatment effect of 25%. Sample size in cardioprotection trials can be reduced 46% to 65% without compromising statistical power when using MSI by CMR as an outcome variable instead of MI size alone or biochemical markers. It is essential to ensure lack of bias in scan day between treatment and control arms to avoid compromising statistical power. © 2016 The Authors. Published on behalf of the American Heart Association, Inc., by Wiley Blackwell.
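The study above derived sample sizes by simulating trials from the observed distributions. As a rough illustration of why an outcome with less relative variability needs fewer patients, the sketch below uses the textbook closed-form normal approximation for comparing two means instead. This is a substitute technique, not the authors' simulation; the function name and the example numbers are hypothetical and are not taken from the trial data.

```python
import math
from statistics import NormalDist


def n_per_arm(effect, sd, alpha=0.05, power=0.90):
    """Patients per arm to detect a mean difference `effect`.

    Normal approximation for a two-sided, two-sample comparison with
    equal arms and a common standard deviation `sd`:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect) ** 2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sd / effect) ** 2)


if __name__ == "__main__":
    # Same absolute effect, two hypothetical outcome variabilities.
    print(n_per_arm(effect=0.5, sd=1.0))
    print(n_per_arm(effect=0.5, sd=0.5))
```

Because the required n scales with (sd / effect)², halving the standard deviation relative to the effect cuts the sample size to about a quarter, which is the mechanism behind choosing a less variable outcome measure.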
Testing the Isotropic Universe Using the Gamma-Ray Burst Data of Fermi/GBM
NASA Astrophysics Data System (ADS)
Řípa, Jakub; Shafieloo, Arman
2017-12-01
The sky distribution of gamma-ray bursts (GRBs) has been intensively studied by various groups for more than two decades. Most of these studies test the isotropy of GRBs based on their sky number density distribution. In this work, we propose an approach to test the isotropy of the universe through inspecting the isotropy of the properties of GRBs such as their duration, fluences, and peak fluxes at various energy bands and different timescales. We apply this method on the Fermi/Gamma-ray Burst Monitor (GBM) data sample containing 1591 GRBs. The most noticeable feature we found is near the Galactic coordinates l ≈ 30°, b ≈ 15°, and radius r ≈ 20°-40°. The inferred probability for the occurrence of such an anisotropic signal (in a random isotropic sample) is derived to be less than a percent in some of the tests while the other tests give results consistent with isotropy. These are based on the comparison of the results from the real data with the randomly shuffled data samples. Considering the large number of statistics we used in this work (some of which are correlated with each other), we can anticipate that the detected feature could be a result of statistical fluctuations. Moreover, we noticed a considerably low number of GRBs in this particular patch, which might be due to some instrumentation or observational effects that can consequently affect our statistics through some systematics. Further investigation is highly desirable in order to clarify this result, e.g., utilizing a larger future Fermi/GBM data sample as well as data samples of other GRB missions and also looking for possible systematics.
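The comparison of real data with randomly shuffled samples described above can be sketched as a generic permutation test. The Python example below is an illustration only: the test statistic (absolute difference in mean property between bursts inside and outside a sky patch), the function name, and the toy data are assumptions made here, not the statistics used in the paper.

```python
import random


def shuffle_p_value(values, in_patch, n_shuffles=2000, seed=0):
    """Permutation p-value for a patch-vs-rest difference in means.

    values   : measured property per burst (e.g. a duration)
    in_patch : booleans, True if the burst lies inside the sky patch
    Shuffling `values` breaks any link between property and position
    while keeping the patch occupancy fixed.
    """
    rng = random.Random(seed)

    def stat(vals):
        inside = [v for v, m in zip(vals, in_patch) if m]
        outside = [v for v, m in zip(vals, in_patch) if not m]
        return abs(sum(inside) / len(inside) - sum(outside) / len(outside))

    observed = stat(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if stat(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    vals = [100.0] * 5 + [float(i) for i in range(20)]
    mask = [True] * 5 + [False] * 20
    print(shuffle_p_value(vals, mask))
```

When many such statistics are tested, as the abstract notes, some small p-values are expected by chance, so an individual value below one percent is not decisive on its own.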
Designing image segmentation studies: Statistical power, sample size and reference standard quality.
Gibson, Eli; Hu, Yipeng; Huisman, Henkjan J; Barratt, Dean C
2017-12-01
Segmentation algorithms are typically evaluated by comparison to an accepted reference standard. The cost of generating accurate reference standards for medical image segmentation can be substantial. Since the study cost and the likelihood of detecting a clinically meaningful difference in accuracy both depend on the size and on the quality of the study reference standard, balancing these trade-offs supports the efficient use of research resources. In this work, we derive a statistical power calculation that enables researchers to estimate the appropriate sample size to detect clinically meaningful differences in segmentation accuracy (i.e. the proportion of voxels matching the reference standard) between two algorithms. Furthermore, we derive a formula to relate reference standard errors to their effect on the sample sizes of studies using lower-quality (but potentially more affordable and practically available) reference standards. The accuracy of the derived sample size formula was estimated through Monte Carlo simulation, demonstrating, with 95% confidence, a predicted statistical power within 4% of simulated values across a range of model parameters. This corresponds to sample size errors of less than 4 subjects and errors in the detectable accuracy difference less than 0.6%. The applicability of the formula to real-world data was assessed using bootstrap resampling simulations for pairs of algorithms from the PROMISE12 prostate MR segmentation challenge data set. The model predicted the simulated power for the majority of algorithm pairs within 4% for simulated experiments using a high-quality reference standard and within 6% for simulated experiments using a low-quality reference standard. A case study, also based on the PROMISE12 data, illustrates using the formulae to evaluate whether to use a lower-quality reference standard in a prostate segmentation study. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
Szabolcsi, Zoltán; Farkas, Zsuzsa; Borbély, Andrea; Bárány, Gusztáv; Varga, Dániel; Heinrich, Attila; Völgyi, Antónia; Pamjav, Horolma
2015-11-01
When the DNA profile from a crime-scene matches that of a suspect, the weight of DNA evidence depends on the unbiased estimation of the match probability of the profiles. For this reason, it is required to establish and expand the databases that reflect the actual allele frequencies in the population applied. 21,473 complete DNA profiles from Databank samples were used to establish the allele frequency database to represent the population of Hungarian suspects. We used fifteen STR loci (PowerPlex ESI16) including five, new ESS loci. The aim was to calculate the statistical, forensic efficiency parameters for the Databank samples and compare the newly detected data to the earlier report. The population substructure caused by relatedness may influence the frequency of profiles estimated. As our Databank profiles were considered non-random samples, possible relationships between the suspects can be assumed. Therefore, population inbreeding effect was estimated using the FIS calculation. The overall inbreeding parameter was found to be 0.0106. Furthermore, we tested the impact of the two allele frequency datasets on 101 randomly chosen STR profiles, including full and partial profiles. The 95% confidence interval estimates for the profile frequencies (pM) resulted in a tighter range when we used the new dataset compared to the previously published ones. We found that the FIS had less effect on frequency values in the 21,473 samples than the application of minimum allele frequency. No genetic substructure was detected by STRUCTURE analysis. Due to the low level of inbreeding effect and the high number of samples, the new dataset provides unbiased and precise estimates of LR for statistical interpretation of forensic casework and allows us to use lower allele frequencies. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
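The abstract does not state how FIS was computed. A common definition is FIS = 1 − Ho/He, where Ho is the observed heterozygosity and He the heterozygosity expected under Hardy-Weinberg proportions. The single-locus Python sketch below assumes that definition; the function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def fis_single_locus(genotypes):
    """Inbreeding coefficient FIS = 1 - Ho / He for one locus.

    genotypes : list of (allele_a, allele_b) tuples, one per individual.
    He is computed from the sample allele frequencies as 1 - sum(p_i^2),
    without a small-sample correction (an assumption of this sketch).
    """
    n = len(genotypes)
    allele_counts = Counter(a for g in genotypes for a in g)
    total_alleles = 2 * n
    he = 1 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    ho = sum(1 for a, b in genotypes if a != b) / n
    if he == 0:
        raise ValueError("locus is monomorphic; FIS is undefined")
    return 1 - ho / he


if __name__ == "__main__":
    # Toy STR genotypes (repeat numbers) in Hardy-Weinberg proportions.
    print(fis_single_locus([(12, 12), (12, 14), (12, 14), (14, 14)]))
```

Values near zero, such as the overall 0.0106 reported above, indicate genotype proportions close to those expected under random mating.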
Comparative Financial Statistics for Public Two-Year Colleges: FY 1991 Peer Groups Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Cirino, Anna Marie
Comparative financial information, derived from two national surveys of 503 public two-year colleges, is presented in this report for fiscal year (FY) 1990-91. The report includes statistics for the national sample and six peer groups, space for colleges to compare their institutional statistics with national and peer groups, and tables, bar…
[A comparison of convenience sampling and purposive sampling].
Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien
2014-06-01
Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation not by statistical power analysis.
Personal exposure to aerosolized red tide toxins (brevetoxins).
Cheng, Yung Sung; Zhou, Yue; Naar, Jerome; Irvin, C Mitch; Su, Wei-Chung; Fleming, Lora E; Kirkpatrick, Barbara; Pierce, Richard H; Backer, Lorraine C; Baden, Daniel G
2010-06-01
Florida red tides occur annually in the Gulf of Mexico from blooms of the marine dinoflagellate, Karenia brevis, which produces highly potent natural polyether toxins, brevetoxins. Several epidemiologic studies have demonstrated that human exposure to red tide aerosol could result in increased respiratory symptoms. Environmental monitoring of aerosolized brevetoxins was performed using a high-volume sampler taken hourly at fixed locations on Siesta Beach, Florida. Personal exposure was monitored using personal air samplers and taking nasal swab samples from the subjects who were instructed to spend 1 hr on Sarasota Beach during two sampling periods of an active Florida red tide event in March 2005, and in May 2008 when there was no red tide. Results showed that the aerosolized brevetoxins from the personal sampler were in modest agreement with the environmental concentration taken from a high-volume sampler. Analysis of nasal swab samples for brevetoxins demonstrated 68% positive samples in the March 2005 sampling period when air concentrations of brevetoxins were between 50 and 120 ng/m³ measured with the high-volume sampler. No swab samples showed detectable levels of brevetoxins in the May 2008 study, when all personal samples were below the limit of detection. However, there were no statistical correlations between the amounts of brevetoxins detected in the swab samples with either the environmental or personal concentration. Results showed that the personal sample might provide an estimate of individual exposure level. Nasal swab samples showed that brevetoxins indeed were inhaled and deposited in the nasal passage during the March 2005 red tide event.
Exploring High School Students Beginning Reasoning about Significance Tests with Technology
ERIC Educational Resources Information Center
García, Víctor N.; Sánchez, Ernesto
2017-01-01
In the present study we analyze how students reason about or make inferences given a particular hypothesis testing problem (without having studied formal methods of statistical inference) when using Fathom. They use Fathom to create an empirical sampling distribution through computer simulation. It is found that most students' reasoning relies on…
Shorebird habitat availability assessment of agricultural fields using a digital aerial video system
Clinton W. Jeske; Scott Wilson; Paul C. Chadwick; Wylie Barrow
2005-01-01
Field and wetland conditions in the rice prairies of Louisiana and Texas are highly dynamic habitats. Rice prairies are important habitat for many species of migratory birds, including shorebirds, wading birds, and waterfowl. Ground sampling a variety of fields to assess habitat availability is very labor intensive, and accessibility to private lands makes statistical...
A Statistical Analysis of Data Used in Critical Decision Making by Secondary School Personnel.
ERIC Educational Resources Information Center
Dunn, Charleta J.; Kowitz, Gerald T.
Guidance decisions depend on the validity of standardized tests and teacher judgment records as measures of student achievement. To test this validity, a sample of 400 high school juniors, randomly selected from two large Gulf Coast area schools, were administered the Iowa Tests of Educational Development. The nine subtest scores and each…
ERIC Educational Resources Information Center
Adilogullari, Ilhan
2014-01-01
The purpose of this study is to analyze the relationship between the emotional intelligence and professional burnout levels of teachers. The population of the study consists of high school teachers employed in the city center of Kirsehir Province; the sample comprises 563 volunteer teachers. The statistical implementation of the study is performed…
Specific Gravity Variation in a Lower Mississippi Valley Cottonwood Population
R. E. Farmer; J. R. Wilcox
1966-01-01
Specific gravity varied from 0.32 to 0.46, averaging 0.38. Most of the variation was associated with individual trees; samples within locations accounted for a smaller, but statistically significant, portion of the variation. Variation between locations was not significant. It was concluded that individual high-density trees should be sought throughout the...
Smoking and Cancers: Case-Robust Analysis of a Classic Data Set
ERIC Educational Resources Information Center
Bentler, Peter M.; Satorra, Albert; Yuan, Ke-Hai
2009-01-01
A typical structural equation model is intended to reproduce the means, variances, and correlations or covariances among a set of variables based on parameter estimates of a highly restricted model. It is not widely appreciated that the sample statistics being modeled can be quite sensitive to outliers and influential observations, leading to bias…
INVESTIGATION OF THE USE OF STATISTICS IN COUNSELING STUDENTS.
ERIC Educational Resources Information Center
HEWES, ROBERT F.
THE OBJECTIVE WAS TO EMPLOY TECHNIQUES OF PROFILE ANALYSIS TO DEVELOP THE JOINT PROBABILITY OF SELECTING A SUITABLE SUBJECT MAJOR AND OF ASSURING TO A HIGH DEGREE GRADUATION FROM COLLEGE WITH THAT MAJOR. THE SAMPLE INCLUDED 1,197 MIT FRESHMEN STUDENTS IN 1952-53, AND THE VALIDATION GROUP INCLUDED 699 ENTRANTS IN 1954. DATA INCLUDED SECONDARY…
Effects of Urbanization on Stream Water Quality in the City of Atlanta, Georgia, USA
NASA Astrophysics Data System (ADS)
Peters, N. E.
2009-05-01
A long-term stream water-quality monitoring network was established in the City of Atlanta (COA) during 2003 to assess baseline water-quality conditions and the effects of urbanization on stream water quality. Routine hydrologically-based manual stream sampling, including several concurrent manual point and equal width increment sampling, was conducted approximately 12 times per year at 21 stations, with drainage areas ranging from 3.7 to 232 km². Eleven of the stations are real-time (RT) water-quality stations having continuous measures of stream stage/discharge, pH, dissolved oxygen, specific conductance, water temperature, and turbidity, and automatic samplers for stormwater collection. Samples were analyzed for field parameters, and a broad suite of water-quality and sediment-related constituents. This paper summarizes an evaluation of field parameters and concentrations of major ions, minor and trace metals, nutrient species (nitrogen and phosphorus), and coliform bacteria among stations and with respect to watershed characteristics and plausible sources from 2003 through September 2007. The concentrations of most constituents in the COA streams are statistically higher than those of two nearby reference streams. Concentrations are statistically different among stations for several constituents, despite high variability both within and among stations. The combination of routine manual sampling, automatic sampling during stormflows, and real-time water-quality monitoring provided sufficient information about the variability of urban stream water quality to develop hypotheses for causes of water-quality differences among COA streams. Fecal coliform bacteria concentrations of most individual samples at each station exceeded Georgia's water-quality standard for any water-usage class.
High chloride concentrations occur at three stations and are hypothesized to be associated with discharges of chlorinated combined sewer overflows, drainage of swimming pool(s), and dissolution and transport during rainstorms of CaCl2, a deicing salt applied to roads during winter storms. Water quality of one stream was highly affected by the dissolution and transport of ammonium alum [NH4Al(SO4)2] from an alum manufacturing plant in the watershed; streamwater has low pH (<5), low alkalinity and high concentrations of minor and trace metals. Several trace metals (Cu, Pb and Zn) exceed acute and chronic water-quality standards and the high concentrations are attributed to washoff from impervious surfaces.
Gupta, S. K.; Gupta, R. C.; Seth, A. K.; Gupta, A. B.; Bassin, J. K.; Gupta, A.
1999-01-01
An epidemiological investigation was undertaken in India to assess the prevalence of methaemoglobinaemia in areas with high nitrate concentration in drinking-water and the possible association with an adaptation of cytochrome-b5 reductase. Five areas were selected, with average nitrate ion concentrations in drinking-water of 26, 45, 95, 222 and 459 mg/l. These areas were visited and house schedules were prepared in accordance with a statistically designed protocol. A sample of 10% of the total population was selected in each of the areas, matched for age and weight, giving a total of 178 persons in five age groups. For each subject, a detailed history was documented, a medical examination was conducted and blood samples were taken to determine methaemoglobin level and cytochrome-b5 reductase activity. Collected data were subjected to statistical analysis to test for a possible relationship between nitrate concentration, cytochrome-b5 reductase activity and methaemoglobinaemia. High nitrate concentrations caused methaemoglobinaemia in infants and adults. The reserve of cytochrome-b5 reductase activity (i.e. the enzyme activity not currently being used, but which is available when needed; for example, under conditions of increased nitrate ingestion) and its adaptation with increasing water nitrate concentration to reduce methaemoglobin were more pronounced in children and adolescents. PMID:10534899
DOE Office of Scientific and Technical Information (OSTI.GOV)
West, Bradley M.; Stuckelberger, Michael; Guthrey, Harvey
We present that statistical and correlative analysis are increasingly important in the design and study of new materials, from semiconductors to metals. Non-destructive measurement techniques, with high spatial resolution, capable of correlating composition and/or structure with device properties, are few and far between. For the case of polycrystalline and inhomogeneous materials, the added challenge is that nanoscale resolution is in general not compatible with the large sampling areas necessary to have a statistical representation of the specimen under study. For the study of grain cores and grain boundaries in polycrystalline solar absorbers this is of particular importance since their dissimilar behavior and variability throughout the samples makes it difficult to draw conclusions and ultimately optimize the material. In this study, we present a nanoscale in-operando approach based on the multimodal utilization of synchrotron nano x-ray fluorescence and x-ray beam induced current collected for grain core and grain boundary areas and correlated pixel-by-pixel in fully operational Cu(In₁₋ₓGaₓ)Se₂ solar cells. We observe that low gallium cells have grain boundaries that outperform the grain cores and high gallium cells have boundaries that underperform. In conclusion, these results demonstrate how nanoscale correlative X-ray microscopy can guide research pathways towards grain engineering low cost, high efficiency solar cells.
NASA Astrophysics Data System (ADS)
Smith, Kimberly A.
This study investigates the effectiveness of an integrated high school science curriculum on student achievement, knowledge retention and science attitudes using quantitative and qualitative research. Data were collected from tenth grade students in a small urban high school in Kansas City, Missouri, who were enrolled in a traditional Biology course or an integrated Environmental Science course. Quantitative data were collected in Phase 1 of the study. Data collected for academic achievement included pretest and posttest scores on the CTBS MATN exam. Data collected for knowledge retention included post-posttest scores on the CTBS MATN exam. Data collected for science attitudes were scores on a pretest and posttest using the TOSRA. SPSS was used to analyze the data using independent samples t-tests, one-way ANCOVAs and paired samples statistics. Qualitative data were collected in Phase 2 of the study. Data included responses to open-ended interview questions using three focus groups. Data were analyzed for common themes. Data analysis revealed the integrated Environmental Science course had a statistically significant impact on academic achievement, knowledge retention and positive science attitudes. Gender and socioeconomic status did not influence results. The study also determined that the CTBS MATN exam was not an accurate predictor of scores on state testing as was previously thought.
NASA Astrophysics Data System (ADS)
Sergeenko, N. P.
2017-11-01
An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the F2 layer, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe the statistical samples {δfoF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that the time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P ~ 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates of {δfoF2} during heliogeophysical disturbances.
42 CFR 1003.109 - Notice of proposed determination.
Code of Federal Regulations, 2010 CFR
2010-10-01
... briefly describe the statistical sampling technique utilized by the Inspector General); (3) The reason why... statistical sampling in accordance with § 1003.133 in which case the notice shall describe those claims and...
11 CFR 9036.4 - Commission review of submissions.
Code of Federal Regulations, 2010 CFR
2010-01-01
..., in conducting its review, may utilize statistical sampling techniques. Based on the results of its... nonmatchable and the reason that it is not matchable; or if statistical sampling is used, the estimated amount...
Sequential Tests of Multiple Hypotheses Controlling Type I and II Familywise Error Rates
Bartroff, Jay; Song, Jinlin
2014-01-01
This paper addresses the following general scenario: A scientist wishes to perform a battery of experiments, each generating a sequential stream of data, to investigate some phenomenon. The scientist would like to control the overall error rate in order to draw statistically-valid conclusions from each experiment, while being as efficient as possible. The between-stream data may differ in distribution and dimension but also may be highly correlated, even duplicated exactly in some cases. Treating each experiment as a hypothesis test and adopting the familywise error rate (FWER) metric, we give a procedure that sequentially tests each hypothesis while controlling both the type I and II FWERs regardless of the between-stream correlation, and only requires arbitrary sequential test statistics that control the error rates for a given stream in isolation. The proposed procedure, which we call the sequential Holm procedure because of its inspiration from Holm’s (1979) seminal fixed-sample procedure, shows simultaneous savings in expected sample size and less conservative error control relative to fixed sample, sequential Bonferroni, and other recently proposed sequential procedures in a simulation study. PMID:25092948
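For orientation, the fixed-sample step-down procedure of Holm (1979), which the abstract names as the inspiration for the sequential version, can be sketched as follows. This is only the classical fixed-sample procedure controlling the type I FWER, not the sequential procedure proposed in the paper; the function name and the example p-values are illustrative assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Fixed-sample Holm step-down procedure (illustrative sketch).

    Returns a list of booleans, True where the corresponding
    hypothesis is rejected at familywise level alpha.
    """
    m = len(p_values)
    # Visit hypotheses from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            # Step-down rule: stop at the first non-rejection.
            break
    return reject


# Hypothetical p-values from four experiments.
decisions = holm_reject([0.010, 0.040, 0.030, 0.005])
```

Unlike a plain Bonferroni correction, the threshold relaxes after each rejection, which is the source of the less conservative error control the abstract refers to.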
Estimating uncertainty in respondent-driven sampling using a tree bootstrap method.
Baraff, Aaron J; McCormick, Tyler H; Raftery, Adrian E
2016-12-20
Respondent-driven sampling (RDS) is a network-based form of chain-referral sampling used to estimate attributes of populations that are difficult to access using standard survey tools. Although it has grown quickly in popularity since its introduction, the statistical properties of RDS estimates remain elusive. In particular, the sampling variability of these estimates has been shown to be much higher than previously acknowledged, and even methods designed to account for RDS result in misleadingly narrow confidence intervals. In this paper, we introduce a tree bootstrap method for estimating uncertainty in RDS estimates based on resampling recruitment trees. We use simulations from known social networks to show that the tree bootstrap method not only outperforms existing methods but also captures the high variability of RDS, even in extreme cases with high design effects. We also apply the method to data from injecting drug users in Ukraine. Unlike other methods, the tree bootstrap depends only on the structure of the sampled recruitment trees, not on the attributes being measured on the respondents, so correlations between attributes can be estimated as well as variability. Our results suggest that it is possible to accurately assess the high level of uncertainty inherent in RDS.
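A minimal sketch of the resampling idea, assuming the recruitment data are stored as a list of seeds plus a mapping from each respondent to the people they recruited. Seeds are resampled with replacement, and then, moving down each tree, the recruits of every drawn respondent are resampled with replacement. The data layout, function names and the choice of a simple mean as the estimator are assumptions made here for illustration; the paper should be consulted for the exact procedure and estimators.

```python
import random


def resample_tree(node, children, rng):
    """Resample the subtree rooted at `node`; returns (node, [subtrees])."""
    recruits = children.get(node, [])
    # Draw as many recruits as the node originally had, with replacement.
    drawn = [rng.choice(recruits) for _ in recruits]
    return (node, [resample_tree(r, children, rng) for r in drawn])


def flatten(tree):
    """List every respondent in a resampled tree (repeats allowed)."""
    node, subtrees = tree
    members = [node]
    for sub in subtrees:
        members.extend(flatten(sub))
    return members


def bootstrap_means(seeds, children, values, n_boot=200, seed=0):
    """Bootstrap distribution of the mean of `values` over resampled trees."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        drawn_seeds = [rng.choice(seeds) for _ in seeds]
        sample = []
        for s in drawn_seeds:
            sample.extend(flatten(resample_tree(s, children, rng)))
        means.append(sum(values[n] for n in sample) / len(sample))
    return means


# Hypothetical recruitment trees: two seeds, "a" and "e".
children = {"a": ["b", "c"], "b": ["d"], "e": ["f"]}
values = {"a": 1, "b": 0, "c": 1, "d": 0, "e": 1, "f": 0}
means = sorted(bootstrap_means(["a", "e"], children, values))
# Rough 95% percentile interval from the bootstrap distribution.
interval = (means[int(0.025 * len(means))], means[int(0.975 * len(means)) - 1])
```

Because the resampling uses only the tree structure, the same resampled trees can be reused for any attribute recorded on the respondents, which matches the property noted in the abstract.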
Cortes, Aneg L; Montiel, Enrique R; Gimeno, Isabel M
2009-12-01
The use of Flinders Technology Associates (FTA) filter cards to quantify Marek's disease virus (MDV) DNA for the diagnosis of Marek's disease (MD) and to monitor MD vaccines was evaluated. Samples of blood (43), solid tumors (14), and feather pulp (FP; 36) collected fresh and in FTA cards were analyzed. MDV DNA load was quantified by real-time PCR. Threshold cycle (Ct) ratios were calculated for each sample by dividing the Ct value of the internal control gene (glyceraldehyde-3-phosphate dehydrogenase) by the Ct value of the MDV gene. A statistically significant correlation (P < 0.05) between the Ct ratios of samples collected fresh and in FTA cards was detected using Pearson's correlation test. Load of serotype 1 MDV DNA was quantified in 24 FP, 14 solid tumor, and 43 blood samples. There was a statistically significant correlation between FP (r = 0.95), solid tumor (r = 0.94), and blood (r = 0.9) samples collected fresh and in FTA cards. Load of serotype 2 MDV DNA was quantified in 17 FP samples, and the correlation between samples collected fresh and in FTA cards was also statistically significant (Pearson's coefficient, r = 0.96); load of serotype 3 MDV DNA was quantified in 36 FP samples, and the correlation between samples taken fresh and in FTA cards was also statistically significant (r = 0.84). MDV DNA samples extracted 3 days (t0) and 8 months (t1) after collection were used to evaluate the stability of MDV DNA in archived samples collected in FTA cards. A statistically significant correlation was found for serotype 1 (r = 0.96), serotype 2 (r = 1), and serotype 3 (r = 0.9). The results show that FTA cards are an excellent medium to collect, transport, and archive samples for MD diagnosis and to monitor MD vaccines. In addition, FTA cards are widely available, inexpensive, and adequate for the shipment of samples nationally and internationally.
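The Ct ratio and the Pearson correlation described above can be sketched as follows. The Ct values are invented for illustration and are not data from the study; the function names are likewise assumptions.

```python
import math


def ct_ratio(ct_control, ct_mdv):
    """Ct of the internal control gene divided by Ct of the MDV gene."""
    return ct_control / ct_mdv


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical (control Ct, MDV Ct) pairs for the same birds,
# sampled fresh and on FTA cards.
fresh = [ct_ratio(c, m) for c, m in [(20.1, 24.9), (19.8, 30.2), (20.5, 22.0), (21.0, 27.5)]]
fta = [ct_ratio(c, m) for c, m in [(22.3, 27.0), (21.9, 33.5), (22.8, 24.1), (23.0, 29.8)]]
r = pearson_r(fresh, fta)
```

Dividing by the control-gene Ct normalizes for differences in the amount of extracted DNA, so ratios from the two collection methods can be compared directly.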
Sjoholm-Gomez de Liano, Carl; Soberon-Ventura, Vidal F; Salcedo-Villanueva, Guillermo; Santos-Palacios, Abril; Guerrero-Naranjo, Jose Luis; Fromow-Guerra, Jans; García-Aguirre, Gerardo; Morales-Canton, Virgilio; Velez-Montoya, Raul
2017-01-01
To assess the sensitivity, specificity, positive predictive value and negative predictive value of anterior chamber tap for the diagnosis of bacterial endophthalmitis in a population with high prevalence. Retrospective, single centre, case series study. We reviewed all medical records with a clinical diagnosis of bacterial endophthalmitis in our hospital from January 1st, 2000 to December 31st, 2014. From each record, we documented general demographic data, best corrected visual acuity and vitreous and aqueous tap microbiological results. All cases were further divided according to the endophthalmitis aetiology to perform individual calculations of sensitivity, specificity, positive predictive value, negative predictive value, accuracy and prevalence. We used the results of the vitreous tap as the gold standard for diagnosis of bacterial endophthalmitis. We excluded those records in which the aqueous and vitreous samples were not taken simultaneously or had an incomplete microbiological report. Significance was assessed with chi-squared statistics, with an alpha value of 0.05 for statistical significance. A total of 190 cases fulfilled the inclusion/exclusion criteria. Positive culture rate from vitreous samples was 64.74%. Positive culture rate from aqueous samples was 32.11%. Bacteria isolated from aqueous samples matched those isolated from vitreous samples 78.68% of the time. The overall sensitivity was 38.21%, specificity: 75.51%, positive predictive value: 79.66%, negative predictive value: 32.74% (p = 0.08). Subgroup analysis showed that anterior chamber taps in cases of post-surgical endophthalmitis had a moderate to low sensitivity (37.73%), high specificity (93%) and high positive predictive value (95%) (p < 0.04). The sensitivity and specificity of anterior chamber tap are low and should not be used for critical therapeutic decisions in patients with suspected bacterial endophthalmitis. In cases of post-surgical endophthalmitis, the result of an anterior chamber tap could be used for therapeutic guidance, but only in conjunction with clinical presentation and in the absence of a better method for diagnosis.
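The quantities reported above all derive from a 2x2 table of index-test result (aqueous tap) against gold standard (vitreous tap). A minimal sketch of the standard definitions follows; the counts are hypothetical and are not the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2x2 table.

    tp: index test positive, gold standard positive
    fp: index test positive, gold standard negative
    fn: index test negative, gold standard positive
    tn: index test negative, gold standard negative
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
    }


# Hypothetical counts chosen only to illustrate the arithmetic.
metrics = diagnostic_metrics(tp=40, fp=10, fn=60, tn=90)
```

Sensitivity and specificity are properties of the test, whereas the predictive values also depend on prevalence, which is why a high positive predictive value can coexist with low sensitivity in a high-prevalence population.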
Alles, Susan; Peng, Linda X; Mozola, Mark A
2009-01-01
A modification to Performance-Tested Method (PTM) 070601, Reveal Listeria Test (Reveal), is described. The modified method uses a new media formulation, LESS enrichment broth, in single-step enrichment protocols for both foods and environmental sponge and swab samples. Food samples are enriched for 27-30 h at 30 °C and environmental samples for 24-48 h at 30 °C. Implementation of these abbreviated enrichment procedures allows test results to be obtained on a next-day basis. In testing of 14 food types in internal comparative studies with inoculated samples, there was a statistically significant difference in performance between the Reveal and reference culture [U.S. Food and Drug Administration's Bacteriological Analytical Manual (FDA/BAM) or U.S. Department of Agriculture-Food Safety and Inspection Service (USDA-FSIS)] methods for only a single food in one trial (pasteurized crab meat) at the 27 h enrichment time point, with more positive results obtained with the FDA/BAM reference method. No foods showed statistically significant differences in method performance at the 30 h time point. Independent laboratory testing of 3 foods again produced a statistically significant difference in results for crab meat at the 27 h time point; otherwise results of the Reveal and reference methods were statistically equivalent. Overall, considering both internal and independent laboratory trials, sensitivity of the Reveal method relative to the reference culture procedures in testing of foods was 85.9% at 27 h and 97.1% at 30 h. Results from 5 environmental surfaces inoculated with various strains of Listeria spp. showed that the Reveal method was more productive than the reference USDA-FSIS culture procedure for 3 surfaces (stainless steel, plastic, and cast iron), whereas results were statistically equivalent to the reference method for the other 2 surfaces (ceramic tile and sealed concrete). An independent laboratory trial with ceramic tile inoculated with L. monocytogenes confirmed the effectiveness of the Reveal method at the 24 h time point. Overall, sensitivity of the Reveal method at 24 h relative to that of the USDA-FSIS method was 153%. The Reveal method exhibited extremely high specificity, with only a single false-positive result in all trials combined for overall specificity of 99.5%.
Statistical Literacy and Sample Survey Results
ERIC Educational Resources Information Center
McAlevey, Lynn; Sullivan, Charles
2010-01-01
Sample surveys are widely used in the social sciences and business. The news media almost daily quote from them, yet they are widely misused. Using students with prior managerial experience embarking on an MBA course, we show that common sample survey results are misunderstood even by those managers who have previously done a statistics course. In…
A method for determining the weak statistical stationarity of a random process
NASA Technical Reports Server (NTRS)
Sadeh, W. Z.; Koper, C. A., Jr.
1978-01-01
A method for determining the weak statistical stationarity of a random process is presented. The core of this testing procedure consists of generating an equivalent ensemble which approximates a true ensemble. Formation of an equivalent ensemble is accomplished through segmenting a sufficiently long time history of a random process into equal, finite, and statistically independent sample records. The weak statistical stationarity is ascertained based on the time invariance of the equivalent-ensemble averages. Comparison of these averages with their corresponding time averages over a single sample record leads to a heuristic estimate of the ergodicity of a random process. Specific variance tests are introduced for evaluating the statistical independence of the sample records, the time invariance of the equivalent-ensemble autocorrelations, and the ergodicity. Examination and substantiation of these procedures were conducted utilizing turbulent velocity signals.
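The segmentation step can be sketched as below: a long record is cut into equal sample records, and the average across records at each time offset forms the equivalent-ensemble average, which should not drift with time for a weakly stationary process. The synthetic Gaussian signal, the function names and the simple max-minus-min spread are assumptions for illustration; the specific variance tests introduced in the paper are not reproduced here.

```python
import random
import statistics


def segment(series, n_records):
    """Split a long time history into n_records equal-length sample records.

    Leftover samples at the end of the series are dropped.
    """
    length = len(series) // n_records
    return [series[i * length:(i + 1) * length] for i in range(n_records)]


def equivalent_ensemble_means(series, n_records):
    """Average across records at each time offset within a record."""
    records = segment(series, n_records)
    return [statistics.fmean(column) for column in zip(*records)]


def record_time_average(series, n_records, which=0):
    """Time average over a single sample record, for the ergodicity comparison."""
    return statistics.fmean(segment(series, n_records)[which])


# Synthetic stand-in for a measured signal (e.g. a turbulent velocity trace).
rng = random.Random(1)
signal = [rng.gauss(0.0, 1.0) for _ in range(20000)]

ensemble = equivalent_ensemble_means(signal, n_records=200)
spread = max(ensemble) - min(ensemble)   # small spread suggests time invariance
time_avg = record_time_average(signal, n_records=200)
```

Comparing `time_avg` with the values in `ensemble` mirrors the heuristic ergodicity check described in the abstract; the records must be long enough to be treated as statistically independent for the comparison to be meaningful.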
[Evaluation of the Performance of Two Kinds of Anti-TP Enzyme-Linked Immunosorbent Assay].
Gao, Nan; Huang, Li-Qin; Wang, Rui; Jia, Jun-Jie; Wu, Shuo; Zhang, Jing; Ge, Hong-Wei
2018-06-01
To evaluate the accuracy and precision of 2 kinds of anti-treponema pallidum (anti-TP) ELISA reagents used in our laboratory for detecting anti-TP in voluntary blood donors, so as to provide data support for the use of ELISA reagents after the introduction of chemiluminescence immunoassay (CLIA). Routine detection of anti-TP was performed using 2 kinds of ELISA reagents; 546 responsive samples detected by anti-TP ELISA were then collected, and the infection status of the samples was confirmed by the treponema pallidum particle agglutination (TPPA) test. The confirmed results of responsive samples detected by the 2 kinds of anti-TP ELISA reagents were compared, the accuracy of the 2 reagents was analyzed by drawing ROC curves and comparing the area under the curve (AUC), and the precision of the 2 reagents was compared by statistical analysis of quality control data from July 1, 2016 to June 30, 2017. There was no statistically significant difference in the confirmed positive rate of responsive samples and weak positive samples between the 2 kinds of anti-TP ELISA reagents. Samples responsive with both anti-TP ELISA reagents accounted for 85.53% (467/546) of all responsive samples, and the positive rate confirmed by TPPA test was 82.87%. 44 responsive samples detected by anti-TP ELISA reagent A and 35 responsive samples detected by anti-TP ELISA reagent B were confirmed to be negative by TPPA test. Comparison of AUC showed that the accuracy of both anti-TP ELISA reagents was high, and the difference between the 2 reagents was not statistically significant. The coefficients of variation (CV) of anti-TP ELISA reagents A and B were 14.98% and 18.04% respectively, which met the precision requirement of the ELISA test. The accuracy and precision of the 2 kinds of anti-TP ELISA reagents used in our laboratory are similar, and either reagent can satisfy the requirements of blood screening.
Gomes, Cinthya Cristina; Guimarães, Ludmila Silva; Pinto, Larissa Christina Costa; Camargo, Gabriela Alessandra da Cruz Galhardo; Valente, Maria Isabel Bastos; Sarquis, Maria Inêz de Moura
2017-01-01
The aim of this study was to investigate the prevalence of Candida albicans isolated from periodontal endodontic lesions in diabetic and normoglycemic patients, and the virulence of the fungus in different atmospheric conditions. A case-control study was conducted on 15 patients with type 2 diabetes mellitus (G1) and 15 non-diabetics (G2) with periodontal endodontic lesions. Samples of root canals and periodontal pockets were plated on CHROMagar for later identification by polymerase chain reaction (PCR) and virulence tests. C. albicans was identified in 79.2% and 20.8% of the 60 samples collected from diabetic and normoglycemic patients, respectively. Of the 30 samples collected from periodontal pockets, 13 showed a positive culture for C. albicans, with 77% belonging to G1 and 23% to G2. Of the 11 positive samples from root canals, 82% were from G1 and 18% from G2. In redox conditions, proteinase production with a precipitation zone Pz<0.63 was found in 100% of G1 and 72% of G2; under anaerobic conditions it was negative (Pz=1) in both groups. Hydrophobicity of the strains from G1 indicated 16.4% with low, 19.3% with moderate, and 64.3% with high hydrophobicity in redox. In G2, 42.2% had low, 39.8% had moderate, and 18% had high hydrophobicity in redox. In anaerobic conditions, G1 showed 15.2% with low, 12.8% with moderate, and 72% with high hydrophobicity; in G2, 33.6% had low, 28.8% had moderate, and 37.6% had high hydrophobicity. There was a statistically significant difference in the number of positive cultures between G1 and G2 (p<0.05), with predominance in G1. There was a statistically significant difference for all virulence factors, except hemolysis (p=0.001). Candida albicans was isolated more frequently and had higher virulence in diabetic patients.
42 CFR 402.7 - Notice of proposed determination.
Code of Federal Regulations, 2010 CFR
2010-10-01
... and a brief description of the statistical sampling technique CMS or OIG used. (3) The reason why the... is relying upon statistical sampling to project the number and types of claims or requests for...
ERIC Educational Resources Information Center
Bellera, Carine A.; Julien, Marilyse; Hanley, James A.
2010-01-01
The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
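A "what if" analysis of this kind can be sketched by holding a standardized effect size fixed and recomputing the p-value for different sample sizes. The sketch below uses a one-sample z-test with the normal approximation; that test, the effect size of 0.2 and the function name are assumptions for illustration, and the paper itself works in Excel and R.

```python
import math


def two_sided_p(effect_size, n):
    """Two-sided p-value of a one-sample z-test for a fixed standardized effect.

    z = effect_size * sqrt(n), and p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    """
    z = abs(effect_size) * math.sqrt(n)
    return math.erfc(z / math.sqrt(2.0))


# Same hypothetical effect (d = 0.2), different sample sizes.
for n in (25, 100, 400, 1600):
    print(n, round(two_sided_p(0.2, n), 4))
```

The effect is identical in every row; only the sample size changes, which is the point such analyses are meant to convey about statistical significance.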
Characterizations of linear sufficient statistics
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.
1977-01-01
A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied, so that they characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.
Developing the design of a continuous national health survey for New Zealand
2013-01-01
Background A continuously operating survey can yield advantages in survey management, field operations, and the provision of timely information for policymakers and researchers. We describe the key features of the sample design of the New Zealand (NZ) Health Survey, which has been conducted on a continuous basis since mid-2011, and compare to a number of other national population health surveys. Methods A number of strategies to improve the NZ Health Survey are described: implementation of a targeted dual-frame sample design for better Māori, Pacific, and Asian statistics; movement from periodic to continuous operation; use of core questions with rotating topic modules to improve flexibility in survey content; and opportunities for ongoing improvements and efficiencies, including linkage to administrative datasets. Results and discussion The use of disproportionate area sampling and a dual frame design resulted in reductions of approximately 19%, 26%, and 4% to variances of Māori, Pacific and Asian statistics respectively, but at the cost of a 17% increase to all-ethnicity variances. These were broadly in line with the survey’s priorities. Respondents provided a high degree of cooperation in the first year, with an adult response rate of 79% and consent rates for data linkage above 90%. Conclusions A combination of strategies tailored to local conditions gives the best results for national health surveys. In the NZ context, data from the NZ Census of Population and Dwellings and the Electoral Roll can be used to improve the sample design. A continuously operating survey provides both administrative and statistical advantages. PMID:24364838
Does quality of drinking water matter in kidney stone disease: A study in West Bengal, India.
Mitra, Pubali; Pal, Dilip Kumar; Das, Madhusudan
2018-05-01
The combined interaction of epidemiology, environmental exposure, dietary habits, and genetic factors causes kidney stone disease (KSD), a common public health problem worldwide. Because a high water intake (>3 L daily) is widely recommended by physicians to prevent KSD, the present study evaluated whether the quantity of water that people consume daily is associated with KSD and whether the quality of drinking water has any effect on disease prevalence. Information regarding residential address, daily volume of water consumption, and source of drinking water was collected from 1,266 patients with kidney stones in West Bengal, India. Drinking water was collected by use of proper methods from case (high stone prevalence) and control (zero stone prevalence) areas thrice yearly. Water samples were analyzed for pH, alkalinity, hardness, total dissolved solutes, electrical conductivity, and salinity. Average values of the studied parameters were compared to determine if there were any statistically significant differences between the case and control areas. We observed that as many as 53.6% of the patients consumed <3 L of water daily. Analysis of drinking water samples from case and control areas, however, did not show any statistically significant alterations in the studied parameters. All water samples were found to be suitable for consumption. It is not the quality of water, rather the quantity of water consumed that matters most in the occurrence of KSD.
da Silva, Rodrigo Marques; Goulart, Carolina Tonini; Lopes, Luis Felipe Dias; Serrano, Patrícia Maria; Costa, Ana Lucia Siqueira; de Azevedo Guido, Laura
2014-03-30
Background Nursing students may exhibit the characteristics of resistance to stress, such as hardiness, which can reduce the risk of burnout. However, we found only one published study about these phenomena among nursing students. Thus, we investigated the association between hardiness and burnout in such students. Methods An analytic, cross-sectional study was conducted among 570 nursing students from three Brazilian universities. Data were collected relating to sociodemographic characteristics, hardiness, and burnout, which we analyzed using inferential statistics. Results We observed that 64.04% of nursing students in the sample had a high level of emotional exhaustion, 35.79% had a high level of cynicism, and 87.72% had a low level of professional efficacy: these are dimensions of burnout. We also found that 48.77% had a high level of control, 61.40% a high level of commitment, and 35.44% a high level of challenge: these are dimensions of hardiness. Only 24.74% of the students experienced burnout, and 21.93% met the criteria for a hardy personality. There was a statistically significant difference between the frequency of hardiness and burnout (p = 0.033), with 68.00% of hardy students not exhibiting burnout. Conclusions Although nursing students live with educational stressors, burnout was not preponderant in our sample students; this may be linked to hardiness. Thus, given its benefits to student life and health, we recommend the development of strategies to promote hardiness among nursing students. PMID:24678676
Sharpening method of satellite thermal image based on the geographical statistical model
NASA Astrophysics Data System (ADS)
Qi, Pengcheng; Hu, Shixiong; Zhang, Haijun; Guo, Guangmeng
2016-04-01
To improve the effectiveness of thermal sharpening in mountainous regions, paying more attention to the laws of land surface energy balance, a thermal sharpening method based on the geographical statistical model (GSM) is proposed. Explanatory variables were selected from the processes of land surface energy budget and thermal infrared electromagnetic radiation transmission, then high spatial resolution (57 m) raster layers were generated for these variables through spatially simulating or using other raster data as proxies. Based on this, the local adaptation statistical relationship between brightness temperature (BT) and the explanatory variables, i.e., the GSM, was built at 1026-m resolution using the method of multivariate adaptive regression splines. Finally, the GSM was applied to the high-resolution (57-m) explanatory variables; thus, the high-resolution (57-m) BT image was obtained. This method produced a sharpening result with low error and good visual effect. The method can avoid the blind choice of explanatory variables and remove the dependence on synchronous imagery at visible and near-infrared bands. The influences of the explanatory variable combination, sampling method, and the residual error correction on sharpening results were analyzed deliberately, and their influence mechanisms are reported herein.
Hughes, Sarah A; Huang, Rongfu; Mahaffey, Ashley; Chelme-Ayala, Pamela; Klamerth, Nikolaus; Meshref, Mohamed N A; Ibrahim, Mohamed D; Brown, Christine; Peru, Kerry M; Headley, John V; Gamal El-Din, Mohamed
2017-11-01
There are several established methods for the determination of naphthenic acids (NAs) in waters associated with oil sands mining operations. Due to their highly complex nature, the measured concentration and composition of NAs vary depending on the method used. This study compared different common sample preparation techniques, analytical instrument methods, and analytical standards to measure NAs in groundwater and process water samples collected from an active oil sands operation. In general, the high- and ultrahigh-resolution methods, namely ultra performance liquid chromatography time-of-flight mass spectrometry (UPLC-TOF-MS) and Orbitrap mass spectrometry (Orbitrap-MS), were within an order of magnitude of the Fourier transform infrared spectroscopy (FTIR) methods. The gas chromatography mass spectrometry (GC-MS) methods consistently had the highest NA concentrations and greatest standard error. Total NA concentration was not statistically different between sample preparation by solid phase extraction and by liquid-liquid extraction. Calibration standards influenced quantitation results. This work provided a comprehensive understanding of the inherent differences in the various techniques available to measure NAs and hence the potential differences in measured amounts of NAs in samples. Results from this study will contribute to the analytical method standardization for NA analysis in oil sands related water samples.
García-Diego, Fernando-Juan; Verticchio, Elena; Beltrán, Pedro; Siani, Anna Maria
2016-08-15
Monitoring temperature and relative humidity of the environment to which artefacts are exposed is fundamental in preventive conservation studies. The common approach in setting measuring instruments is the choice of a high sampling rate to detect short fluctuations and increase the accuracy of statistical analysis. However, in recent cultural heritage standards the evaluation of variability is based on moving average and short fluctuations and therefore massive acquisition of data in slowly-changing indoor environments could end up being redundant. In this research, the sampling frequency to set a datalogger in a museum room and inside a microclimate frame is investigated by comparing the outcomes obtained from datasheets associated with different sampling conditions. Thermo-hygrometric data collected in the Sorolla room of the Pio V Museum of Valencia (Spain) were used and the widely consulted recommendations issued in UNI 10829:1999 and EN 15757:2010 standards and in the American Society of Heating, Air-Conditioning and Refrigerating Engineers (ASHRAE) guidelines were applied. Hourly sampling proved effective in obtaining highly reliable results. Furthermore, it was found that in some instances daily means of data sampled every hour can lead to the same conclusions as those of high frequency. This allows us to improve data logging design and manageability of the resulting datasheets. PMID:27537886
NASA Astrophysics Data System (ADS)
Williams, Arnold C.; Pachowicz, Peter W.
2004-09-01
Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electron Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results on mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.
Zeng, Eddy Y; Tsukada, David; Diehl, Dario W
2004-11-01
Solid-phase microextraction (SPME) has been used as an in situ sampling technique for a wide range of volatile organic chemicals, but SPME field sampling of nonvolatile organic pollutants has not been reported. This paper describes the development of an SPME-based sampling method employing a poly(dimethylsiloxane) (PDMS)-coated (100-μm thickness) fiber as the sorbent phase. The laboratory-calibrated PDMS-coated fibers were used to construct SPME samplers, and field tests were conducted at three coastal locations off southern California to determine the equilibrium sampling time and compare the efficacy of the SPME samplers with that of an Infiltrex 100 water pumping system (Axys Environmental Systems Ltd., Sidney, British Columbia, Canada). p,p'-DDE and o,p'-DDE were the components consistently detected in the SPME samples among 42 polychlorinated biphenyl congeners and 17 chlorinated pesticides targeted. SPME samplers deployed at two locations with moderate and high levels of contamination for 18 and 30 d, respectively, attained statistically identical concentrations of p,p'-DDE and o,p'-DDE. In addition, SPME samplers deployed for 23 and 43 d, respectively, at a location of low contamination also contained statistically identical concentrations of p,p'-DDE. These results indicate that equilibrium could be reached within 18 to 23 d. The concentrations of p,p'-DDE, o,p'-DDE, or p,p'-DDD obtained with the SPME samplers and the Infiltrex 100 system were virtually identical. In particular, two water column concentration profiles of p,p'-DDE and o,p'-DDE acquired by the SPME samplers at a highly contaminated site on the Palos Verdes Shelf overlapped with the profiles obtained by the Infiltrex 100 system in 1997. The field tests not only reveal the advantages of the SPME samplers compared to the Infiltrex 100 system and other integrative passive devices but also indicate the need to improve the sensitivity of the SPME-based sampling technique.
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
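The contrast drawn in this abstract can be illustrated with a minimal sketch. It assumes a Moran-type coefficient of the form I = z'Wz, where z is the residual vector standardized with the population standard deviation and W is the spatial weight matrix scaled so that all entries sum to one. The function names and the toy data are hypothetical, and the paper's own indices may be defined differently.

```python
import math


def durbin_watson(residuals):
    """Classical Durbin-Watson statistic; it depends on the ordering of the residuals."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


def standardize(values):
    """Center the values and scale them by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def normalize_weights(weights):
    """Scale a spatial weight matrix so that all entries sum to one."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def residual_moran(residuals, weights):
    """Moran-type autocorrelation coefficient of residuals, I = z' W z (assumed form)."""
    z = standardize(residuals)
    w = normalize_weights(weights)
    n = len(z)
    return sum(z[i] * w[i][j] * z[j] for i in range(n) for j in range(n))


# Toy example: four locations on a line, neighbours share a border.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
alternating = [1.0, -1.0, 1.0, -1.0]
reordered = [1.0, 1.0, -1.0, -1.0]  # same values, different listing order

print(durbin_watson(alternating), durbin_watson(reordered))
print(residual_moran(alternating, chain))
```

In this sketch the Durbin-Watson value changes when the same residuals are listed in a different order, whereas the weight-matrix coefficient depends only on which locations are neighbours.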
Stochastic inference with spiking neurons in the high-conductance state
NASA Astrophysics Data System (ADS)
Petrovici, Mihai A.; Bill, Johannes; Bytschok, Ilja; Schemmel, Johannes; Meier, Karlheinz
2016-10-01
The highly variable dynamics of neocortical circuits observed in vivo have been hypothesized to represent a signature of ongoing stochastic inference but stand in apparent contrast to the deterministic response of neurons measured in vitro. Based on a propagation of the membrane autocorrelation across spike bursts, we provide an analytical derivation of the neural activation function that holds for a large parameter space, including the high-conductance state. On this basis, we show how an ensemble of leaky integrate-and-fire neurons with conductance-based synapses embedded in a spiking environment can attain the correct firing statistics for sampling from a well-defined target distribution. For recurrent networks, we examine convergence toward stationarity in computer simulations and demonstrate sample-based Bayesian inference in a mixed graphical model. This points to a new computational role of high-conductance states and establishes a rigorous link between deterministic neuron models and functional stochastic dynamics on the network level.
Dowall, Stuart D; Graham, Victoria A; Tipton, Thomas R W; Hewson, Roger
2009-08-31
Work with highly pathogenic material mandates the use of biological containment facilities, involving microbiological safety cabinets and specialist laboratory engineering structures typified by containment level 3 (CL3) and CL4 laboratories. Consequences of working in high containment are the practical difficulties associated with containing specialist assays and equipment often essential for experimental analyses. In an era of increased interest in biodefence pathogens and emerging diseases, immunological analysis has developed rapidly alongside traditional techniques in virology and molecular biology. For example, in order to maximise the use of small sample volumes, multiplexing has become a more popular and widespread approach to quantify multiple analytes simultaneously, such as cytokines and chemokines. The luminex microsphere system allows for the detection of many cytokines and chemokines in a single sample, but the detection method of using aligned lasers and fluidics means that samples often have to be analysed in low containment facilities. In order to perform cytokine analysis in materials from high containment (CL3 and CL4 laboratories), we have developed an appropriate inactivation methodology after staining steps, which although results in a reduction of median fluorescent intensity, produces statistically comparable outcomes when judged against non-inactivated samples. This methodology thus extends the use of luminex technology for material that contains highly pathogenic biological agents.
Kovač, Marko; Bauer, Arthur; Ståhl, Göran
2014-01-01
Backgrounds, Material and Methods To meet the demands of sustainable forest management and international commitments, European nations have designed a variety of forest-monitoring systems for specific needs. While the majority of countries are committed to independent, single-purpose inventorying, a minority of countries have merged their single-purpose forest inventory systems into integrated forest resource inventories. The statistical efficiencies of the Bavarian, Slovene and Swedish integrated forest resource inventory designs are investigated with the various statistical parameters of the variables of growing stock volume, shares of damaged trees, and deadwood volume. The parameters are derived by using the estimators for the given inventory designs. The required sample sizes are derived via the general formula for non-stratified independent samples and via statistical power analyses. The cost effectiveness of the designs is compared via two simple cost effectiveness ratios. Results In terms of precision, the most illustrative parameters of the variables are relative standard errors; their values range between 1% and 3% if the variables’ variations are low (s%<80%) and are higher in the case of higher variations. A comparison of the actual and required sample sizes shows that the actual sample sizes were deliberately set high to provide precise estimates for the majority of variables and strata. In turn, the successive inventories are statistically efficient, because they allow detecting the mean changes of variables with powers higher than 90%; the highest precision is attained for the changes of growing stock volume and the lowest for the changes of the shares of damaged trees. Two indicators of cost effectiveness also show that the time input spent for measuring one variable decreases with the complexity of inventories. 
Conclusion There is an increasing need for credible information on forest resources to be used for decision making and national and international policy making. Such information can be cost-efficiently provided through integrated forest resource inventories. PMID:24941120
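As a rough illustration of the sample size reasoning mentioned above, the sketch below uses the textbook formula for simple random sampling, n = (t * s% / E%)^2, where s% is the coefficient of variation and E% the allowed relative error. The assumption of an effectively infinite population, the default t-value of 1.96, and the function names are illustrative and are not taken from the paper.

```python
import math


def required_sample_size(cv_percent, allowed_error_percent, t_value=1.96):
    """Sample size for simple random sampling: n = (t * s% / E%)^2, rounded up."""
    return math.ceil((t_value * cv_percent / allowed_error_percent) ** 2)


def relative_standard_error(cv_percent, n):
    """Relative standard error (in percent) of the mean for n independent plots."""
    return cv_percent / math.sqrt(n)


# Hypothetical inputs: a variable with s% = 80 and a 5% allowed error.
print(required_sample_size(80, 5))
print(relative_standard_error(80, 1600))
```

With s% = 80 and 1600 plots the relative standard error in this sketch is 2%, which falls in the 1% to 3% range quoted for low-variation variables.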
Statistical considerations for agroforestry studies
James A. Baldwin
1993-01-01
Statistical topics that relate to agroforestry studies are discussed. These include study objectives, populations of interest, sampling schemes, sample sizes, estimation vs. hypothesis testing, and P-values. In addition, a relatively new and very much improved histogram display is described.
Frazier, Thomas W; Ratliff, Kristin R; Gruber, Chris; Zhang, Yi; Law, Paul A; Constantino, John N
2014-01-01
Understanding the factor structure of autistic symptomatology is critical to the discovery and interpretation of causal mechanisms in autism spectrum disorder. We applied confirmatory factor analysis and assessment of measurement invariance to a large (N = 9635) accumulated collection of reports on quantitative autistic traits using the Social Responsiveness Scale, representing a broad diversity of age, severity, and reporter type. A two-factor structure (corresponding to social communication impairment and restricted, repetitive behavior) as elaborated in the updated Diagnostic and Statistical Manual of Mental Disorders (5th ed.; DSM-5) criteria for autism spectrum disorder exhibited acceptable model fit in confirmatory factor analysis. Measurement invariance was appreciable across age, sex, and reporter (self vs other), but somewhat less apparent between clinical and nonclinical populations in this sample comprised of both familial and sporadic autism spectrum disorders. The statistical power afforded by this large sample allowed relative differentiation of three factors among items encompassing social communication impairment (emotion recognition, social avoidance, and interpersonal relatedness) and two factors among items encompassing restricted, repetitive behavior (insistence on sameness and repetitive mannerisms). Cross-trait correlations remained extremely high, that is, on the order of 0.66-0.92. These data clarify domains of statistically significant factorial separation that may relate to partially (but not completely) overlapping biological mechanisms, contributing to variation in human social competency. Given such robust intercorrelations among symptom domains, understanding their co-emergence remains a high priority in conceptualizing common neural mechanisms underlying autistic syndromes.
Randomized trials are frequently fragmented in multiple secondary publications.
Ebrahim, Shanil; Montoya, Luis; Kamal El Din, Mostafa; Sohani, Zahra N; Agarwal, Arnav; Bance, Sheena; Saquib, Juliann; Saquib, Nazmus; Ioannidis, John P A
2016-11-01
To assess the frequency and features of secondary publications of randomized controlled trials (RCTs). For 191 RCTs published in high-impact journals in 2009, we searched for secondary publications coauthored by at least one same author of the primary trial publication. We evaluated the probability of having secondary publications, characteristics of the primary trial publication that predict having secondary publications, types of secondary analyses conducted, and statistical significance of those analyses. Of 191 primary trials, 88 (46%) had a total of 475 secondary publications by 2/2014. Eight trials had >10 (up to 51) secondary publications each. In multivariable modeling, the risk of having subsequent secondary publications increased 1.32-fold (95% CI 1.05-1.68) per 10-fold increase in sample size, and 1.71-fold (95% CI 1.19-2.45) in the presence of a design article. In a sample of 197 secondary publications examined in depth, 193 tested different hypotheses than the primary publication. Of the 193, 43 tested differences between subgroups, 85 assessed predictive factors associated with an outcome of interest, 118 evaluated different outcomes than the original article, 71 had differences in eligibility criteria, and 21 assessed different durations of follow-up; 176 (91%) presented at least one analysis with statistically significant results. Approximately half of randomized trials in high-impact journals have secondary publications published with a few trials followed by numerous secondary publications. Almost all of these publications report some statistically significant results. Copyright © 2016 Elsevier Inc. All rights reserved.
Forensic discrimination of copper wire using trace element concentrations.
Dettman, Joshua R; Cassabaum, Alyssa A; Saunders, Christopher P; Snyder, Deanna L; Buscaglia, JoAnn
2014-08-19
Copper may be recovered as evidence in high-profile cases such as thefts and improvised explosive device incidents; comparison of copper samples from the crime scene and those associated with the subject of an investigation can provide probative associative evidence and investigative support. A solution-based inductively coupled plasma mass spectrometry method for measuring trace element concentrations in high-purity copper was developed using standard reference materials. The method was evaluated for its ability to use trace element profiles to statistically discriminate between copper samples considering the precision of the measurement and manufacturing processes. The discriminating power was estimated by comparing samples chosen on the basis of the copper refining and production process to represent the within-source (samples expected to be similar) and between-source (samples expected to be different) variability using multivariate parametric- and empirical-based data simulation models with bootstrap resampling. If the false exclusion rate is set to 5%, >90% of the copper samples can be correctly determined to originate from different sources using a parametric-based model and >87% with an empirical-based approach. These results demonstrate the potential utility of the developed method for the comparison of copper samples encountered as forensic evidence.
NASA Astrophysics Data System (ADS)
Roy, P. K.; Pal, S.; Banerjee, G.; Biswas Roy, M.; Ray, D.; Majumder, A.
2014-12-01
Rivers are considered one of the main sources of freshwater all over the world. Hence analysis and maintenance of this water resource is globally considered a matter of major concern. This paper deals with the assessment of surface water quality of the Ichamati river using multivariate statistical techniques. Eight distinct surface water quality observation stations were located and samples were collected. For the samples collected, statistical techniques were applied to the physico-chemical parameters and depth of siltation. In this paper cluster analysis is done to determine the relations between surface water quality and siltation depth of the river Ichamati. Multiple regressions and mathematical equation modeling have been done to characterize surface water quality of the Ichamati river on the basis of physico-chemical parameters. It was found that surface water quality of the downstream river was different from the water quality of the upstream. The analysis of the water quality parameters of the Ichamati river clearly indicates a high pollution load on the river water, which can be attributed to agricultural discharge, tidal effect and soil erosion. The results further reveal that with the increase in depth of siltation, water quality degraded.
Ao, Xiaoping; Stenken, Julie A
2003-09-01
Microdialysis relative recovery (RR) enhancement using different water-soluble, epichlorohydrin-based cyclodextrin polymers (CD-EPS) was studied in vitro for different analytes: amitriptyline, carbamazepine, hydroquinone, ibuprofen, and 4-nitrophenol. When compared to the native CDs (alpha, beta, and gamma) on a per mole basis, the CD-EPS enhanced microdialysis RR was either statistically greater or the same. beta-CD-EPS was more highly retained than native beta-CD by a 20 000 Da molecular weight cutoff (MWCO) polycarbonate (PC) membrane, but showed no statistical difference for loss across a 100 000 Da MWCO polyethersulfone membrane (PES). When the same weight percent of beta-CD or beta-CD-EPS was included in the microdialysis perfusion fluid, the beta-CD-EPS produced a higher microdialysis RR than native beta-CD for all analytes across the PES membrane. However, enhancements for the PC membrane were statistically insignificant when beta-CD and beta-CD-EPS were compared on a per mole basis. These results suggest that CD-EPS may be used as effective enhancement agents during microdialysis sampling and for some membranes provide the additional advantage of being retained more than native CDs.
Patterson, Megan S; Goodson, Patricia
2017-05-01
Compulsive exercise, a form of unhealthy exercise often associated with prioritizing exercise and feeling guilty when exercise is missed, is a common precursor to and symptom of eating disorders. College-aged women are at high risk of exercising compulsively compared with other groups. Social network analysis (SNA) is a theoretical perspective and methodology allowing researchers to observe the effects of relational dynamics on the behaviors of people. SNA was used to assess the relationship between compulsive exercise and body dissatisfaction, physical activity, and network variables. Descriptive statistics were conducted using SPSS, and quadratic assignment procedure (QAP) analyses were conducted using UCINET. QAP regression analysis revealed a statistically significant model (R² = .375, P < .0001) predicting compulsive exercise behavior. Physical activity, body dissatisfaction, and network variables were statistically significant predictor variables in the QAP regression model. In our sample, women who are connected to "important" or "powerful" people in their network are likely to have higher compulsive exercise scores. This result provides healthcare practitioners key target points for intervention within similar groups of women. For scholars researching eating disorders and associated behaviors, this study supports looking into group dynamics and network structure in conjunction with body dissatisfaction and exercise frequency.
Landes, Reid D.; Lensing, Shelly Y.; Kodell, Ralph L.; Hauer-Jensen, Martin
2014-01-01
The dose of a substance that causes death in P% of a population is called an LDP, where LD stands for lethal dose. In radiation research, a common LDP of interest is the radiation dose that kills 50% of the population by a specified time, i.e., lethal dose 50 or LD50. When comparing LD50 between two populations, relative potency is the parameter of interest. In radiation research, this is commonly known as the dose reduction factor (DRF). Unfortunately, statistical inference on dose reduction factor is seldom reported. We illustrate how to calculate confidence intervals for dose reduction factor, which may then be used for statistical inference. Further, most dose reduction factor experiments use hundreds, rather than tens of animals. Through better dosing strategies and the use of a recently available sample size formula, we also show how animal numbers may be reduced while maintaining high statistical power. The illustrations center on realistic examples comparing LD50 values between a radiation countermeasure group and a radiation-only control. We also provide easy-to-use spreadsheets for sample size calculations and confidence interval calculations, as well as SAS® and R code for the latter. PMID:24164553
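To make the idea of inference on a dose reduction factor concrete, the sketch below computes DRF = LD50(treated) / LD50(control) with an approximate confidence interval built on the log scale by the delta method. This is one common approximation and is not necessarily the method the authors illustrate; it assumes the two LD50 estimates are independent, and the function name and input values are hypothetical.

```python
import math


def drf_confidence_interval(ld50_treated, se_treated, ld50_control, se_control, z=1.96):
    """Dose reduction factor with an approximate CI from the delta method on log(DRF).

    Assumes independent LD50 estimates with known standard errors (an assumption
    of this sketch, not a statement about the paper's procedure).
    """
    drf = ld50_treated / ld50_control
    # Variance of log(ratio) is approximately the sum of squared relative standard errors.
    se_log = math.sqrt((se_treated / ld50_treated) ** 2 + (se_control / ld50_control) ** 2)
    half_width = z * se_log
    return drf, drf * math.exp(-half_width), drf * math.exp(half_width)


# Hypothetical LD50 estimates (Gy) and standard errors for the two groups.
drf, lower, upper = drf_confidence_interval(9.0, 0.3, 7.5, 0.25)
print(drf, lower, upper)
```

If the resulting interval excludes 1, the data are consistent with a real difference in LD50 between the countermeasure and control groups at roughly the chosen confidence level.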
NASA Technical Reports Server (NTRS)
Jerde, Eric A.; Warren, Paul H.; Morris, Richard V.
1990-01-01
Bulk compositions of 21 Apollo regolith breccias were determined using an INAA procedure modified from that of Kallemeyn et al. (1989). With one major exception, namely, the 14076,1 sample, the regolith breccias analyzed were found to be not significantly different from the surfaces from which they were collected. In contrast, the 14076,1 sample from the Fra Mauro (Apollo 14) region is a highly anorthositic regolith breccia from a site where anorthosites are extremely scarce. The sample's composition resembles soils from the Descartes (Apollo 16) highlands. However, the low statistical probability for long-distance horizontal transport by impact cratering, together with the relatively high contents of incompatible elements in 14076,1, suggests that this regolith breccia originated within a few hundred kilometers of the Apollo 14 site. Its compositional resemblance to ferroan anorthosite strengthens the hypothesis that ferroan anorthosite originated as the flotation crust of a global magmasphere.
Yan, Jianjun; Shen, Xiaojing; Wang, Yiqin; Li, Fufeng; Xia, Chunming; Guo, Rui; Chen, Chunfeng; Shen, Qingwei
2010-01-01
This study aims at utilising the Wavelet Packet Transform (WPT) and the Support Vector Machine (SVM) algorithm to make an objective analysis and quantitative study of auscultation in Traditional Chinese Medicine (TCM) diagnosis. First, Wavelet Packet Decomposition (WPD) at level 6 was employed to split the auscultation signals into finer frequency bands. Then statistical analysis was made based on the Wavelet Packet Energy (WPE) features extracted from the WPD coefficients. Furthermore, pattern recognition was used to distinguish the statistical feature values of the sample groups of mixed subjects through SVM. Finally, the experimental results showed that the classification accuracies were at a high level.
NASA Astrophysics Data System (ADS)
Aguilar-Arevalo, A. A.; Anderson, C. E.; Bazarko, A. O.; Brice, S. J.; Brown, B. C.; Bugel, L.; Cao, J.; Coney, L.; Conrad, J. M.; Cox, D. C.; Curioni, A.; Djurcic, Z.; Finley, D. A.; Fleming, B. T.; Ford, R.; Garcia, F. G.; Garvey, G. T.; Green, C.; Green, J. A.; Hart, T. L.; Hawker, E.; Imlay, R.; Johnson, R. A.; Karagiorgi, G.; Kasper, P.; Katori, T.; Kobilarcik, T.; Kourbanis, I.; Koutsoliotas, S.; Laird, E. M.; Linden, S. K.; Link, J. M.; Liu, Y.; Liu, Y.; Louis, W. C.; Mahn, K. B. M.; Marsh, W.; McGary, V. T.; McGregor, G.; Metcalf, W.; Meyers, P. D.; Mills, F.; Mills, G. B.; Monroe, J.; Moore, C. D.; Nelson, R. H.; Nienaber, P.; Nowak, J. A.; Osmanov, B.; Ouedraogo, S.; Patterson, R. B.; Perevalov, D.; Polly, C. C.; Prebys, E.; Raaf, J. L.; Ray, H.; Roe, B. P.; Russell, A. D.; Sandberg, V.; Schirato, R.; Schmitz, D.; Shaevitz, M. H.; Shoemaker, F. C.; Smith, D.; Soderberg, M.; Sorel, M.; Spentzouris, P.; Spitz, J.; Stancu, I.; Stefanski, R. J.; Sung, M.; Tanaka, H. A.; Tayloe, R.; Tzanov, M.; van de Water, R.; Wascko, M. O.; White, D. H.; Wilking, M. J.; Yang, H. J.; Zeller, G. P.; Zimmerman, E. D.
2009-08-01
Using high statistics samples of charged-current νμ interactions, the MiniBooNE Collaboration reports a measurement of the single-charged-pion production to quasielastic cross section ratio on mineral oil (CH2), both with and without corrections for hadron reinteractions in the target nucleus. The result is provided as a function of neutrino energy in the range 0.4GeV
Ganna, Andrea; Lee, Donghwan; Ingelsson, Erik; Pawitan, Yudi
2015-07-01
It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a 'training' sample that are replicated in a 'validation' sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed. © The Author 2014. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
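The statement that the RDR is a function of false-positive rate and power in both samples can be sketched with a simplified two-group calculation. The sketch assumes that a fraction pi0 of the tested features is truly null, that all non-null features share the same power, and that training and validation results are independent given a feature's status. The formula and the function name are illustrative and are not the authors' estimator or their online tool.

```python
def rediscovery_rate(pi0, alpha_train, power_train, alpha_valid, power_valid):
    """Expected share of training-significant findings that are also validation-significant.

    Simplified two-group model (an assumption of this sketch):
      P(significant in training)        = pi0*alpha_train + (1 - pi0)*power_train
      P(significant in both)            = pi0*alpha_train*alpha_valid
                                          + (1 - pi0)*power_train*power_valid
    """
    significant_train = pi0 * alpha_train + (1 - pi0) * power_train
    significant_both = (pi0 * alpha_train * alpha_valid
                        + (1 - pi0) * power_train * power_valid)
    return significant_both / significant_train


# Hypothetical study: 90% null features, alpha = 0.05 and power = 0.8 in both samples.
print(rediscovery_rate(0.9, 0.05, 0.8, 0.05, 0.8))
```

Under these assumptions the RDR falls between the validation false-positive rate (when every feature is null) and the validation power (when none is), which is why a large share of null features can pull it well below the nominal power.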
ERIC Educational Resources Information Center
Noll, Jennifer; Hancock, Stacey
2015-01-01
This research investigates what students' use of statistical language can tell us about their conceptions of distribution and sampling in relation to informal inference. Prior research documents students' challenges in understanding ideas of distribution and sampling as tools for making informal statistical inferences. We know that these…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1995 National Sample.
ERIC Educational Resources Information Center
Meeker, Bradley
Based on responses by 405 public two-year colleges in the United States to 2 surveys, this report provides comparative financial information for fiscal year 1994-95. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…
Comparative Financial Statistics for Public Two-Year Colleges: FY 1994 National Sample.
ERIC Educational Resources Information Center
Dickmeyer, Nathan; Meeker, Bradley
Based on responses by 427 public two-year colleges in the United States to two surveys, this report provides comparative financial information for fiscal year 1993-94. The report provides space for colleges to compare their institutional statistics with national sample medians, quartile data for the national sample, and tables and graphs of…
USDA-ARS's Scientific Manuscript database
Small, coded, pill-sized tracers embedded in grain are proposed as a method for grain traceability. A sampling process for a grain traceability system was designed and investigated by applying probability statistics using a science-based sampling approach to collect an adequate number of tracers fo...
People Patterns: Statistics. Environmental Module for Use in a Mathematics Laboratory Setting.
ERIC Educational Resources Information Center
Zastrocky, Michael; Trojan, Arthur
This module on statistics consists of 18 worksheets that cover such topics as sample spaces, mean, median, mode, taking samples, posting results, analyzing data, and graphing. The last four worksheets require the students to work with samples and use these to compare people's responses. A computer dating service is one result of this work.…
Zooming in on vibronic structure by lowest-value projection reconstructed 4D coherent spectroscopy
NASA Astrophysics Data System (ADS)
Harel, Elad
2018-05-01
A fundamental goal of chemical physics is an understanding of microscopic interactions in liquids at and away from equilibrium. In principle, this microscopic information is accessible by high-order and high-dimensionality nonlinear optical measurements. Unfortunately, the time required to execute such experiments increases exponentially with the dimensionality, while the signal decreases exponentially with the order of the nonlinearity. Recently, we demonstrated a non-uniform acquisition method based on radial sampling of the time-domain signal [W. O. Hutson et al., J. Phys. Chem. Lett. 9, 1034 (2018)]. The four-dimensional spectrum was then reconstructed by filtered back-projection using an inverse Radon transform. Here, we demonstrate an alternative reconstruction method based on the statistical analysis of different back-projected spectra which results in a dramatic increase in sensitivity and at least a 100-fold increase in dynamic range compared to conventional uniform sampling and Fourier reconstruction. These results demonstrate that alternative sampling and reconstruction methods enable applications of increasingly high-order and high-dimensionality methods toward deeper insights into the vibronic structure of liquids.
Willoughby, Timothy C.
2000-01-01
In June 1992, a wet-deposition collection site was established at the Gary (Indiana) Regional Airport to monitor the quantity and chemical quality of wet deposition. During the first phase of sampling, 48 wet-deposition samples were collected between June 30, 1992, and August 31, 1993. A second phase of sampling began in October 1995. During the second phase of sampling, 40 wet-deposition samples were collected between October 17, 1995, and November 12, 1996. This report presents the findings for the second phase of sampling and compares those results to the first phase of sampling. Northwestern Indiana is a heavily industrialized area. Steel production and petroleum refining are two of the area's predominant industries. High-temperature processes, such as fossil-fuel combustion and steel production, release contaminants to the atmosphere that may result in wet deposition being a major contributor to major-ion and trace-metal loadings in northwestern Indiana and Lake Michigan. Wet-deposition samples collected during the first and second phases of sampling were analyzed for pH, specific conductance, and selected major ions and trace metals. Forty weekly wet-deposition samples were collected at the Gary (Indiana) Regional Airport during the second phase of sampling. Approximately 1.2 times as much wet deposition was collected during the second phase of sampling compared to the first phase. Statistically significant increases (at the 5-percent significance level) in concentrations of potassium, iron, lead, and zinc were determined for samples collected during the second phase of sampling when compared to the first. No statistically significant differences were determined in constituent concentrations between samples collected during warm weather (April 1 through October 31) and during cold weather (November 1 through March 31). 
Annual loadings for the second phase of sampling were greater than 2 times the loadings determined during the first phase of sampling for silica, iron, potassium, lead, and zinc.
Fu, Wenjiang J.; Stromberg, Arnold J.; Viele, Kert; Carroll, Raymond J.; Wu, Guoyao
2009-01-01
Over the past two decades, there have been revolutionary developments in life science technologies characterized by high throughput, high efficiency, and rapid computation. Nutritionists now have the advanced methodologies for the analysis of DNA, RNA, protein, low-molecular-weight metabolites, as well as access to bioinformatics databases. Statistics, which can be defined as the process of making scientific inferences from data that contain variability, has historically played an integral role in advancing nutritional sciences. Currently, in the era of systems biology, statistics has become an increasingly important tool to quantitatively analyze information about biological macromolecules. This article describes general terms used in statistical analysis of large, complex experimental data. These terms include experimental design, power analysis, sample size calculation, and experimental errors (type I and II errors) for nutritional studies at population, tissue, cellular, and molecular levels. In addition, we highlighted various sources of experimental variations in studies involving microarray gene expression, real-time polymerase chain reaction, proteomics, and other bioinformatics technologies. Moreover, we provided guidelines for nutritionists and other biomedical scientists to plan and conduct studies and to analyze the complex data. Appropriate statistical analyses are expected to make an important contribution to solving major nutrition-associated problems in humans and animals (including obesity, diabetes, cardiovascular disease, cancer, ageing, and intrauterine fetal retardation). PMID:20233650
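The sample size calculation mentioned in this abstract can be illustrated with the textbook normal-approximation formula for comparing two group means, n = 2(z_{1-α/2} + z_{power})²σ²/δ² per group. The sketch below is a minimal illustration under that assumption; the function name and its parameters are hypothetical and are not taken from the article.

```python
import math
from statistics import NormalDist


def two_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample z-test on means.

    delta: smallest mean difference worth detecting (assumed known)
    sigma: common standard deviation in both groups (assumed known)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the type I error rate
    z_beta = NormalDist().inv_cdf(power)           # controls the type II error rate
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                            # round up to whole subjects


if __name__ == "__main__":
    # Difference of one standard deviation, 5% two-sided alpha, 80% power
    print(two_sample_size(delta=1.0, sigma=1.0))
```

Halving the detectable difference roughly quadruples the required sample, which is why the choice of delta usually dominates the design.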
Libiger, Ondrej; Schork, Nicholas J.
2015-01-01
It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
Efficient statistical tests to compare Youden index: accounting for contingency correlation.
Chen, Fangyao; Xue, Yuqiang; Tan, Ming T; Chen, Pingyan
2015-04-30
Youden index is widely utilized in studies evaluating accuracy of diagnostic tests and performance of predictive, prognostic, or risk models. However, both one and two independent sample tests on Youden index have been derived ignoring the dependence (association) between sensitivity and specificity, resulting in potentially misleading findings. Besides, paired sample test on Youden index is currently unavailable. This article develops efficient statistical inference procedures for one sample, independent, and paired sample tests on Youden index by accounting for contingency correlation, namely associations between sensitivity and specificity and paired samples typically represented in contingency tables. For one and two independent sample tests, the variances are estimated by Delta method, and the statistical inference is based on the central limit theory, which are then verified by bootstrap estimates. For paired samples test, we show that the estimated covariance of the two sensitivities and specificities can be represented as a function of kappa statistic so the test can be readily carried out. We then show the remarkable accuracy of the estimated variance using a constrained optimization approach. Simulation is performed to evaluate the statistical properties of the derived tests. The proposed approaches yield more stable type I errors at the nominal level and substantially higher power (efficiency) than does the original Youden's approach. Therefore, the simple explicit large sample solution performs very well. Because we can readily implement the asymptotic and exact bootstrap computation with common software like R, the method is broadly applicable to the evaluation of diagnostic tests and model performance. Copyright © 2015 John Wiley & Sons, Ltd.
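For orientation, the Youden index is J = sensitivity + specificity - 1. The sketch below computes J with the classical standard error that treats sensitivity and specificity as independent binomial proportions, which is the simplification this abstract argues against. It does not implement the authors' correlation-adjusted variance; the function name and the example counts are hypothetical.

```python
import math


def youden_index(tp, fn, tn, fp):
    """Return (J, standard error) from the four cells of a 2x2 diagnostic table."""
    sens = tp / (tp + fn)  # true positive rate among diseased subjects
    spec = tn / (tn + fp)  # true negative rate among non-diseased subjects
    j = sens + spec - 1
    # Classical variance: assumes sensitivity and specificity are independent
    var = sens * (1 - sens) / (tp + fn) + spec * (1 - spec) / (tn + fp)
    return j, math.sqrt(var)


if __name__ == "__main__":
    j, se = youden_index(tp=80, fn=20, tn=90, fp=10)
    print(f"J = {j:.3f}, SE = {se:.3f}")
```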
Miller, Michael A; Colby, Alison C C; Kanehl, Paul D; Blocksom, Karen
2009-03-01
The Wisconsin Department of Natural Resources (WDNR), with support from the U.S. EPA, conducted an assessment of wadeable streams in the Driftless Area ecoregion in western Wisconsin using a probabilistic sampling design. This ecoregion encompasses 20% of Wisconsin's land area and contains 8,800 miles of perennial streams. Randomly-selected stream sites (n = 60) equally distributed among stream orders 1-4 were sampled. Watershed land use, riparian and in-stream habitat, water chemistry, macroinvertebrate, and fish assemblage data were collected at each true random site and an associated "modified-random" site on each stream that was accessed via a road crossing nearest to the true random site. Targeted least-disturbed reference sites (n = 22) were also sampled to develop reference conditions for various physical, chemical, and biological measures. Cumulative distribution function plots of various measures collected at the true random sites evaluated with reference condition thresholds, indicate that high proportions of the random sites (and by inference the entire Driftless Area wadeable stream population) show some level of degradation. Study results show no statistically significant differences between the true random and modified-random sample sites for any of the nine physical habitat, 11 water chemistry, seven macroinvertebrate, or eight fish metrics analyzed. In Wisconsin's Driftless Area, 79% of wadeable stream lengths were accessible via road crossings. While further evaluation of the statistical rigor of using a modified-random sampling design is warranted, sampling randomly-selected stream sites accessed via the nearest road crossing may provide a more economical way to apply probabilistic sampling in stream monitoring programs.
2015-07-24
Business Office Manual at the six MTFs reviewed. Based on the statistical sample, there were 144,930 claims worth $34.8 million that had at least one...
Marsh, Adam G.; Cottrell, Matthew T.; Goldman, Morton F.
2016-01-01
Epigenetics is a rapidly developing field focused on deciphering chemical fingerprints that accumulate on human genomes over time. As the nascent idea of precision medicine expands to encompass epigenetic signatures of diagnostic and prognostic relevance, there is a need for methodologies that provide high-throughput DNA methylation profiling measurements. Here we report a novel quantification methodology for computationally reconstructing site-specific CpG methylation status from next generation sequencing (NGS) data using methyl-sensitive restriction endonucleases (MSRE). An integrated pipeline efficiently incorporates raw NGS metrics into a statistical discrimination platform to identify functional linkages between shifts in epigenetic DNA methylation and disease phenotypes in samples being analyzed. In this pilot proof-of-concept study we quantify and compare DNA methylation in blood serum of individuals with Parkinson's Disease relative to matched healthy blood profiles. Even with a small study of only six samples, a high degree of statistical discrimination was achieved based on CpG methylation profiles between groups, with 1008 statistically different CpG sites (p < 0.0025, after false discovery rate correction). A methylation load calculation was used to assess higher order impacts of methylation shifts on genes and pathways and most notably identified FGF3, FGF8, HTT, KMTA5, MIR8073, and YWHAG as differentially methylated genes with high relevance to Parkinson's Disease and neurodegeneration (based on PubMed literature citations). Of these, KMTA5 is a histone methyl-transferase gene and HTT is Huntington Disease Protein or Huntingtin, for which there are well established neurodegenerative impacts. The future need for precision diagnostics now requires more tools for exploring epigenetic processes that may be linked to cellular dysfunction and subsequent disease progression. PMID:27853465
Pohl, Lydia; Kölbl, Angelika; Werner, Florian; Mueller, Carsten W; Höschen, Carmen; Häusler, Werner; Kögel-Knabner, Ingrid
2018-04-30
Aluminium (Al)-substituted goethite is ubiquitous in soils and sediments. The extent of Al-substitution affects the physicochemical properties of the mineral and influences its macroscale properties. Bulk analysis only provides total Al/Fe ratios without providing information with respect to the Al-substitution of single minerals. Here, we demonstrate that nanoscale secondary ion mass spectrometry (NanoSIMS) enables the precise determination of Al-content in single minerals, while simultaneously visualising the variation of the Al/Fe ratio. Al-substituted goethite samples were synthesized with increasing Al concentrations of 0.1, 3, and 7 % and analysed by NanoSIMS in combination with established bulk spectroscopic methods (XRD, FTIR, Mössbauer spectroscopy). The high spatial resolution (50-150 nm) of NanoSIMS is accompanied by a high number of single-point measurements. We statistically evaluated the Al/Fe ratios derived from NanoSIMS, while maintaining the spatial information and reassigning it to its original localization. XRD analyses confirmed increasing concentration of incorporated Al within the goethite structure. Mössbauer spectroscopy revealed 11 % of the goethite samples generated at high Al concentrations consisted of hematite. The NanoSIMS data show that the Al/Fe ratios are in agreement with bulk data derived from total digestion and demonstrated small spatial variability between single-point measurements. More advantageously, statistical analysis and reassignment of single-point measurements allowed us to identify distinct spots with significantly higher or lower Al/Fe ratios. NanoSIMS measurements confirmed the capacity to produce images, which indicated the uniform increase in Al-concentrations in goethite. Using a combination of statistical analysis with information from complementary spectroscopic techniques (XRD, FTIR and Mössbauer spectroscopy) we were further able to reveal spots with lower Al/Fe ratios as hematite. 
Copyright © 2018 John Wiley & Sons, Ltd.
ARK: Aggregation of Reads by K-Means for Estimation of Bacterial Community Composition.
Koslicki, David; Chatterjee, Saikat; Shahrivar, Damon; Walker, Alan W; Francis, Suzanna C; Fraser, Louise J; Vehkaperä, Mikko; Lan, Yueheng; Corander, Jukka
2015-01-01
Estimation of bacterial community composition from high-throughput sequenced 16S rRNA gene amplicons is a key task in microbial ecology. Since the sequence data from each sample typically consist of a large number of reads and are adversely impacted by different levels of biological and technical noise, accurate analysis of such large datasets is challenging. There has been a recent surge of interest in using compressed sensing inspired and convex-optimization based methods to solve the estimation problem for bacterial community composition. These methods typically rely on summarizing the sequence data by frequencies of low-order k-mers and matching this information statistically with a taxonomically structured database. Here we show that the accuracy of the resulting community composition estimates can be substantially improved by aggregating the reads from a sample with an unsupervised machine learning approach prior to the estimation phase. The aggregation of reads is a pre-processing approach where we use a standard K-means clustering algorithm that partitions a large set of reads into subsets with reasonable computational cost to provide several vectors of first order statistics instead of only single statistical summarization in terms of k-mer frequencies. The output of the clustering is then processed further to obtain the final estimate for each sample. The resulting method is called Aggregation of Reads by K-means (ARK), and it is based on a statistical argument via mixture density formulation. ARK is found to improve the fidelity and robustness of several recently introduced methods, with only a modest increase in computational complexity. An open source, platform-independent implementation of the method in the Julia programming language is freely available at https://github.com/dkoslicki/ARK. A Matlab implementation is available at http://www.ee.kth.se/ctsoftware.
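The k-mer summarization step described here can be sketched as follows: each read is reduced to a normalized vector of k-mer frequencies, and such vectors would be the input to a clustering step like K-means. This is an illustrative sketch only, not the ARK implementation; the function name, the default k, and the handling of ambiguous bases are assumptions, and the clustering itself is not shown.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read, k=2, alphabet="ACGT"):
    """Normalized k-mer frequency vector of one read, in lexicographic k-mer order."""
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing characters outside the alphabet (e.g. N) are ignored
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vec = kmer_frequencies("ACGTACGT", k=2)
    print(len(vec), sum(vec))
```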
Pollen diversity and volatile variability of honey from Corsican Anthyllis hermanniae L. habitat.
Yang, Yin; Battesti, Marie-José; Paolini, Julien; Costa, Jean
2014-12-01
Melissopalynological, physicochemical, and volatile analyses of 29 samples of Corsican 'summer maquis' honey were performed. The pollen spectrum was characterized by a wide diversity of nectariferous and/or polleniferous taxa. The most important were Anthyllis hermanniae and Rubus sp., associated with some endemic taxa. Castanea sativa was also determined in these honeys with a great variation. The volatile fraction was characterized by 37 compounds and dominated by phenolic aldehydes and linear acids. The major compounds were phenylacetaldehyde, benzaldehyde, and nonanoic acid. Statistical analysis of pollen and volatile data showed that 18 samples were characterized by a high abundance of phenylacetaldehyde, which might relate to the high amount of A. hermanniae and Rubus sp. Eleven other samples displayed a higher proportion of phenolic ketones and linear acids, which characterized the nectar contribution of C. sativa and Thymus herba-barona, respectively. Copyright © 2014 Verlag Helvetica Chimica Acta AG, Zürich.
Drinking water quality assessment.
Aryal, J; Gautam, B; Sapkota, N
2012-09-01
Drinking water quality is a major public health concern because it is a major risk factor for the high incidence of diarrheal diseases in Nepal. In recent years, the prevalence of diarrhoea has been found to be highest in Myagdi district. This study was carried out to assess the quality of drinking water from different natural sources, reservoirs and collection taps at Arthunge VDC of Myagdi district. A cross-sectional study was carried out using a random sampling method in Arthunge VDC of Myagdi district from January to June 2010. Eighty-four water samples representing natural sources, reservoirs and collection taps from the study area were collected. The physico-chemical and microbiological analyses were performed following standard techniques set by APHA 1998, and statistical analysis was carried out using SPSS 11.5. The results were also compared with national and WHO guidelines. Of the 84 water samples (from natural sources, reservoirs and tap water) analyzed, the drinking water quality parameters (except arsenic and total coliform) of all samples were found to be within the WHO and national standards. 15.48% of water samples (n = 13) showed pH higher than the WHO permissible guideline values. Similarly, 85.71% of water samples (n = 72) showed arsenic values higher than the WHO value. Further, the statistical analysis showed no significant difference (P<0.05) in physico-chemical parameters and total coliform count between collection-tap water samples from winter (January 2010) and summer (June 2010). The microbiological examination of water samples revealed the presence of total coliform in 86.90% of water samples. The results obtained from physico-chemical analysis of water samples were within national and WHO standards except for arsenic. The study also found coliform contamination to be the key problem with drinking water.
Valid statistical inference methods for a case-control study with missing data.
Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun
2018-04-01
The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independency under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.
ERIC Educational Resources Information Center
Celebuski, Carin; Farris, Elizabeth
This report presents the findings from the "Nutrition Education in Public Schools, K-12" survey that was designed to provide data on the status of nutrition education in U.S. public schools. Questionnaires were sent to 1,000 school principals of a nationally representative sample of U.S. elementary, middle, and high schools. The survey…
Kalman filter for statistical monitoring of forest cover across sub-continental regions
Raymond L. Czaplewski
1991-01-01
The Kalman filter is a multivariate generalization of the composite estimator which recursively combines a current direct estimate with a past estimate that is updated for expected change over time with a prediction model. The Kalman filter can estimate proportions of different cover types for sub-continental regions each year. A random sample of high-resolution...
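The recursive combination described in this abstract can be illustrated in the scalar case, where the Kalman update reduces to an inverse-variance weighted composite of the model-predicted past estimate and the current direct estimate. The sketch below is a minimal illustration with hypothetical function names and made-up numbers; the multivariate filter for several cover types is not shown.

```python
def predict(est, var, expected_change, process_var):
    """Update last period's estimate for expected change; uncertainty grows."""
    return est + expected_change, var + process_var


def kalman_update(prior_est, prior_var, direct_est, direct_var):
    """Combine the predicted estimate with a current direct estimate."""
    gain = prior_var / (prior_var + direct_var)  # weight given to the new direct estimate
    post_est = prior_est + gain * (direct_est - prior_est)
    post_var = (1 - gain) * prior_var            # never larger than either input variance
    return post_est, post_var


if __name__ == "__main__":
    # Hypothetical forest-cover proportion: 0.42 last year, 0.01 expected loss
    est, var = predict(0.42, 0.0004, expected_change=-0.01, process_var=0.0001)
    est, var = kalman_update(est, var, direct_est=0.40, direct_var=0.0005)
    print(round(est, 4), round(var, 6))
```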
Michael L. Hoppus; Andrew J. Lister
2002-01-01
A Landsat TM classification method (iterative guided spectral class rejection) produced a forest cover map of southern West Virginia that provided the stratification layer for producing estimates of timberland area from Forest Service FIA ground plots using a stratified sampling technique. These same high quality and expensive FIA ground plots provided ground reference...
Johnson, Eric D; Tubau, Elisabet
2017-06-01
Presenting natural frequencies facilitates Bayesian inferences relative to using percentages. Nevertheless, many people, including highly educated and skilled reasoners, still fail to provide Bayesian responses to these computationally simple problems. We show that the complexity of relational reasoning (e.g., the structural mapping between the presented and requested relations) can help explain the remaining difficulties. With a non-Bayesian inference that required identical arithmetic but afforded a more direct structural mapping, performance was universally high. Furthermore, reducing the relational demands of the task through questions that directed reasoners to use the presented statistics, as compared with questions that prompted the representation of a second, similar sample, also significantly improved reasoning. Distinct error patterns were also observed between these presented- and similar-sample scenarios, which suggested differences in relational-reasoning strategies. On the other hand, while higher numeracy was associated with better Bayesian reasoning, higher-numerate reasoners were not immune to the relational complexity of the task. Together, these findings validate the relational-reasoning view of Bayesian problem solving and highlight the importance of considering not only the presented task structure, but also the complexity of the structural alignment between the presented and requested relations.
Huang, Shuguang; Yeo, Adeline A; Li, Shuyu Dan
2007-10-01
The Kolmogorov-Smirnov (K-S) test is a statistical method often used for comparing two distributions. In high-throughput screening (HTS) studies, such distributions usually arise from the phenotype of independent cell populations. However, the K-S test has been criticized for being overly sensitive in applications, and it often detects a statistically significant difference that is not biologically meaningful. One major reason is that systematic drift among the distributions is common in HTS studies, due to causes such as instrument variation, plate edge effects, and accidental differences in sample handling. In particular, in high-content cellular imaging experiments, the location shift could be dramatic since some compounds themselves are fluorescent. This oversensitivity of the K-S test is particularly pronounced in cellular assays where the sample sizes are very large (usually several thousand). In this paper, a modified K-S test is proposed to deal with the nonspecific location-shift problem in HTS studies. Specifically, we propose that the distributions are "normalized" by density curve alignment before the K-S test is conducted. In applications to simulation data and real experimental data, the results show that the proposed method has improved specificity.
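The effect of a pure location shift on the K-S statistic can be illustrated with a small sketch. Here the alignment step is approximated by subtracting each sample's median, which is an assumption made for illustration and only a crude stand-in for the density curve alignment the authors propose. The function names are hypothetical, and the quadratic-time empirical CDF is written for clarity rather than speed.

```python
from statistics import median


def ks_statistic(x, y):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    d = 0.0
    for v in list(x) + list(y):  # the maximum gap is attained at a data point
        fx = sum(1 for a in x if a <= v) / len(x)
        fy = sum(1 for b in y if b <= v) / len(y)
        d = max(d, abs(fx - fy))
    return d


def ks_after_centering(x, y):
    """K-S statistic after removing each sample's median (crude location alignment)."""
    mx, my = median(x), median(y)
    return ks_statistic([a - mx for a in x], [b - my for b in y])


if __name__ == "__main__":
    plate_a = [1, 2, 3, 4, 5]
    plate_b = [11, 12, 13, 14, 15]  # same shape, shifted by a constant drift
    print(ks_statistic(plate_a, plate_b), ks_after_centering(plate_a, plate_b))
```

With a constant drift and identical shapes, the raw statistic is at its maximum while the centered statistic vanishes, which is the nonspecific sensitivity the abstract describes.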
Does size matter? Statistical limits of paleomagnetic field reconstruction from small rock specimens
NASA Astrophysics Data System (ADS)
Berndt, Thomas; Muxworthy, Adrian R.; Fabian, Karl
2016-01-01
As samples of ever decreasing sizes are being studied paleomagnetically, care has to be taken that the underlying assumptions of statistical thermodynamics (Maxwell-Boltzmann statistics) are being met. Here we determine how many grains and how large a magnetic moment a sample needs to have to be able to accurately record an ambient field. It is found that for samples with a thermoremanent magnetic moment larger than 10^-11 Am^2 the assumption of a sufficiently large number of grains is usually given. Standard 25 mm diameter paleomagnetic samples usually contain enough magnetic grains such that statistical errors are negligible, but "single silicate crystal" works on, for example, zircon, plagioclase, and olivine crystals are approaching the limits of what is physically possible, leading to statistical errors in both the angular deviation and paleointensity that are comparable to other sources of error. The reliability of nanopaleomagnetic imaging techniques capable of resolving individual grains (used, for example, to study the cloudy zone in meteorites), however, is questionable due to the limited area of the material covered.
Implications of the Observed Ultraluminous X-Ray Source Luminosity Function
NASA Technical Reports Server (NTRS)
Swartz, Douglas A.; Tennant, Allyn; Soria, Roberto; Yukita, Mihoko
2012-01-01
We present the X-ray luminosity function (XLF) of ultraluminous X-ray (ULX) sources with 0.3-10.0 keV luminosities in excess of 10^39 erg/s in a complete sample of nearby galaxies. The XLF shows a break or cut-off at high luminosities that deviates from its pure power law distribution at lower luminosities. The cut-off is at roughly the Eddington luminosity for a 90-140 solar mass accretor. We examine the effects on the observed XLF of sample biases, of small-number statistics (at the high luminosity end) and of measurement uncertainties. We consider the physical implications of the shape and normalization of the XLF. The XLF is also compared and contrasted to results of other recent surveys.
Shared-environmental contributions to high cognitive ability.
Kirkpatrick, Robert M; McGue, Matt; Iacono, William G
2009-07-01
Using a combined sample of adolescent twins, biological siblings, and adoptive siblings, we estimated and compared the differential shared-environmentality for high cognitive ability and the shared-environmental variance for the full range of ability during adolescence. Estimates obtained via multiple methods were in the neighborhood of 0.20, and suggest a modest effect of the shared environment on both high and full-range ability. We then examined the association of ability with three measures of the family environment in a subsample of adoptive siblings: parental occupational status, parental education, and disruptive life events. Only parental education showed significant (albeit modest) association with ability in both the biological and adoptive samples. We discuss these results in terms of the need for cognitive-development research to combine genetically sensitive designs and modern statistical methods with broad, thorough environmental measurement.
Topology in two dimensions. II - The Abell and ACO cluster catalogues
NASA Astrophysics Data System (ADS)
Plionis, Manolis; Valdarnini, Riccardo; Coles, Peter
1992-09-01
We apply a method for quantifying the topology of projected galaxy clustering to the Abell and ACO catalogues of rich clusters. We use numerical simulations to quantify the statistical bias involved in using high peaks to define the large-scale structure, and we use the results obtained to correct our observational determinations for this known selection effect and also for possible errors introduced by boundary effects. We find that the Abell cluster sample is consistent with clusters being identified with high peaks of a Gaussian random field, but that the ACO shows a slight meatball shift away from the Gaussian behavior over and above that expected purely from the high-peak selection. The most conservative explanation of this effect is that it is caused by some artefact of the procedure used to select the clusters in the two samples.
A Multi-Omics Approach to Evaluate the Quality of Milk Whey Used in Ricotta Cheese Production
Sattin, Eleonora; Andreani, Nadia A.; Carraro, Lisa; Lucchini, Rosaria; Fasolato, Luca; Telatin, Andrea; Balzan, Stefania; Novelli, Enrico; Simionati, Barbara; Cardazzo, Barbara
2016-01-01
In the past, milk whey was only a by-product of cheese production, but currently, it has a high commercial value for use in the food industries. However, the regulation of whey management (i.e., storage and hygienic properties) has not been updated, and as a consequence, its microbiological quality is very challenging for food safety. The Next Generation Sequencing (NGS) technique was applied to several whey samples used for Ricotta production to evaluate the microbial community composition in depth using both RNA and DNA as templates for NGS library construction. Whey samples demonstrating a high microbial and aerobic spore load contained mostly Firmicutes; although variable, some samples contained a relevant amount of Gammaproteobacteria. Several lots of whey acquired as raw material for Ricotta production presented defective organoleptic properties. To define the volatile compounds in normal and defective whey samples, a headspace gas chromatography/mass spectrometry (GC/MS) analysis was conducted. The statistical analysis demonstrated that different microbial communities resulted from DNA or cDNA library sequencing, and distinguishable microbiota composed the communities contained in the organoleptic-defective whey samples. PMID:27582735
NASA Astrophysics Data System (ADS)
Swan, B.; Laverdiere, M.; Yang, L.
2017-12-01
In the past five years, deep Convolutional Neural Networks (CNN) have been increasingly favored for computer vision applications due to their high accuracy and ability to generalize well in very complex problems; however, details of how they function and in turn how they may be optimized are still imperfectly understood. In particular, their complex and highly nonlinear network architecture, including many hidden layers and self-learned parameters, as well as their mathematical implications, presents open questions about how to effectively select training data. Without knowledge of the exact ways the model processes and transforms its inputs, intuition alone may fail as a guide to selecting highly relevant training samples. Working in the context of improving a CNN-based building extraction model used for the LandScan USA gridded population dataset, we have approached this problem by developing a semi-supervised, highly-scalable approach to select training samples from a dataset of identified commission errors. Due to the large scope of this project, tens of thousands of potential samples could be derived from identified commission errors. To efficiently trim those samples down to a manageable and effective set for creating additional training samples, we statistically summarized the spectral characteristics of areas with rates of commission errors at the image tile level and grouped these tiles using affinity propagation. Highly representative members of each commission error cluster were then used to select sites for training sample creation. The model will be incrementally re-trained with the new training data to allow for an assessment of how the addition of different types of samples affects the model performance, such as precision and recall rates.
By using quantitative analysis and data clustering techniques to select highly relevant training samples, we hope to improve model performance in a manner that is resource efficient, both in terms of training process and in sample creation.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
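As a reminder of the quantities being compared, the sketch below computes the positive and negative predictive values for two diagnostic tests from their 2x2 counts against the true disease status. It only illustrates the definitions; it does not implement the Wald or weighted generalized score statistics, and the function name and counts are hypothetical.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) for one diagnostic test."""
    ppv = tp / (tp + fp)  # P(disease | test positive)
    npv = tn / (tn + fn)  # P(no disease | test negative)
    return ppv, npv


if __name__ == "__main__":
    # Paired design: both tests applied to the same 200 patients (made-up counts)
    test_a = predictive_values(tp=80, fp=10, tn=90, fn=20)
    test_b = predictive_values(tp=70, fp=5, tn=95, fn=30)
    print("PPV difference:", round(test_a[0] - test_b[0], 4))
    print("NPV difference:", round(test_a[1] - test_b[1], 4))
```

Because both tests are measured on the same patients, the two estimates are correlated, which is why a paired-design test statistic is needed rather than a simple two-sample comparison.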
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
Statistical inference for tumor growth inhibition T/C ratio.
Wu, Jianrong
2010-09-01
The tumor growth inhibition T/C ratio is commonly used to quantify treatment effects in drug screening tumor xenograft experiments. The T/C ratio is converted to an antitumor activity rating using an arbitrary cutoff point and often without any formal statistical inference. Here, we applied a nonparametric bootstrap method and a small sample likelihood ratio statistic to make a statistical inference of the T/C ratio, including both hypothesis testing and a confidence interval estimate. Furthermore, sample size and power are also discussed for statistical design of tumor xenograft experiments. Tumor xenograft data from an actual experiment were analyzed to illustrate the application.
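A minimal sketch of a nonparametric bootstrap for the T/C ratio follows. It assumes the ratio is defined as mean treated tumor volume over mean control tumor volume, that animals are resampled independently within each group, and that a simple percentile interval is acceptable. The abstract does not give these details, so the function names, data, and defaults are illustrative only, and the small-sample likelihood ratio statistic is not shown.

```python
import random
from statistics import mean


def tc_ratio(treated, control):
    """T/C ratio taken here as mean treated volume over mean control volume."""
    return mean(treated) / mean(control)


def bootstrap_tc_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the T/C ratio."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        t_star = rng.choices(treated, k=len(treated))
        c_star = rng.choices(control, k=len(control))
        ratios.append(tc_ratio(t_star, c_star))
    ratios.sort()
    lo = ratios[int((alpha / 2) * n_boot)]
    hi = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical tumor volumes (mm^3) at the end of a xenograft experiment.
treated = [120.0, 150.0, 90.0, 200.0, 110.0]
control = [400.0, 380.0, 520.0, 450.0, 410.0]

estimate = tc_ratio(treated, control)
lower, upper = bootstrap_tc_ci(treated, control)
print(estimate, lower, upper)
```

An interval of this kind can be compared against the chosen activity cutoff instead of relying on the point estimate alone.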
An Independent Filter for Gene Set Testing Based on Spectral Enrichment.
Frost, H Robert; Li, Zhigang; Asselbergs, Folkert W; Moore, Jason H
2015-01-01
Gene set testing has become an indispensable tool for the analysis of high-dimensional genomic data. An important motivation for testing gene sets, rather than individual genomic variables, is to improve statistical power by reducing the number of tested hypotheses. Given the dramatic growth in common gene set collections, however, testing is often performed with nearly as many gene sets as underlying genomic variables. To address the challenge to statistical power posed by large gene set collections, we have developed spectral gene set filtering (SGSF), a novel technique for independent filtering of gene set collections prior to gene set testing. The SGSF method uses as a filter statistic the p-value measuring the statistical significance of the association between each gene set and the sample principal components (PCs), taking into account the significance of the associated eigenvalues. Because this filter statistic is independent of standard gene set test statistics under the null hypothesis but dependent under the alternative, the proportion of enriched gene sets is increased without impacting the type I error rate. As shown using simulated and real gene expression data, the SGSF algorithm accurately filters gene sets unrelated to the experimental outcome resulting in significantly increased gene set testing power.
The Math Problem: Advertising Students' Attitudes toward Statistics
ERIC Educational Resources Information Center
Fullerton, Jami A.; Kendrick, Alice
2013-01-01
This study used the Students' Attitudes toward Statistics Scale (STATS) to measure attitude toward statistics among a national sample of advertising students. A factor analysis revealed four underlying factors make up the attitude toward statistics construct--"Interest & Future Applicability," "Confidence," "Statistical Tools," and "Initiative."…
Chaibub Neto, Elias
2015-01-01
In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson’s sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling. PMID:26125965
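The multinomial weighting formulation described above can be sketched as follows. This is an illustration in plain Python loops, not the authors' vectorized R implementation: each bootstrap replication draws multinomial counts, turns them into weights, and evaluates Pearson's correlation from weighted sample moments instead of building a resampled data set. The function names and data are hypothetical.

```python
import random
from math import sqrt


def weighted_pearson(x, y, w):
    """Pearson correlation from weighted sample moments; w must sum to 1."""
    mx = sum(wi * xi for wi, xi in zip(w, x))
    my = sum(wi * yi for wi, yi in zip(w, y))
    mxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    mxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    myy = sum(wi * yi * yi for wi, yi in zip(w, y))
    return (mxy - mx * my) / sqrt((mxx - mx * mx) * (myy - my * my))


def multinomial_counts(n, rng):
    """Counts of how often each of n observations is drawn in n equal-probability draws."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def bootstrap_correlations(x, y, n_boot=1000, seed=0):
    """Bootstrap replications of Pearson's r via multinomial weights."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        w = [c / n for c in multinomial_counts(n, rng)]
        # A resample made of one repeated point has zero variance and would
        # fail here; that is vanishingly rare for the sample size used below.
        reps.append(weighted_pearson(x, y, w))
    return reps


# Hypothetical paired observations.
x = [1.2, 2.3, 2.9, 4.1, 5.0, 6.2, 6.8, 8.1]
y = [1.0, 2.8, 2.5, 4.4, 4.9, 6.5, 6.1, 8.3]
reps = bootstrap_correlations(x, y)
print(min(reps), max(reps))
```

In a matrix-oriented language the weights for all replications would be stacked into one matrix, so the moment sums above become a few matrix products.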
Vigli, Georgia; Philippidis, Angelos; Spyros, Apostolos; Dais, Photis
2003-09-10
A combination of ¹H NMR and ³¹P NMR spectroscopy and multivariate statistical analysis was used to classify 192 samples from 13 types of vegetable oils, namely, hazelnut, sunflower, corn, soybean, sesame, walnut, rapeseed, almond, palm, groundnut, safflower, coconut, and virgin olive oils from various regions of Greece. 1,2-Diglycerides, 1,3-diglycerides, the ratio of 1,2-diglycerides to total diglycerides, acidity, iodine value, and fatty acid composition determined upon analysis of the respective ¹H NMR and ³¹P NMR spectra were selected as variables to establish a classification/prediction model by employing discriminant analysis. This model, obtained from the training set of 128 samples, resulted in a significant discrimination among the different classes of oils, whereas 100% of correct validated assignments for 64 samples were obtained. Different artificial mixtures of olive-hazelnut, olive-corn, olive-sunflower, and olive-soybean oils were prepared and analyzed by ¹H NMR and ³¹P NMR spectroscopy. Subsequent discriminant analysis of the data allowed detection of adulteration as low as 5% w/w, provided that fresh virgin olive oil samples were used, as reflected by their high 1,2-diglycerides to total diglycerides ratio (D ≥ 0.90).
Barish, Syndi; Ochs, Michael F.; Sontag, Eduardo D.; Gevertz, Jana L.
2017-01-01
Cancer is a highly heterogeneous disease, exhibiting spatial and temporal variations that pose challenges for designing robust therapies. Here, we propose the VEPART (Virtual Expansion of Populations for Analyzing Robustness of Therapies) technique as a platform that integrates experimental data, mathematical modeling, and statistical analyses for identifying robust optimal treatment protocols. VEPART begins with time course experimental data for a sample population, and a mathematical model fit to aggregate data from that sample population. Using nonparametric statistics, the sample population is amplified and used to create a large number of virtual populations. At the final step of VEPART, robustness is assessed by identifying and analyzing the optimal therapy (perhaps restricted to a set of clinically realizable protocols) across each virtual population. As proof of concept, we have applied the VEPART method to study the robustness of treatment response in a mouse model of melanoma subject to treatment with immunostimulatory oncolytic viruses and dendritic cell vaccines. Our analysis (i) showed that every scheduling variant of the experimentally used treatment protocol is fragile (nonrobust) and (ii) discovered an alternative region of dosing space (lower oncolytic virus dose, higher dendritic cell dose) for which a robust optimal protocol exists. PMID:28716945
Li, Cen; Yang, Hongxia; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao
2016-01-01
Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100–800 nm, which commonly further aggregate into 1–30 μm loosely amorphous particles. XRD test shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug. PMID:27738409
Li, Cen; Yang, Hongxia; Du, Yuzhi; Xiao, Yuancan; Zhandui; Sanglao; Wang, Zhang; Ladan, Duojie; Bi, Hongtao; Wei, Lixin
2016-01-01
Zuotai (gTso thal) is one of the famous drugs containing mercury in Tibetan medicine. However, little is known about the chemical substance basis of its pharmacodynamics and the intrinsic link of different samples sources so far. Given this, energy dispersive spectrometry of X-ray (EDX), scanning electron microscopy (SEM), atomic force microscopy (AFM), and powder X-ray diffraction (XRD) were used to assay the elements, micromorphology, and phase composition of nine Zuotai samples from different regions, respectively; the XRD fingerprint features of Zuotai were analyzed by multivariate statistical analysis. EDX result shows that Zuotai contains Hg, S, O, Fe, Al, Cu, and other elements. SEM and AFM observations suggest that Zuotai is a kind of ancient nanodrug. Its particles are mainly in the range of 100-800 nm, which commonly further aggregate into 1-30 μm loosely amorphous particles. XRD test shows that β-HgS, S8, and α-HgS are its main phase compositions. XRD fingerprint analysis indicates that the similarity degrees of nine samples are very high, and the results of multivariate statistical analysis are broadly consistent with sample sources. The present research has revealed the physicochemical characteristics of Zuotai, and it would play a positive role in interpreting this mysterious Tibetan drug.
Bovine origin Staphylococcus aureus: A new zoonotic agent?
Rao, Relangi Tulasi; Jayakumar, Kannan; Kumar, Pavitra
2017-10-01
The study aimed to assess the nature of animal-origin Staphylococcus aureus strains. The study has zoonotic importance and aimed to compare virulence between two different hosts, i.e., bovine and ovine origin. Conventional polymerase chain reaction-based methods were used for the characterization of S. aureus strains, and a chick embryo model was employed to assess the virulence capacity of the strains. All statistical tests were carried out in R, version 3.0.4. After initial screening and molecular characterization, the prevalence of S. aureus was found to be 42.62% in bovine-origin samples and 28.35% in ovine-origin samples. Meanwhile, the prevalence of methicillin-resistant S. aureus was found to be low in both hosts: among the samples, only 6.8% of isolates tested positive for methicillin resistance. Biofilm formation was quantified and the variation compared between the hosts. A Welch two-sample t-test was statistically significant (t = 2.3179, df = 28.103, p = 0.02795). The chicken embryo model was found to be effective for testing the pathogenicity of the strains. The study helped to conclude that healthy bovines can act as S. aureus reservoirs. Bovine-origin S. aureus strains are more virulent than ovine-origin strains and have a high probability of becoming zoonotic pathogens. Further gene knockout studies may be conducted to establish the zoonotic potential of the bovine-origin strains.
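The non-integer degrees of freedom reported for the Welch test come from the Welch-Satterthwaite approximation. The sketch below shows how the statistic and its degrees of freedom are formed from two samples. The data are hypothetical and are not the study's biofilm measurements, and the p-value is omitted because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical optical-density readings for two host groups.
group_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
group_2 = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_1, group_2)
print(t_stat, dof)
```

Unlike the pooled-variance t-test, this form does not assume equal variances in the two groups, which is why the degrees of freedom are generally fractional.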
NASA Astrophysics Data System (ADS)
Lavely, Adam; Vijayakumar, Ganesh; Brasseur, James; Paterson, Eric; Kinzel, Michael
2011-11-01
Using large-eddy simulation (LES) of the neutral and moderately convective atmospheric boundary layers (NBL, MCBL), we analyze the impact of coherent turbulence structure of the atmospheric surface layer on the short-time statistics that are commonly collected from wind turbines. The incoming winds are conditionally sampled with a filtering and thresholding algorithm into high/low horizontal and vertical velocity fluctuation coherent events. The time scales of these events are ~5-20 blade rotations and are roughly twice as long in the MCBL as the NBL. Horizontal velocity events are associated with greater variability in rotor power, lift and blade-bending moment than vertical velocity events. The variability in the industry standard 10 minute average for rotor power, sectional lift and wind velocity had a standard deviation of ~5% relative to the "infinite time" statistics for the NBL and ~10% for the MCBL. We conclude that turbulence structure associated with atmospheric stability state contributes considerable, quantifiable variability to wind turbine statistics. Supported by NSF and DOE.
40 CFR 90.712 - Request for public hearing.
Code of Federal Regulations, 2010 CFR
2010-07-01
... sampling plans and statistical analyses have been properly applied (specifically, whether sampling procedures and statistical analyses specified in this subpart were followed and whether there exists a basis... Clerk and will be made available to the public during Agency business hours. ...
Paechter, Manuela; Macher, Daniel; Martskvishvili, Khatuna; Wimmer, Sigrid; Papousek, Ilona
2017-01-01
In many social science majors, e.g., psychology, students report high levels of statistics anxiety. However, these majors are often chosen by students who are less prone to mathematics and who might have experienced difficulties and unpleasant feelings in their mathematics courses at school. The present study investigates whether statistics anxiety is a genuine form of anxiety that impairs students' achievements or whether learners mainly transfer previous experiences in mathematics and their anxiety in mathematics to statistics. The relationship between mathematics anxiety and statistics anxiety, their relationship to learning behaviors and to performance in a statistics examination were investigated in a sample of 225 undergraduate psychology students (164 women, 61 men). Data were recorded at three points in time: At the beginning of term students' mathematics anxiety, general proneness to anxiety, school grades, and demographic data were assessed; 2 weeks before the end of term, they completed questionnaires on statistics anxiety and their learning behaviors. At the end of term, examination scores were recorded. Mathematics anxiety and statistics anxiety correlated highly but the comparison of different structural equation models showed that they had genuine and even antagonistic contributions to learning behaviors and performance in the examination. Surprisingly, mathematics anxiety was positively related to performance. It might be that students realized over the course of their first term that knowledge and skills in higher secondary education mathematics are not sufficient to be successful in statistics. Part of mathematics anxiety may then have strengthened positive extrinsic effort motivation by the intention to avoid failure and may have led to higher effort for the exam preparation. However, via statistics anxiety mathematics anxiety also had a negative contribution to performance. 
Statistics anxiety led to higher procrastination in the structural equation model and, therefore, contributed indirectly and negatively to performance. Furthermore, it had a direct negative impact on performance (probably via increased tension and worry in the exam). The results of the study speak for shared but also unique components of statistics anxiety and mathematics anxiety. They are also important for instruction and give recommendations to learners as well as to instructors. PMID:28790938
Defining window-boundaries for genomic analyses using smoothing spline techniques
Beissinger, Timothy M.; Rosa, Guilherme J.M.; Kaeppler, Shawn M.; ...
2015-04-17
High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.
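The boundary-finding step can be illustrated with a rough sketch. It assumes the smoothing spline has already been fitted and evaluated at evenly spaced marker positions, and it approximates inflection points by sign changes in the discrete second difference. The fitting step itself is not shown, and the function names are hypothetical rather than taken from the authors' software.

```python
from math import sin


def inflection_indices(smoothed):
    """Indices of the first point after each sign change in the second difference."""
    second = [smoothed[i - 1] - 2 * smoothed[i] + smoothed[i + 1]
              for i in range(1, len(smoothed) - 1)]
    boundaries = []
    prev_sign = 0
    for j, s in enumerate(second):
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue  # ignore exactly flat curvature
        if prev_sign != 0 and sign != prev_sign:
            boundaries.append(j + 1)  # second[j] is centred on smoothed[j + 1]
        prev_sign = sign
    return boundaries


def windows(n_points, boundaries):
    """Half-open index ranges [start, end) between consecutive boundaries."""
    edges = [0] + boundaries + [n_points]
    return [(edges[k], edges[k + 1]) for k in range(len(edges) - 1)]


# Stand-in for fitted spline values along 20 evenly spaced markers.
fitted = [sin(0.5 * i) for i in range(20)]
bounds = inflection_indices(fitted)
print(bounds, windows(len(fitted), bounds))
```

The resulting windows differ in width because they follow the shape of the fitted curve rather than a fixed marker count.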
Statistical estimation of femur micro-architecture using optimal shape and density predictors.
Lekadir, Karim; Hazrati-Marangalou, Javad; Hoogendoorn, Corné; Taylor, Zeike; van Rietbergen, Bert; Frangi, Alejandro F
2015-02-26
The personalization of trabecular micro-architecture has been recently shown to be important in patient-specific biomechanical models of the femur. However, high-resolution in vivo imaging of bone micro-architecture using existing modalities is still infeasible in practice due to the associated acquisition times, costs, and X-ray radiation exposure. In this study, we describe a statistical approach for the prediction of the femur micro-architecture based on the more easily extracted subject-specific bone shape and mineral density information. To this end, a training sample of ex vivo micro-CT images is used to learn the existing statistical relationships within the low and high resolution image data. More specifically, optimal bone shape and mineral density features are selected based on their predictive power and used within a partial least square regression model to estimate the unknown trabecular micro-architecture within the anatomical models of new subjects. The experimental results demonstrate the accuracy of the proposed approach, with average errors of 0.07 for both the degree of anisotropy and tensor norms. Copyright © 2015 Elsevier Ltd. All rights reserved.
The endothelial sample size analysis in corneal specular microscopy clinical examinations.
Abib, Fernando C; Holzchuh, Ricardo; Schaefer, Artur; Schaefer, Tania; Godois, Ronialci
2012-05-01
To evaluate endothelial cell sample size and statistical error in corneal specular microscopy (CSM) examinations. One hundred twenty examinations were conducted with 4 types of corneal specular microscopes: 30 with each BioOptics, CSO, Konan, and Topcon corneal specular microscopes. All endothelial image data were analyzed by respective instrument software and also by the Cells Analyzer software with a method developed in our lab. A reliability degree (RD) of 95% and a relative error (RE) of 0.05 were used as cut-off values to analyze images of the counted endothelial cells called samples. The sample size mean was the number of cells evaluated on the images obtained with each device. Only examinations with RE < 0.05 were considered statistically correct and suitable for comparisons with future examinations. The Cells Analyzer software was used to calculate the RE and customized sample size for all examinations. Bio-Optics: sample size, 97 ± 22 cells; RE, 6.52 ± 0.86; only 10% of the examinations had sufficient endothelial cell quantity (RE < 0.05); customized sample size, 162 ± 34 cells. CSO: sample size, 110 ± 20 cells; RE, 5.98 ± 0.98; only 16.6% of the examinations had sufficient endothelial cell quantity (RE < 0.05); customized sample size, 157 ± 45 cells. Konan: sample size, 80 ± 27 cells; RE, 10.6 ± 3.67; none of the examinations had sufficient endothelial cell quantity (RE > 0.05); customized sample size, 336 ± 131 cells. Topcon: sample size, 87 ± 17 cells; RE, 10.1 ± 2.52; none of the examinations had sufficient endothelial cell quantity (RE > 0.05); customized sample size, 382 ± 159 cells. A very high number of CSM examinations had sample errors based on Cells Analyzer software. The endothelial sample size (examinations) needs to include more cells to be reliable and reproducible. The Cells Analyzer tutorial routine will be useful for CSM examination reliability and reproducibility.
USDA-ARS's Scientific Manuscript database
Whether a required Salmonella test series is passed or failed depends not only on the presence of the bacteria, but also on the methods for taking samples, the methods for culturing samples, and the statistics associated with the sampling plan. The pass-fail probabilities of the two-class attribute...
ERIC Educational Resources Information Center
Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy
2006-01-01
We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
Statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference means drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test often poses a challenge for novice researchers. To choose the statistical test, three aspects must be taken into account: the research design, the number of measurements, and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Testing for independence in J×K contingency tables with complex sample survey data.
Lipsitz, Stuart R; Fitzmaurice, Garrett M; Sinha, Debajyoti; Hevelone, Nathanael; Giovannucci, Edward; Hu, Jim C
2015-09-01
The test of independence of row and column variables in a (J×K) contingency table is a widely used statistical test in many areas of application. For complex survey samples, use of the standard Pearson chi-squared test is inappropriate due to correlation among units within the same cluster. Rao and Scott (1981, Journal of the American Statistical Association 76, 221-230) proposed an approach in which the standard Pearson chi-squared statistic is multiplied by a design effect to adjust for the complex survey design. Unfortunately, this test fails to exist when one of the observed cell counts equals zero. Even with the large samples typical of many complex surveys, zero cell counts can occur for rare events, small domains, or contingency tables with a large number of cells. Here, we propose Wald and score test statistics for independence based on weighted least squares estimating equations. In contrast to the Rao-Scott test statistic, the proposed Wald and score test statistics always exist. In simulations, the score test is found to perform best with respect to type I error. The proposed method is motivated by, and applied to, post surgical complications data from the United States' Nationwide Inpatient Sample (NIS) complex survey of hospitals in 2008. © 2015, The International Biometric Society.
Eklund, Andreas; Bergström, Gunnar; Bodin, Lennart; Axén, Iben
2015-10-19
Psychological, behavioral and social factors have long been considered important in the development of persistent pain. Little is known about how chiropractic low back pain (LBP) patients compare to other LBP patients in terms of psychological/behavioral characteristics. In this cross-sectional study, the aim was to investigate patients with LBP with regard to psychosocial/behavioral characteristics by describing a chiropractic primary care population and comparing this sample to three other populations using the MPI-S instrument. Thus, four different samples were compared. A: Four hundred eighty subjects from chiropractic primary care clinics. B: One hundred twenty-eight subjects from a gainfully employed population (sick listed with high risk of developing chronicity). C: Two hundred seventy-three subjects from a secondary care rehabilitation clinic. D: Two hundred thirty-five subjects from secondary care clinics. The Swedish version of the Multidimensional Pain Inventory (MPI-S) was used to collect data. Subjects were classified using a cluster analytic strategy into three pre-defined subgroups (named adaptive copers, dysfunctional and interpersonally distressed). The data show statistically significant overall differences across samples for the subgroups based on psychological and behavioral characteristics. The cluster classifications placed (in terms of the proportions of the adaptive copers and dysfunctional subgroups) sample A between B and the two secondary care samples C and D. The chiropractic primary care sample was more affected by pain and worse off with regard to psychological and behavioral characteristics compared to the other primary care sample. Based on our findings from the MPI-S instrument, the four samples may be considered statistically and clinically different. Sample A comes from an ongoing trial registered at ClinicalTrials.gov; NCT01539863, February 22, 2012.
NASA Astrophysics Data System (ADS)
Ghannadpour, Seyyed Saeed; Hezarkhani, Ardeshir
2016-03-01
The U-statistic method is one of the most important structural methods for separating anomalies from the background. It takes the location of samples into account and carries out the statistical analysis of the data without geochemical judgment, aiming to separate subpopulations and determine anomalous areas. In the present study, to use the U-statistic method in three dimensions (3D), the U-statistic is applied to the grades of two ideal test examples, taking the sample Z values (elevation) into account. This is the first time that the method has been applied in 3D. To evaluate the performance of the 3D U-statistic method, and to compare it with a non-structural method, the method of threshold assessment based on median and standard deviation (MSD method) is applied to the same two test examples. Results show that the samples identified as anomalous by the U-statistic method are more regular and less dispersed than those identified by the MSD method, so that, based on the location of anomalous samples, their denser areas can be determined as promising zones. Moreover, results show that at a threshold of U = 0, the total misclassification error of the U-statistic method is much smaller than that of the criterion $\bar{x} + n \times s$. Finally, a 3D model of the two test examples for separating anomaly from background using the 3D U-statistic method is provided. The source code for a software program, developed in the MATLAB programming language to perform the calculations of the 3D U-spatial statistic method, is additionally provided. This software is compatible with all the geochemical varieties and can be used in similar exploration projects.
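For comparison, the non-structural threshold criterion of the form x̄ + n·s can be sketched in a few lines. This illustration uses the mean and the sample standard deviation exactly as the criterion is written in the abstract. The abstract also describes the MSD method as median-based, so the choice of centre here is an assumption, and the grades and function names are hypothetical. The U-statistic itself is not implemented.

```python
from statistics import mean, stdev


def msd_threshold(grades, n=2.0):
    """Threshold of the form mean + n * (sample standard deviation)."""
    return mean(grades) + n * stdev(grades)


def flag_anomalies(grades, n=2.0):
    """Indices of samples whose grade exceeds the threshold."""
    threshold = msd_threshold(grades, n)
    return [i for i, g in enumerate(grades) if g > threshold]


# Hypothetical grades: nine background samples and one enriched sample.
grades = [1.0] * 9 + [10.0]
print(msd_threshold(grades), flag_anomalies(grades))
```

A criterion like this ignores where each sample sits in space, which is the contrast the abstract draws with the structural U-statistic approach.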
Image correlation and sampling study
NASA Technical Reports Server (NTRS)
Popp, D. J.; Mccormack, D. S.; Sedwick, J. L.
1972-01-01
The development of analytical approaches for solving image correlation and image sampling of multispectral data is discussed. Relevant multispectral image statistics which are applicable to image correlation and sampling are identified. The general image statistics include intensity mean, variance, amplitude histogram, power spectral density function, and autocorrelation function. The translation problem associated with digital image registration and the analytical means for comparing commonly used correlation techniques are considered. General expressions for determining the reconstruction error for specific image sampling strategies are developed.
Robin M. Reich; Hans T. Schreuder
2006-01-01
The sampling strategy involving both statistical and in-place inventory information is presented for the natural resources project of the Green Belt area (Centuron Verde) in the Mexican state of Jalisco. The sampling designs used were a grid-based ground sample of a 90 x 90 m plot and a two-stage stratified sample of 30 x 30 m plots. The data collected were used to...
Data mining and statistical inference in selective laser melting
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamath, Chandrika
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
Data mining and statistical inference in selective laser melting
Kamath, Chandrika
2016-01-11
Selective laser melting (SLM) is an additive manufacturing process that builds a complex three-dimensional part, layer-by-layer, using a laser beam to fuse fine metal powder together. The design freedom afforded by SLM comes associated with complexity. As the physical phenomena occur over a broad range of length and time scales, the computational cost of modeling the process is high. At the same time, the large number of parameters that control the quality of a part make experiments expensive. In this paper, we describe ways in which we can use data mining and statistical inference techniques to intelligently combine simulations and experiments to build parts with desired properties. We start with a brief summary of prior work in finding process parameters for high-density parts. We then expand on this work to show how we can improve the approach by using feature selection techniques to identify important variables, data-driven surrogate models to reduce computational costs, improved sampling techniques to cover the design space adequately, and uncertainty analysis for statistical inference. Here, our results indicate that techniques from data mining and statistics can complement those from physical modeling to provide greater insight into complex processes such as selective laser melting.
Ritchie, Marylyn D.; Hahn, Lance W.; Roodi, Nady; Bailey, L. Renee; Dupont, William D.; Parl, Fritz F.; Moore, Jason H.
2001-01-01
One of the greatest challenges facing human geneticists is the identification and characterization of susceptibility genes for common complex multifactorial human diseases. This challenge is partly due to the limitations of parametric-statistical methods for detection of gene effects that are dependent solely or partially on interactions with other genes and with environmental exposures. We introduce multifactor-dimensionality reduction (MDR) as a method for reducing the dimensionality of multilocus information, to improve the identification of polymorphism combinations associated with disease risk. The MDR method is nonparametric (i.e., no hypothesis about the value of a statistical parameter is made), is model-free (i.e., it assumes no particular inheritance model), and is directly applicable to case-control and discordant-sib-pair studies. Using simulated case-control data, we demonstrate that MDR has reasonable power to identify interactions among two or more loci in relatively small samples. When it was applied to a sporadic breast cancer case-control data set, in the absence of any statistically significant independent main effects, MDR identified a statistically significant high-order interaction among four polymorphisms from three different estrogen-metabolism genes. To our knowledge, this is the first report of a four-locus interaction associated with a common complex multifactorial disease. PMID:11404819
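The core pooling step of MDR described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: cross-validation and permutation testing are omitted, the case:control ratio threshold of 1.0 is assumed, and the function names and toy data set are hypothetical.

```python
from collections import Counter
from itertools import combinations

def mdr_fit(genotypes, labels, loci, threshold=1.0):
    """Pool multilocus genotype cells into a one-dimensional high/low-risk split.

    A cell (one genotype combination at the chosen loci) is labeled high-risk
    when its case count is at least `threshold` times its control count.
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(genotypes, labels):
        cell = tuple(row[i] for i in loci)
        (cases if y == 1 else controls)[cell] += 1
    return {c for c in set(cases) | set(controls)
            if cases[c] >= threshold * controls[c]}

def mdr_error(genotypes, labels, loci, high_risk):
    """Fraction of subjects misclassified by the high/low-risk split."""
    wrong = sum((tuple(row[i] for i in loci) in high_risk) != (y == 1)
                for row, y in zip(genotypes, labels))
    return wrong / len(labels)

def best_loci(genotypes, labels, order=2):
    """Search all locus combinations of the given order for the lowest error."""
    n_loci = len(genotypes[0])
    scored = []
    for loci in combinations(range(n_loci), order):
        high_risk = mdr_fit(genotypes, labels, loci)
        scored.append((mdr_error(genotypes, labels, loci, high_risk), loci))
    return min(scored)

# Hypothetical data: case status is an XOR-like interaction of loci 0 and 1,
# with no main effect at any single locus; locus 2 is noise.
genotypes = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1),
             (0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [0, 1, 1, 0, 0, 1, 1, 0]
result = best_loci(genotypes, labels, order=2)
```

In this toy example each locus taken alone is uninformative, yet the pair (0, 1) separates cases from controls perfectly, which mirrors the abstract's point about interactions in the absence of independent main effects.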
DREAM: An Efficient Methodology for DSMC Simulation of Unsteady Processes
NASA Astrophysics Data System (ADS)
Cave, H. M.; Jermy, M. C.; Tseng, K. C.; Wu, J. S.
2008-12-01
A technique called the DSMC Rapid Ensemble Averaging Method (DREAM) for reducing the statistical scatter in the output from unsteady DSMC simulations is introduced. During post-processing by DREAM, the DSMC algorithm is re-run multiple times over a short period before the temporal point of interest thus building up a combination of time- and ensemble-averaged sampling data. The particle data is regenerated several mean collision times before the output time using the particle data generated during the original DSMC run. This methodology conserves the original phase space data from the DSMC run and so is suitable for reducing the statistical scatter in highly non-equilibrium flows. In this paper, the DREAM-II method is investigated and verified in detail. Propagating shock waves at high Mach numbers (Mach 8 and 12) are simulated using a parallel DSMC code (PDSC) and then post-processed using DREAM. The ability of DREAM to obtain the correct particle velocity distribution in the shock structure is demonstrated and the reduction of statistical scatter in the output macroscopic properties is measured. DREAM is also used to reduce the statistical scatter in the results from the interaction of a Mach 4 shock with a square cavity and for the interaction of a Mach 12 shock on a wedge in a channel.
Methods for processing microarray data.
Ares, Manuel
2014-02-01
Quality control must be maintained at every step of a microarray experiment, from RNA isolation through statistical evaluation. Here we provide suggestions for analyzing microarray data. Because the utility of the results depends directly on the design of the experiment, the first critical step is to ensure that the experiment can be properly analyzed and interpreted. What is the biological question? What is the best way to perform the experiment? How many replicates will be required to obtain the desired statistical resolution? Next, the samples must be prepared, pass quality controls for integrity and representation, and be hybridized and scanned. Also, slides with defects, missing data, high background, or weak signal must be rejected. Data from individual slides must be normalized and combined so that the data are as free of systematic bias as possible. The third phase is to apply statistical filters and tests to the data to determine genes (1) expressed above background, (2) whose expression level changes in different samples, and (3) whose RNA-processing patterns or protein associations change. Next, a subset of the data should be validated by an alternative method, such as reverse transcription-polymerase chain reaction (RT-PCR). Provided that this endorses the general conclusions of the array analysis, gene sets whose expression, splicing, polyadenylation, protein binding, etc. change in different samples can be classified with respect to function, sequence motif properties, as well as other categories to extract hypotheses for their biological roles and regulatory logic.
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
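The order dependence of the Durbin-Watson statistic, and the contrast with an index built from a spatial weight matrix, can be shown with a short Python sketch. The Moran-type index below assumes the form z'Wz, with z the residuals standardized by the population standard deviation and W scaled so that its entries sum to one. The abstract does not give the paper's exact formulas, so this is an illustrative assumption, and the residuals and adjacency matrix are hypothetical.

```python
import math

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den

def moran_residual_index(residuals, weights):
    """Moran-type autocorrelation z'Wz with globally normalized weights."""
    n = len(residuals)
    mean = sum(residuals) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in residuals) / n)
    z = [(e - mean) / sd for e in residuals]
    total = sum(sum(row) for row in weights)
    return sum(weights[i][j] * z[i] * z[j]
               for i in range(n) for j in range(n)) / total

# Hypothetical residuals at four sites and a chain-like adjacency matrix.
residuals = [1.0, 0.8, -0.9, -1.1]
weights = [[0, 1, 0, 0],
           [1, 0, 1, 0],
           [0, 1, 0, 1],
           [0, 0, 1, 0]]

# List the same sites in a different order, relabeling the weights to match.
perm = [0, 2, 1, 3]
residuals_p = [residuals[p] for p in perm]
weights_p = [[weights[a][b] for b in perm] for a in perm]

dw_original = durbin_watson(residuals)      # depends on the listing order
dw_permuted = durbin_watson(residuals_p)
moran_original = moran_residual_index(residuals, weights)
moran_permuted = moran_residual_index(residuals_p, weights_p)
```

Reordering the records changes the Durbin-Watson value even though the data are unchanged, whereas the weight-matrix index is unaffected because the spatial relations travel with the sites.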
Kumar, R Vinoth; Rajvikram, N; Rajakumar, P; Saravanan, R; Deepak, V Arun; Vijaykumar, V
2016-03-01
The aim of this study was to evaluate the release of nickel and chromium ions into human saliva during fixed orthodontic therapy. Ten patients with Angle's Class I malocclusion and bimaxillary protrusion, with all permanent teeth present and no metal restorations or crowns, were selected. Five male and five female patients aged 14 to 23 years were scheduled for orthodontic treatment with first premolar extraction. Saliva samples were collected in three stages: sample 1, before orthodontic treatment; sample 2, 10 days after bonding; and sample 3, 1 month after bonding. The samples were analyzed for nickel and chromium using inductively coupled plasma optical emission spectrometry (ICP-OES). The changes in nickel and chromium levels were statistically significant: nickel showed a gradual increase in the first 10 days and a decline thereafter, whereas chromium showed a gradual increase that was statistically significant on the 30th day. The greatest release of ions occurred during the first 10 days, with a gradual decline thereafter. The control group had traces of nickel and chromium. Nickel levels in saliva rose significantly from baseline to the 10th- and 30th-day samples, whereas the difference between the 10th and 30th days was not statistically significant. Chromium levels in saliva were higher on the 30th day, and the difference between the 10th- and 30th-day samples was statistically significant. Nickel and chromium levels were well within the permissible levels. However, some hypersensitive individuals may be allergic even to these minimal levels.